Closed saisimo02 closed 5 years ago
Thanks for your interest! The feature you described would be very useful, and it is doable. Running PCA can flag an anomalous sample (a log sequence), but it is hard to pinpoint exactly where the anomaly is within it. We will consider implementing this feature.
We would also welcome a pull request for this feature if you would like to contribute one.
Thanks for your response. If I understand correctly, PCA only gives the line that contains an anomaly, which you call a "log sequence". This line represents a time window, so for the moment we can only say that an anomaly occurred during a given period by looking up the log sequence index?
Otherwise, if I find a solution I will send you my proposal ^^
First off, thanks for providing this library. I am also looking for a similar feature, where I can figure out the location of the anomaly in my test data. Have there been any proposals on this, or any ideas I can leverage? Another question: what modifications would this library need if I want to bring in my own set of logs, parsed using the logparser library? All the demos provided deal only with HDFS logs, and the data preprocessing also handles only HDFS logs. Can this library process other types of logs, such as Linux logs, with similar accuracy?
@saisimo02 Yes. You are correct. PCA can only find anomalies for log sequences.
@ketulsheth2 The data-dependent parts are the dataloader and, optionally, the feature extractor. If you would like to try Linux logs, please follow PCA_demo_without_label.py and rewrite load_HDFS() into a load_Linux().
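As a rough illustration of that rewrite (a minimal sketch only, not loglizer's actual API: the column names and the count-based windowing are assumptions, since Linux logs have no HDFS-style block_id to group by):

```python
import csv

def load_Linux(structured_csv, window_size=10):
    """Sketch of a Linux loader: read the structured CSV produced by
    logparser and chop the EventId stream into fixed-size sequences."""
    with open(structured_csv, newline='') as f:
        events = [row['EventId'] for row in csv.DictReader(f)]
    # Non-overlapping count-based windows; a session or time window
    # would work the same way with a different grouping key.
    return [events[i:i + window_size]
            for i in range(0, len(events), window_size)]
```

The returned event sequences would then feed the same feature extractor the HDFS demo uses.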
Hi, as per my understanding, PCA_demo_without_label.py and the other unsupervised algorithms predict anomalies based on block or instance grouping. Suppose we have 10,000 log lines in the original file; once we group them by some identifier they reduce to, say, 140 instances, and the model only predicts which instance could be anomalous. As discussed with @saisimo02, I hope it is possible to find the actual anomalous line in the original file, and also to predict an anomaly before the actual error state is reached in OpenStack logs (if such patterns are in the datasets)?
Please correct me if my understanding differs, and may I know the tentative timeline for implementing this feature?
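For what it's worth, tracing a flagged instance back to the original file is mostly bookkeeping: keep the original LineIds alongside each grouped instance. A minimal sketch of the idea (the function and column names are illustrative, not part of loglizer):

```python
def group_with_provenance(rows, key):
    """Group (LineId, EventId) rows by an identifier column, remembering
    which original lines each instance came from.

    rows: iterable of dicts with 'LineId', 'EventId' and the key column.
    Returns {identifier: {'events': [...], 'lines': [...]}}.
    """
    groups = {}
    for row in rows:
        g = groups.setdefault(row[key], {'events': [], 'lines': []})
        g['events'].append(row['EventId'])
        g['lines'].append(row['LineId'])
    return groups

# If a detector flags instance `blk`, groups[blk]['lines'] gives the
# original line numbers to inspect in the raw log file.
```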
Hi, one question about the logparser output and loglizer. We are feeding in unstructured OpenStack logs and it gives structured.csv and templates.csv as output, but it converts all dynamic strings into <*> (presumably using regular expressions). In a few scenarios, though, we may need the actual values, e.g. the VM spawn-up duration for finding an anomaly, while the current event template reads "It took <*> seconds". So, can we customise logparser to keep the actual value instead of <*> for the dynamic fields of interest, or can loglizer handle such scenarios automatically?
Let me know if I should raise this as a separate thread.
Thanks for your comments. The logparser can output the parameter list. I will update this.
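One way such a parameter list can be recovered (a sketch of the idea only, not logparser's actual implementation) is to turn each <*> wildcard in a template into a regex capture group and match it against the raw message:

```python
import re

def extract_params(template, message):
    """Recover parameter values by converting the event template's <*>
    placeholders into capture groups and matching the raw message."""
    pattern = re.escape(template).replace(re.escape('<*>'), '(.*?)')
    m = re.fullmatch(pattern, message)
    return list(m.groups()) if m else None
```

With a template like "It took <*> seconds to spawn the instance." this yields the concrete duration value, which could then be fed to a model that needs it.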
Our current model focuses more on sequence-based anomaly detection, which is important for problem diagnosis. Predictive anomaly detection is a desirable feature; as far as we know, only DeepLog supports it.
Can you please explain the output of the invariant mining code? For example, in your demo, what exactly does (0, 2): [1.0, -1.0] mean?
It means that log event 0 and log event 2 should satisfy the 1-1 (i.e., [1.0, -1.0]) mapping relation: every time event 0 occurs once, event 2 also occurs once.
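In other words, for a normal sequence the event counts at the listed positions must cancel under the given coefficients; a non-zero result flags the sequence as anomalous. A small illustration of the idea (the names and sample counts are made up for the example):

```python
def violates_invariant(event_counts, invariant):
    """invariant: ((i, j), [a, b]) encodes the relation
    a * count[i] + b * count[j] == 0, which must hold for normal
    sequences. Returns True if the sequence breaks the invariant."""
    (i, j), (a, b) = invariant
    return a * event_counts[i] + b * event_counts[j] != 0

inv = ((0, 2), [1.0, -1.0])        # event 0 and event 2 in 1-1 relation
normal = [3, 5, 3, 1]              # event 0 and event 2 both occur 3 times
broken = [3, 5, 2, 1]              # event 2 occurred one time too few
```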
Hi @jimzhu, a couple of questions. Is there any tentative timeline for the DeepLog feature? And as you mentioned, the current model mainly focuses on sequence-based anomaly detection, but is it possible to trace back to the original file (before transforming into sequences) where the actual anomaly was encountered? With the current model it can be difficult to analyse what the anomaly is and where it resides.
Also, I think you are using only the EventTemplate feature for anomaly detection and ignoring other features like Component, Time, etc. May I know how you decided that EventTemplate is enough?
As mentioned above, you currently focus on the sequence model, and the documentation says it can be extended to sliding windows. Does that mean it becomes a time series with an LSTM?
Please find the data for your reference.
LineId,Logrecord,Date,Time,Pid,Level,Component,ADDR,Content,EventId,EventTemplate
1,nova-api.log.1.2017-05-16_13:53:08,2017-05-16,00:00:00.008,25746,INFO,nova.osapi_compute.wsgi.server,req-38101a0b-2096-447d-96ea-a692162415ae 113d3a99c3da401fbd62cc2caa5b96d2 54fadb412c4e40cdbaed9335e4c35a9e - - -,"10.11.10.1 ""GET /v2/54fadb412c4e40cdbaed9335e4c35a9e/servers/detail HTTP/1.1"" status: 200 len: 1893 time: 0.2477829",E25,"<*> ""GET <*>"" status: <*> len: <*> time: <*>.<*>"
2,nova-api.log.1.2017-05-16_13:53:08,2017-05-16,00:00:00.272,25746,INFO,nova.osapi_compute.wsgi.server,req-9bc36dd9-91c5-4314-898a-47625eb93b09 113d3a99c3da401fbd62cc2caa5b96d2 54fadb412c4e40cdbaed9335e4c35a9e - - -,"10.11.10.1 ""GET /v2/54fadb412c4e40cdbaed9335e4c35a9e/servers/detail HTTP/1.1"" status: 200 len: 1893 time: 0.2577181",E25,"<*> ""GET <*>"" status: <*> len: <*> time: <*>.<*>"
3,nova-api.log.1.2017-05-16_13:53:08,2017-05-16,00:00:01.551,25746,INFO,nova.osapi_compute.wsgi.server,req-55db2d8d-cdb7-4b4b-993b-429be84c0c3e 113d3a99c3da401fbd62cc2caa5b96d2 54fadb412c4e40cdbaed9335e4c35a9e - - -,"10.11.10.1 ""GET /v2/54fadb412c4e40cdbaed9335e4c35a9e/servers/detail HTTP/1.1"" status: 200 len: 1893 time: 0.2731631",E25,"<*> ""GET <*>"" status: <*> len: <*> time: <*>.<*>"
Thank you @jimzhu for extending the parameter list. When I ran it today I faced a couple of issues with Spell.py, around xrange and len(seq); hopefully these are due to the Python version. I resolved them by using range instead of xrange and len(set(seq)); after that the errors were gone and output was produced, but the generated event template is blank and the parameter list is also blank. Can you tell me if I am missing anything?
I also tried IPLoM.py, and it produced new output with a parameter list, but the issue is that it did not produce a parameter list for all the events. Please find the details below.
Produced parameter list correctly:
LineId,Logrecord,Date,Time,Pid,Level,Component,ADDR,Content,EventId,EventTemplate,ParameterList
1,nova-api.log.2017-05-14_21:27:04,2017-05-14,19:39:01.445,25746,INFO,nova.osapi_compute.wsgi.server,[req-5a2050e7-b381-4ae9-92d2-8b08e9f9f4c0 113d3a99c3da401fbd62cc2caa5b96d2 54fadb412c4e40cdbaed9335e4c35a9e - - -," 10.11.10.1 ""GET /v2/54fadb412c4e40cdbaed9335e4c35a9e/servers/detail HTTP/1.1"" status: 200 len: 1583 time: 0.1919448",dd73db9f,"<*> <*> <*> HTTP/1.1"" status <*> len <*> time <*>","['', '10.11.10.1 ""GET /v2/54fadb412c4e40cdbaed9335e4c35a9e/servers/detail', '200', '1583', '0.1919448']"
This scenario contains a time value pattern but did not capture a parameter list:
45,nova-compute.log.2017-05-14_21:27:09,2017-05-14,19:39:22.590,2931,INFO,nova.compute.manager,[req-e285b551-587f-4c1d-8eba-dceb2673637f 113d3a99c3da401fbd62cc2caa5b96d2 54fadb412c4e40cdbaed9335e4c35a9e - - -, [instance: 3edec1e4-9678-4a3a-a21b-a145a4ee5e61] Took 20.58 seconds to spawn the instance on the hypervisor.,2dcd2da2,[instance <*> Took <*> seconds to <*> the instance on the hypervisor.,[]
Please let me know why it did not capture the 20.58 value and the other values in the parameter list.
And one basic question: are you going to utilise the parameter list in the current models, or only in DeepLog?
We are working towards identifying the anomaly position, but it will take some time.
@AIMLTest Thanks for reporting the logparser issue. We mainly support Python 2.7 in the logparser project, but some of the parsers are Python 3 compatible. I have just fixed the Spell issue. Besides, logparser is ML-based and depends on a number of hyper-parameters to tune; that is why you got only part of the parameters. We recommend using Drain, which is the most accurate parser so far. For more accurate parsing, you can use the regex parameter to add customized regular expressions for parameter identification. For example, r'(/|)([0-9]+\.){3}[0-9]+(:[0-9]+|)(:|)' will identify IP addresses.
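To see what such a regex does before wiring it into a parser, one can apply it directly with Python's re module, in the spirit of how logparser substitutes matched parameters with <*> (this snippet only exercises the pattern itself, outside of logparser):

```python
import re

# The IP-address regex suggested above.
ip_rex = r'(/|)([0-9]+\.){3}[0-9]+(:[0-9]+|)(:|)'

line = '10.11.10.1 "GET /v2/servers/detail HTTP/1.1" status: 200'
# Replace every match with the <*> placeholder, as a parser would.
masked = re.sub(ip_rex, '<*>', line)
# The leading IP address is masked; HTTP/1.1 and 200 are untouched,
# because they lack the three dotted numeric groups the regex requires.
```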
@jimzhu Thanks for the modifications. I am now able to run the Spell_demo and Drain_demo programs. I used the same regular expression and the same file for both. A quick question though: the two produce different event templates. Is this expected? And how do you choose which log parser to use in an OpenStack environment? For example, I have added two sample outputs.
Drain o/p:
9,nova-compute.log.1.2017-05-16_13:55:31,2017-05-16,00:00:04.693,2931,INFO,nova.compute.manager,[req-3ea4052c-895d-4b64-9e2d-04d64c4d94ab - - - - -,[instance: b9000564-fe1a-409b-b8cc-1e88b294cd1d] During sync_power_state the instance has a pending task (spawning). Skip.,260455e3,[instance: <*> During sync_power_state the instance has a pending task <*> Skip.,"['b9000564-fe1a-409b-b8cc-1e88b294cd1d]', '(spawning).']"
Spell o/p:
9,nova-compute.log.1.2017-05-16_13:55:31,2017-05-16,00:00:04.693,2931,INFO,nova.compute.manager,[req-3ea4052c-895d-4b64-9e2d-04d64c4d94ab - - - - -,[instance: b9000564-fe1a-409b-b8cc-1e88b294cd1d] During sync_power_state the instance has a pending task (spawning). Skip.,4691495f,[instance b9000564-fe1a-409b-b8cc-1e88b294cd1d] <*> <*> <*> instance <*>,"['', 'During sync_power_state the', 'has a pending task (spawning). Skip']"
Second question: are you going to utilize the parameter list in the current models, or only for future purposes?
Thanks, hopefully it will be released soon. Could you please also answer the questions above that are still open?
Also, I think you selected only the EventTemplate feature for modeling and anomaly detection and ignored other features like Component, Time, etc. May I know how you decided that EventTemplate is enough?
As mentioned above, you currently focus on the sequence model, and the documentation says it can be extended to sliding windows via bgl_preprocess_data. Could you please let me know how to call this method if I want to integrate it into the current models?
@jimzhu Sorry to spam. After the modifications the number of templates has increased drastically; can you please check once more whether this is valid?
For your reference, please find the file below.
https://www.cs.utah.edu/~mind/papers/deeplog_misc.html (Normal log dataset 1)
Could you please look into this ASAP? The generated templates are blocking us. Also, is it possible to switch back to the previous version (without the parameter list)?
@AIMLTest Could you please explain more about the details?
Which log parser did you use?
Previous version output template:
7b60ac2e | [instance <*> <*> <*> <*> MB <*> <*> MB | 1112
Current version output template:
db43fcd4 | [instance: 78dc1847-<*>-49cc-933e-9239b12c9dcf] <*> <*> <*> MB, <*> <*>.<*> MB | 2
With these changes, the earlier 44 templates have increased to 1700 templates.
I think these effects are due to the regex; I used the Drain log parser.
The logparser project is based on machine learning, so no log parser can guarantee 100% accuracy. Different log parsers have different accuracy, which is why Spell and Drain produce different templates.
Currently, we only use the event template in our model.
It would be a good idea to add info like time, component, etc. The bgl loading function is not ready yet.
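In the meantime, a time-based sliding window can be sketched independently of the library (the window and step sizes, and the (timestamp, event_id) input format, are assumptions for illustration, not loglizer's bgl_preprocess_data API):

```python
def sliding_windows(events, window, step):
    """events: list of (timestamp, event_id) pairs sorted by timestamp.
    Yields the event_ids falling into each [t, t + window) interval,
    advancing the window start by `step` seconds each time."""
    if not events:
        return
    start, t_end = events[0][0], events[-1][0]
    while start <= t_end:
        yield [e for t, e in events if start <= t < start + window]
        start += step
```

With step equal to window this gives fixed (tumbling) windows; a smaller step gives overlapping sliding windows.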
@jimzhu As you mentioned, different log parsers have different accuracy, so how do we choose which log parser to use for a given use case? In the Drain log parser, with the added regular expressions a few instances are replaced by <*> while a few are listed directly with the instance number, and in the IPLoM log parser the instances are <*>. So how do we choose correctly? Do you mean we need to experiment with all the log parsers, test their accuracy, and fix the log parser for the use case based on that?
According to our benchmarking results, Drain currently has the best accuracy. You can use Drain and tune the parameters for different types of logs.
I cannot see any major difference between the two versions. Could you provide the running scripts with parameter settings?
@jimzhu Sure, thanks for the update. But still one basic question about the logparser output: for your reference I pasted two templates below and do not understand the logic. For one template it produced <*> for the instance number, while for the second template it listed the actual instance value?
fcc9fd98 [instance: <*> Terminating instance 556
7c361568 [instance: 78dc1847-8848-49cc-933e-9239b12c9dcf] Total <*> <*> <*> used: 0.00 <*> 2
First of all, thank you for this toolkit. I have a question about the position of the anomalies in the original log file: is there a way to detect and display them after running PCA or another algorithm? Thank you in advance.