logpai / loglizer

A machine learning toolkit for log-based anomaly detection [ISSRE'16]
MIT License

Invariant miner taking hours to run #65

Closed Rufaida94 closed 4 years ago

Rufaida94 commented 4 years ago

Thank you for the invariant miner, it is a great tool for anomaly detection. However, I have one issue with it:

When I run it on log files of different sizes, it works with some files but not with others, and this does not depend on the size of the log file. Sometimes it gets stuck: I leave it running for hours and still get no result.

It seems to be stuck at this point:

====== Model summary ======
Invariant space dimension: 17

So it estimates the invariant space dimension but fails to produce the actual invariants.

When I try to debug, I find that this is the point in the code that takes so long:

Traceback (most recent call last):
  File "InvariantsMiner_demo_without_labels.py", line 27, in <module>
    model.fit(x_train)
  File "../loglizer/models/InvariantsMiner.py", line 44, in fit
    self._invariants_search(X, invar_dim)
  File "../loglizer/models/InvariantsMiner.py", line 132, in _invariants_search
    joined_item_list = self._join_set(item_list, length) # generate new invariant candidates
  File "../loglizer/models/InvariantsMiner.py", line 264, in _join_set
    if joined not in return_list:

Any idea why, and how can I fix this?

ShilinHe commented 4 years ago

Hi there. Since it already estimates the space dimension, it should then go into the function _invariants_search. According to your program trace, the problem should be in _join_set. I think you should step through the _join_set function and see whether it is stuck in a loop or never reaches a termination condition.

Rufaida94 commented 4 years ago

I tried tracing it step by step but with no luck. Can I send you an example of a log file that produces this error? Maybe you can try it on your side and let me know if you notice something.

ShilinHe commented 4 years ago

Yes, if you don't mind. Please send the data and your trace to my email. I will check it. Thanks!

Rufaida94 commented 4 years ago

Great, thanks. I appreciate the help!

zhujiem commented 4 years ago

Could you print out the invariant space? If it is large, it would be normal for this to take a long time, because invariant mining searches a combinatorial space with only partial pruning.
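
To get a feel for how quickly such a space can grow, here is a minimal sketch. It is not loglizer code: the function name candidate_counts and the figure of 40 event templates are assumptions chosen for illustration, and it counts the worst case in which no candidate is pruned.

```python
from math import comb


def candidate_counts(n_templates, max_length):
    """Count unpruned candidate template sets for each length up to max_length.

    Hypothetical helper for illustration only; a real miner prunes
    part of this space, so these numbers are upper bounds.
    """
    return {k: comb(n_templates, k) for k in range(1, max_length + 1)}


if __name__ == "__main__":
    # 40 templates is an assumed figure, not taken from the reported data.
    for length, count in candidate_counts(40, 6).items():
        print(f"length {length}: {count} candidate sets")
```

Each extra unit of length multiplies the number of candidates until the length reaches about half the template count, which is why a search that keeps extending candidates can run for hours.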

ShilinHe commented 4 years ago

After receiving the raw data from the issue raiser, we have resolved the problem. The issue mainly comes from the data itself, which contains too few invariants, so the invariants miner keeps searching and the search space explodes. We therefore have two suggestions:

1. Decrease the epsilon value.
2. Constrain the invariant length (the number of templates in an invariant) to a small value such as 3 or 4.
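
The effect of the second suggestion can be sketched with a simplified level-wise candidate search. This is an illustration under stated assumptions, not the loglizer implementation: the names join_candidates and search_candidates are hypothetical, no invariant validation or pruning is performed, and duplicates are tracked in a set of frozensets rather than with a list membership test like the one in the trace above.

```python
from itertools import combinations


def join_candidates(item_sets, length):
    """Union pairs of candidate sets, keeping unions with exactly `length` items.

    Hypothetical helper: a set of frozensets makes the duplicate
    check constant time on average instead of scanning a list.
    """
    seen = set()
    for a, b in combinations(item_sets, 2):
        joined = frozenset(a) | frozenset(b)
        if len(joined) == length:
            seen.add(joined)
    return sorted(sorted(s) for s in seen)


def search_candidates(n_columns, max_length):
    """Enumerate candidate column sets level by level, stopping at max_length."""
    level = [[i] for i in range(n_columns)]
    all_levels = {1: level}
    for length in range(2, max_length + 1):
        level = join_candidates(level, length)
        if not level:
            break
        all_levels[length] = level
    return all_levels


if __name__ == "__main__":
    # The cap of 3 bounds the search no matter how few invariants exist.
    levels = search_candidates(n_columns=6, max_length=3)
    for length, candidates in levels.items():
        print(length, len(candidates))
```

With the cap in place, the loop ends after max_length levels even when nothing is validated as an invariant, which is the situation described for this data.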