logpai / loglizer

A machine learning toolkit for log-based anomaly detection [ISSRE'16]
MIT License

InvariantsMiner Optimisation #95

Open gutjuri opened 2 years ago

gutjuri commented 2 years ago

For datasets with a large number of log keys, InvariantsMiner has been exceptionally slow. In tests with a Linux syslog dataset (415 log keys), fitting times were prohibitively long.

I profiled InvariantsMiner and found that by far the largest share of the runtime is spent in the method _join_set. I optimised this method to reduce its computational complexity.
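The issue does not include the patch itself, so the following is only a rough illustration of where such a cost can come from. It assumes that _join_set performs an Apriori-style join: every pair of (k-1)-element column sets is unioned, unions with exactly k elements are kept, and duplicates are removed. The function names join_set_naive and join_set_fast are hypothetical and this is not the code from the PR. In the naive variant, each duplicate check scans the growing result list, so the total cost grows with both the number of pairs and the number of results. The faster variant builds each set once and deduplicates through a hash set, which makes each check constant time on average.

```python
from itertools import combinations
from typing import List, Sequence


def join_set_naive(item_list: Sequence[Sequence[int]], length: int) -> List[List[int]]:
    """Hypothetical naive join: rebuilds sets per pair, dedups by list scan."""
    result: List[List[int]] = []
    for a, b in combinations(item_list, 2):
        union = set(a) | set(b)
        if len(union) == length:
            joined = sorted(union)
            # Linear scan over everything collected so far: O(len(result)).
            if joined not in result:
                result.append(joined)
    return sorted(result)


def join_set_fast(item_list: Sequence[Sequence[int]], length: int) -> List[List[int]]:
    """Hypothetical faster join: same output, hash-based deduplication."""
    # Convert each itemset to a frozenset once instead of once per pair.
    sets = [frozenset(items) for items in item_list]
    seen = set()
    for a, b in combinations(sets, 2):
        union = a | b
        if len(union) == length:
            # Tuples are hashable, so membership checks are O(1) on average.
            seen.add(tuple(sorted(union)))
    # Sorting tuples of equal length gives the same order as sorting lists.
    return [list(t) for t in sorted(seen)]


if __name__ == "__main__":
    keys = [[0, 1], [0, 2], [1, 2], [2, 3]]
    print(join_set_fast(keys, 3))  # [[0, 1, 2], [0, 2, 3], [1, 2, 3]]
```

Both functions return the same sorted list of k-element sets, so under this assumption a change of this kind would not affect which invariants are mined, only how long the join takes. The saving matters most when there are many candidate sets, which is consistent with a dataset of 415 log keys benefiting more than one with few keys.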

Runtimes are now considerably better on the Linux syslog dataset, while runtimes on HDFS logs are unchanged.