For datasets with a large number of log keys, `InvariantsMiner` is exceptionally slow.
I tested it on a Linux syslog dataset (415 log keys), and fitting times were impractically long.
Profiling `InvariantsMiner` showed that by far the largest share of the time is spent in the method `_join_set`. I optimised this method to reduce its computational complexity.
With this change, runtimes are considerably better for Linux syslogs.
For HDFS logs, runtimes are unchanged.
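
For anyone who wants to reproduce this kind of measurement, the following is a minimal sketch of how a hotspot like this can be located with Python's standard `cProfile` and `pstats` modules. It is not the exact procedure used for this PR: `slow_join` and `fit_stub` are hypothetical stand-ins for an expensive join step and a `fit()` call, not the library's code, and `profile_report` is a helper made up for illustration.

```python
import cProfile
import io
import pstats


def slow_join(items, length):
    """Hypothetical stand-in for an expensive candidate-join step."""
    joined = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            union = sorted(set(items[i]) | set(items[j]))
            # Membership test on a list is linear, so it dominates for many keys.
            if len(union) == length and union not in joined:
                joined.append(union)
    return joined


def fit_stub(n_keys):
    """Hypothetical stand-in for a model's fit() that calls the join step."""
    singles = [[k] for k in range(n_keys)]
    return slow_join(singles, 2)


def profile_report(func, *args, top=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Prints a table in which the join step should rank near the top.
    print(profile_report(fit_stub, 60))
```

Sorting by cumulative time makes a method that is cheap per call but invoked inside nested loops stand out, which is the pattern described above.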