logpai / logparser

A machine learning toolkit for log parsing [ICSE'19, DSN'16]

The regex is different #87

Closed LeeWangWang closed 1 year ago

LeeWangWang commented 2 years ago

The regex in Drain_demo.py is

regex = [
    r'blk_(|-)[0-9]+',  # block id
    r'(/|)([0-9]+\.){3}[0-9]+(:[0-9]+|)(:|)',  # IP
    r'(?<=[^A-Za-z0-9])(\-?\+?\d+)(?=[^A-Za-z0-9])|[0-9]+$',  # Numbers
]

but the regex in Drain_benchmark.py is

'regex': [r'blk_-?\d+', r'(\d+\.){3}\d+(:\d+)?']

I wonder why they differ.
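To make the difference concrete, here is a minimal sketch that applies both regex lists to one log message. It assumes the parser preprocesses a message by running `re.sub` with each pattern in turn and substituting `<*>`. The patterns are written with escaped dots and signs (`\.`, `\-?\+?`), which is an assumption about what the posted text lost in rendering. The `mask` helper and the HDFS-style sample line are hypothetical and only serve the illustration.

```python
import re

# Regex lists as quoted in this thread (escapes written out; an assumption).
DEMO_REGEX = [
    r'blk_(|-)[0-9]+',                                        # block id
    r'(/|)([0-9]+\.){3}[0-9]+(:[0-9]+|)(:|)',                 # IP
    r'(?<=[^A-Za-z0-9])(\-?\+?\d+)(?=[^A-Za-z0-9])|[0-9]+$',  # numbers
]
BENCHMARK_REGEX = [r'blk_-?\d+', r'(\d+\.){3}\d+(:\d+)?']


def mask(line, patterns, token='<*>'):
    """Hypothetical helper: replace every match of each pattern with token."""
    for pattern in patterns:
        line = re.sub(pattern, token, line)
    return line


# Hypothetical HDFS-style message, not taken from the thread.
sample = 'Received block blk_-1608999687919862906 of size 91178 from /10.250.10.6'

demo_masked = mask(sample, DEMO_REGEX)
bench_masked = mask(sample, BENCHMARK_REGEX)

print(demo_masked)   # Received block <*> of size <*> from <*>
print(bench_masked)  # Received block <*> of size 91178 from /<*>
```

The demo list also masks standalone numbers and the leading slash of the address, so the parser sees fewer distinct tokens before it groups messages. With the benchmark list, plain numbers such as the size are left for the parsing algorithm itself to generalize.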

zhujiem commented 2 years ago

The demo file is for testing only. Please refer to the benchmark file for the accuracy numbers.

LeeWangWang commented 2 years ago

When I run the benchmark on the Windows dataset, the results differ from your Windows_2k.log_templates.csv.

This is mine:
(1) Loaded Servicing Stack v6.1.7601.23505 with Core: C:\Windows\winsxs\amd64_microsoft-windows-servicingstack_31bf3856ad364e35_6.1.7601.23505_none_681aa442f6fed7f0\cbscore.dll"
(2) <*> WcpInitialize (wcp.dll version 0.0.0.6) called (stack <*>

This is yours:
(1) Loaded Servicing Stack <*> with Core: <*>\cbscore.dll
(2) <*>@<*>/<*>/<*>:<*>:<*>:<*>.<*> WcpInitialize (wcp.dll version <*>) called (stack @<*>)
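One way to see how the two results relate is to check which messages each template covers. The sketch below treats `<*>` as a wildcard, turns a template into a regular expression, and tests messages against it. `template_to_regex` and `matches` are hypothetical helpers, not part of logparser. The first message is the one quoted above with its trailing quote omitted, and the second message is made up for illustration.

```python
import re


def template_to_regex(template, wildcard='<*>'):
    """Hypothetical helper: escape literal parts, let each wildcard match any text."""
    parts = [re.escape(part) for part in template.split(wildcard)]
    return re.compile('.*?'.join(parts))


def matches(template, message):
    """True if the whole message fits the template."""
    return template_to_regex(template).fullmatch(message) is not None


# Template (1) from the published Windows_2k.log_templates.csv, as quoted above.
published = r'Loaded Servicing Stack <*> with Core: <*>\cbscore.dll'

# Template (1) as reported in this thread: no wildcards, so it is a literal message.
reported = (
    'Loaded Servicing Stack v6.1.7601.23505 with Core: '
    r'C:\Windows\winsxs\amd64_microsoft-windows-servicingstack_'
    r'31bf3856ad364e35_6.1.7601.23505_none_681aa442f6fed7f0\cbscore.dll'
)

# Made-up message with a different version and path (an assumption).
other = r'Loaded Servicing Stack v6.1.7601.99999 with Core: C:\Windows\winsxs\example\cbscore.dll'

print(matches(published, reported))  # True
print(matches(published, other))     # True
print(matches(reported, other))      # False
```

Under these assumptions the published template covers both messages, while the reported one only covers the exact message it was derived from, which suggests the version string and path were not abstracted in that run.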

JinYang88 commented 1 year ago

Different methods can produce different results. May I know which algorithm you are using?