logpai / logparser

A machine learning toolkit for log parsing [ICSE'19, DSN'16]

The difference in number of records between raw data and parsed data #100

Closed dino-chiio closed 1 year ago

dino-chiio commented 1 year ago

Hi. I am studying your implementation for the Drain demo with the BGL dataset (full version).

However, the parsed dataset has fewer records than the raw dataset: the raw dataset has 4,747,963 records, but the parsed dataset has only 4,713,493.

Could you please explain the reason for this difference?

zhujiem commented 1 year ago

Some lines are skipped because they do not match the log format specified in the config.
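
For the counts quoted above, that amounts to 4,747,963 - 4,713,493 = 34,470 skipped lines. The following is a minimal sketch of how format-based skipping can work, not the toolkit's actual code: the helper names `format_to_regex` and `split_lines`, the three-field format string, and the sample lines are all assumptions made for illustration. It turns a format string into a regular expression and separates lines that match from lines that do not, so the skipped lines can be inspected.

```python
import re


def format_to_regex(log_format):
    """Compile a format string such as '<Date> <Level> <Content>' into a regex.

    Hypothetical helper: each <Field> becomes a lazy named group, and each run
    of spaces between fields matches any run of whitespace.
    """
    parts = re.split(r"(<[^<>]+>)", log_format)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Literal text between fields: escape it, but let spaces be flexible.
            pieces = [re.escape(p) for p in re.split(r" +", part)]
            pattern += r"\s+".join(pieces)
        else:
            # A <Field> placeholder: strip the angle brackets to get its name.
            pattern += "(?P<%s>.*?)" % part[1:-1]
    return re.compile("^" + pattern + "$")


def split_lines(lines, log_format):
    """Return (parsed, skipped): matched field dicts and unmatched raw lines."""
    regex = format_to_regex(log_format)
    parsed, skipped = [], []
    for line in lines:
        match = regex.match(line.strip())
        if match:
            parsed.append(match.groupdict())
        else:
            skipped.append(line)
    return parsed, skipped


# Illustrative input only; these are not real BGL records.
sample_format = "<Date> <Level> <Content>"
sample_lines = [
    "2005-06-03 INFO instruction cache parity error corrected",
    "",           # empty line: no fields at all
    "truncated",  # a single token cannot fill three fields
]

parsed, skipped = split_lines(sample_lines, sample_format)
print(len(parsed), "parsed,", len(skipped), "skipped")
```

Running the same kind of check over the raw file with the format string from your config and printing the `skipped` list should show which lines are being dropped, and whether the format needs adjusting or the lines are simply malformed.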