Closed: bripkens closed this issue 9 months ago
The logparser aims to identify the structure in the \<Content> field. Since the format you provided is easy to define with a single line of regex, it is unnecessary to use other tools to do so.
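To make that concrete, here is a minimal sketch (not code from this repository) of such a single-regex parse. The sample line and field names are illustrative and follow a `<Date> <Time> <Pid> <Level> <Component>: <Content>` style header layout:

```python
import re

# Hypothetical log line whose header layout is known in advance.
line = ("081109 203518 143 INFO dfs.DataNode$PacketResponder: "
        "Received block blk_-1608999687919862906 of size 91178 from /10.250.10.6")

# One regex is enough to split the known header fields from the free-text content.
pattern = re.compile(
    r"^(?P<Date>\d{6}) (?P<Time>\d{6}) (?P<Pid>\d+) "
    r"(?P<Level>\w+) (?P<Component>[^:]+): (?P<Content>.*)$"
)

match = pattern.match(line)
if match:
    fields = match.groupdict()
    # fields["Content"] is the free-text part that template-mining algorithms work on.
    print(fields)
```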
Interesting. In my space, it is common to have raw log lines of unknown format – and consequently, no regular expression to parse those log lines.
Hence, the image in the readme caused this confusion. From my perspective, it shows (at least) two steps:

1) parsing raw log lines into structured logs, i.e., identifying header fields such as timestamp and severity, and
2) identifying message templates within the \<Content> field.
I am not proposing that you change anything. Just making you aware of step 1), which is an ongoing industry challenge :)
The repository's readme makes it seem like the benchmarks evaluate both parsing to structured logs and template identification.
[Figure from the readme: an example of log parsing]
However, looking at the actual benchmarks, the implementations rely on a known log format per input file, e.g., in the Brain benchmark:
https://github.com/logpai/logparser/blob/18dcd312d72173e1f19ef59a8155c77c93c74f2d/logparser/Brain/benchmark.py#L32
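For readers unfamiliar with that argument: below is my own simplified sketch (not the repository's actual implementation) of how a `log_format` string like the one in the benchmark can be expanded into a header-splitting regex, which is roughly what the "known format per file" assumption buys you:

```python
import re

def format_to_regex(log_format):
    """Turn a '<Field> <Field>: <Content>' style format string into a regex.

    Simplified illustration; the project's real code may differ in details.
    """
    parts = re.split(r"(<[^<>]+>)", log_format)
    regex = ""
    for part in parts:
        if part.startswith("<") and part.endswith(">"):
            name = part[1:-1]
            # <Content> captures the rest of the line; other fields stop at whitespace.
            regex += rf"(?P<{name}>.*)" if name == "Content" else rf"(?P<{name}>\S+)"
        else:
            # Literal separators: allow flexible whitespace between fields.
            regex += re.escape(part).replace(r"\ ", r"\s+")
    return re.compile("^" + regex + "$")

hdfs_like = format_to_regex("<Date> <Time> <Pid> <Level> <Component>: <Content>")
print(hdfs_like.match(
    "081109 203518 143 INFO dfs.DataNode$PacketResponder: Received block blk_-1608 of size 91178"
).groupdict())
```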
Am I missing some details, or do the algorithms really not concern themselves with the automatic identification of common logging fields, i.e., the conversion to structured logs without any prior knowledge? To clarify, I expected a native notion of time, severity, etc., which arguably requires some defined semantics that may or may not be shared across logs.
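For what it's worth, here is a rough, purely illustrative sketch of what I mean by automatic identification of common fields: guessing timestamp and severity with no per-file format definition. The heuristics, patterns, and names are made up for illustration; real tools would need far richer logic:

```python
import re

# Hypothetical heuristics for "step 1": guess common fields in a line of unknown format.
SEVERITIES = {"TRACE", "DEBUG", "INFO", "WARN", "WARNING", "ERROR", "FATAL"}
TIMESTAMP_PATTERNS = [
    re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}"),   # ISO-8601-ish
    re.compile(r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}"),      # CLF-style
]

def guess_fields(line):
    guessed = {}
    for pattern in TIMESTAMP_PATTERNS:
        m = pattern.search(line)
        if m:
            guessed["timestamp"] = m.group(0)
            break
    for token in line.split():
        if token.strip("[]:").upper() in SEVERITIES:
            guessed["severity"] = token.strip("[]:")
            break
    return guessed

print(guess_fields("2024-05-01 12:00:03 [ERROR] worker-3 failed to connect"))
# -> {'timestamp': '2024-05-01 12:00:03', 'severity': 'ERROR'}
```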
This is not meant to question the value of this project or the algorithms. I only mean to resolve my expectation mismatch :)