Open ppf2 opened 8 years ago
@ppf2 - so I guess what you are saying is:
In any one identity-based stream of lines there exist identifiable sub-streams.
Clarification: at the moment, the file input maps a codec instance to a path (identity). A sub-stream pattern (from your example, "\s[.+]\s\s") could be applied to each line to extract a sub-stream identity, so that different buffers/logic can be used for each sub-stream.
WDYT?
Would each sub-stream need its own `pattern`, `what`, and `negate` settings?
I have been experimenting with a finite state machine class that could be used for each sub-stream.
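To make the sub-stream idea concrete, here is a minimal Python sketch (not the codec's actual Ruby code). It assumes a hypothetical identity pattern, where the first bracketed token names the sub-stream, and a hypothetical continuation rule, where a line whose remainder starts with whitespace followed by `at` belongs to the previous line of the same sub-stream. The class name `SubStreamDemux` and both regexes are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical identity pattern: the first bracketed token names the sub-stream.
IDENTITY = re.compile(r"\[([^\]]+)\]")
# Hypothetical continuation rule, tested on the text after the identity token.
CONTINUATION = re.compile(r"^\s+at\s")


class SubStreamDemux:
    """Keeps one line buffer per sub-stream identity."""

    def __init__(self):
        self.buffers = defaultdict(list)

    def accept(self, line):
        """Feed one line; return the list of events it completes (may be empty)."""
        match = IDENTITY.search(line)
        if match is None:
            return [("", line)]  # no identity found: pass the line through
        ident = match.group(1)
        rest = line[match.end():]
        buf = self.buffers[ident]
        events = []
        # A non-continuation line closes the pending event of its own sub-stream only.
        if buf and not CONTINUATION.match(rest):
            events.append((ident, "\n".join(buf)))
            buf.clear()
        buf.append(line)
        return events

    def flush(self):
        """Emit whatever is still buffered, e.g. at end of file or on a timeout."""
        events = [(i, "\n".join(b)) for i, b in self.buffers.items() if b]
        self.buffers.clear()
        return events


LINES = [
    "[app] java.lang.IllegalStateException: boom",
    "[web] GET /index.html 200",
    "[app]     at com.example.Main.run(Main.java:10)",
    "[web] GET /favicon.ico 404",
    "[app] started worker",
]

demux = SubStreamDemux()
EVENTS = []
for line in LINES:
    EVENTS.extend(demux.accept(line))
EVENTS.extend(demux.flush())
```

In this sketch the interleaved `[web]` lines do not break up the `[app]` stack trace, because each identity has its own buffer. Per-sub-stream `pattern`, `what`, and `negate` settings would replace the single shared `CONTINUATION` rule.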
/cc @jordansissel
More specific (sub-)stream identity patterns (similar to the multiline filter) would definitely be useful for me.
Say you have a log file that mixes multiline logs and single-line logs. Here is an example of one where multiline logs (Java application stack traces) and single-line logs (access log entries) are mixed in the same log file.
If you use a multiline pattern that matches the multiline entries, it will correctly emit the multiline events, but it will skip the single-line entries entirely.
One workaround today is to use the multiline filter in the LS pipeline instead of the multiline codec. But with the multiline filter being deprecated soon, it would be better to have a solution that uses the multiline codec. Maybe there is a way to provide a skip_pattern configuration so users can define a second pattern that matches the lines (the single lines) the multiline logic should skip over, so that these lines are still emitted as events of their own even when the multiline codec is used.
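A rough Python sketch of how the proposed skip_pattern could behave. This is only an illustration of the idea: `skip_pattern` is the option proposed here, not an existing codec setting, and the function name `merge_lines`, the timestamp regex, and the access-log regex are all assumptions. The multiline rule used is "a line that does not start with a timestamp belongs to the previous event".

```python
import re


def merge_lines(lines, start_pattern, skip_pattern=None):
    """Group lines into events; lines matching skip_pattern bypass the buffer."""
    start = re.compile(start_pattern)
    skip = re.compile(skip_pattern) if skip_pattern else None
    events, pending = [], []
    for line in lines:
        if skip is not None and skip.search(line):
            # Emit the single line as its own event and leave the buffer alone.
            events.append(line)
            continue
        if start.search(line) and pending:
            # A new event starts: close the one being collected.
            events.append("\n".join(pending))
            pending = []
        pending.append(line)
    if pending:
        events.append("\n".join(pending))
    return events


LINES = [
    "2016-01-01 12:00:00 ERROR java.lang.RuntimeException: boom",
    "    at com.example.Main.run(Main.java:10)",
    '10.0.0.1 - - "GET / HTTP/1.1" 200',
    "    at com.example.Main.main(Main.java:5)",
    "2016-01-01 12:00:01 INFO recovered",
]
TIMESTAMP = r"^\d{4}-\d{2}-\d{2} "      # assumed start-of-event pattern
ACCESS_LOG = r"^\d+\.\d+\.\d+\.\d+ "    # assumed single-line (access log) pattern

WITH_SKIP = merge_lines(LINES, TIMESTAMP, skip_pattern=ACCESS_LOG)
WITHOUT_SKIP = merge_lines(LINES, TIMESTAMP)
```

One design choice in this sketch: a skipped line does not flush the pending multiline event, so a stack trace interrupted by an access-log line still ends up as one event. The cost is that the access-log event is emitted before the stack trace that started ahead of it.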