Closed: anton-ryzhov closed this issue 4 years ago.
Thanks @anton-ryzhov, I was actually just discussing this exact problem on Docker with @MOZGIII. @MOZGIII what do you think about taking this as your next issue?
Right, this is exactly the problem we discussed. I can work on it.
`message_start_indicator = '\\n"'` is a smart trick; however, the proper fix for this issue should probably be implemented at a different level, to ensure it works with the `gelf` driver too.
@binarylogic @MOZGIII good to know.
The Docker problem is more complex in the end. So I just started with this more universal feature request of a complementary message split. It could be useful not only for Docker.
Speaking about Docker (it seems this should be a separate ticket): I feel it should be handled at the next level, as a transform. But for doing this we need to split messages properly at the source level. Detecting by `'\\n"'` looks like a dirty hack to me.
Another reason to merge Docker logs at the transform step: Docker logs could be read from both the `journald` and `docker` sources, and the same logic should be applied to both. But we can't merge two events at the transform step.
Would it be possible to have a transformation which has access to the current and previous events (from the same source) and is able to keep/delete/replace them? A kind of reducing function with a sliding window.
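Not Vector's API, but a minimal sketch of what such a keep/merge/emit reducer over a stream of events could look like (the `Action` enum and `reduce_events` helper are hypothetical names invented for illustration):

```rust
// Hypothetical sketch of a "reducing function with a sliding window" over
// events: the reducer sees the pending (previous) event and the current one
// and decides whether to merge them or emit the pending event.
#[derive(Debug, PartialEq)]
enum Action {
    Merge, // fold the current event into the pending one
    Emit,  // flush the pending event and start a new one
}

fn reduce_events<F>(events: &[&str], decide: F) -> Vec<String>
where
    F: Fn(&str, &str) -> Action,
{
    let mut out = Vec::new();
    let mut pending: Option<String> = None;
    for ev in events {
        match pending.take() {
            None => pending = Some(ev.to_string()),
            Some(prev) => match decide(&prev, ev) {
                Action::Merge => pending = Some(format!("{}\n{}", prev, ev)),
                Action::Emit => {
                    out.push(prev);
                    pending = Some(ev.to_string());
                }
            },
        }
    }
    if let Some(last) = pending {
        out.push(last);
    }
    out
}

fn main() {
    // Merge indented continuation lines into the previous event.
    let events = ["Error: boom", "  at foo", "  at bar", "INFO ok"];
    let merged = reduce_events(&events, |_prev, cur| {
        if cur.starts_with(' ') { Action::Merge } else { Action::Emit }
    });
    assert_eq!(merged, vec!["Error: boom\n  at foo\n  at bar", "INFO ok"]);
}
```

The key limitation the comment above points at is that a plain per-event transform never holds two events at once, which is exactly what the `pending` buffer here provides.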
> So I just started with this more universal feature request of complementary message split.
Sounds good!
I was thinking about solving the particular case with docker where the inner log messages are JSON - in that case, it'd be possible to just use a streaming JSON parser and join across multiple messages until the parser completes or reaches an unexpected token. It may be possible to solve this issue more generically, so having a detailed feature request in the first place would be very valuable.
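As a rough sketch of that idea (assuming the inner messages are JSON objects or arrays; this is illustrative code, not Vector's): track brace/bracket nesting depth outside of string literals and flush whenever the buffered text forms a complete value:

```rust
// Sketch: join lines until the accumulated buffer is a complete JSON value,
// by tracking nesting depth outside of string literals. Assumes each logical
// message is a JSON object or array; it does not validate tokens in between.
fn json_complete(buf: &str) -> bool {
    let mut depth: i64 = 0;
    let mut in_string = false;
    let mut escaped = false;
    let mut seen_any = false;
    for c in buf.chars() {
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => { in_string = true; seen_any = true; }
            '{' | '[' => { depth += 1; seen_any = true; }
            '}' | ']' => depth -= 1,
            _ => {}
        }
    }
    seen_any && depth == 0 && !in_string
}

// Accumulate raw lines; emit a message whenever the buffer completes.
fn join_json_lines(lines: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut buf = String::new();
    for line in lines {
        if !buf.is_empty() {
            buf.push('\n');
        }
        buf.push_str(line);
        if json_complete(&buf) {
            out.push(std::mem::take(&mut buf));
        }
    }
    if !buf.is_empty() {
        out.push(buf); // trailing partial message, flushed as-is
    }
    out
}

fn main() {
    let lines = ["{\"log\": \"first", "second\"}", "{\"log\": \"third\"}"];
    let merged = join_json_lines(&lines);
    assert_eq!(merged.len(), 2);
    assert_eq!(merged[1], "{\"log\": \"third\"}");
}
```

A real streaming parser would additionally bail out on an unexpected token, as the comment above suggests; the depth counter here is the minimal version of the "join until the parser completes" idea.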
I want to share my approach/workaround. It doesn't work as expected because of this ticket, but maybe it will help you come up with a better solution.
```toml
[sources.docker_logs]
type = "file"
include = ["containers/*/*.log"]
message_start_indicator = '\\n"'
# Should be:
# message_stop_indicator = '\\n"'

[transforms.newlines_to_array]
type = "lua"
inputs = ["docker_logs"]
source = """
event["message"] = "{\\"chunks\\": [" .. string.gsub(event["message"], "\\n", ",") .. "]}"
"""

[transforms.json_parser]
type = "json_parser"
inputs = ["newlines_to_array"]

[transforms.merge_chunks]
type = "lua"
inputs = ["json_parser"]
source = """
event["time"] = event["chunks[0].time"]
event["stream"] = event["chunks[0].stream"]
event["log"] = ""
local i = 0
while event["chunks[" .. i .. "].time"] do
  event["log"] = event["log"] .. event["chunks[" .. i .. "].log"]
  event["chunks[" .. i .. "].time"] = nil
  event["chunks[" .. i .. "].log"] = nil
  event["chunks[" .. i .. "].stream"] = nil
  i = i + 1
end
"""
```
Now that #1504 has landed, it should be possible to achieve what you're looking for here via a `merge` transform.

We'll continue to look into this issue. We might integrate the `merge` transform into the `file` source to provide a built-in solution.
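For anyone landing here later, a rough sketch of what such a pipeline could look like. The option names below (`partial_event_marker_field`, `merge_fields`, `stream_discriminant_fields`) are recalled from the `merge` transform documentation of that era and may not match current releases, so treat this as illustrative only:

```toml
[sources.docker_logs]
type = "docker"

[transforms.merged]
type = "merge"
inputs = ["docker_logs"]
# Assumed option names and values; verify against the merge transform reference.
partial_event_marker_field = "_partial"
merge_fields = ["message"]
stream_discriminant_fields = ["container_id", "stream"]
```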
When there is a universal `merge` transform, maybe sources shouldn't do that in general (though it's 100% reasonable for the `docker` source). So maybe the idea of this issue is outdated now.
It's an ongoing process to determine what sources should support built-in merging, and how.
With built-in merges, we're looking at the following:

- `docker` is an obvious example, and the same goes for `kubernetes`; `journald` is not so trivial though, we can discuss it at #1719 if you're interested;
- when streams are merged in the source internals (`docker` or `journald`), those streams can be accessed individually with less overhead and better reliability compared to using the `stream_discriminant_fields` option;
- so, in those cases, we can provide an easier-to-understand configuration interface for end users, even if merging is not enabled by default for a particular source.

So, I guess this completes the list of items we consider when deciding whether to implement merging at the source. I am under the impression that we definitely can't cover all possible needs by merging at sources, so we need a merge transform.
The current implementation has a design trait that stashes partial events and won't release them until it can provide a merged one. Configuring this can be error-prone, and misuse can cause the transform to consume all system memory. Another issue is that when a partial event is read from the source and gets stashed at the merge transform, the source is not aware that the event is unprocessed, and will not redeliver it if Vector restarts. This is the main problem we see with the design, and the reason we're currently only shipping the transform as a beta.
We probably should add it to the documentation, so people are aware.
We'll be adding some new parameters to the `file` source, taking inspiration from https://app.scalyr.com/help/parsing-logs#multiline. In particular, in addition to `message_start_indicator`, we'll add new parameters based on how Scalyr does it:
- `continueThrough`: all consecutive lines matching this pattern are included in the group. The first line (the line that matched the start pattern) does not need to match the `continueThrough` pattern. This is useful in cases such as a Java stack trace, where some indicator in the line (such as leading whitespace) indicates that it is an extension of the preceding line.
- `continuePast`: all consecutive lines matching this pattern, plus one additional line, are included in the group. This is useful in cases where a log message ends with a continuation marker, such as a backslash, indicating that the following line is part of the same message.
- `haltBefore`: all consecutive lines not matching this pattern are included in the group. This is useful where a log line contains a marker indicating that it begins a new message.
- `haltWith`: all consecutive lines, up to and including the first line matching this pattern, are included in the group. This is useful where a log line ends with a termination marker, such as a semicolon.
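To make the four modes concrete, here is a hedged sketch (not Vector's implementation; it ignores the start pattern and timeouts, grouping only by a per-line condition predicate):

```rust
// Sketch of the four Scalyr-style grouping modes, driven only by a
// condition predicate per line (start pattern and timeouts omitted).
#[derive(Clone, Copy)]
enum Mode {
    ContinueThrough,
    ContinuePast,
    HaltBefore,
    HaltWith,
}

fn group(lines: &[&str], cond: fn(&str) -> bool, mode: Mode) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    let mut buf: Option<String> = None;
    let mut carry = false; // ContinuePast: previous line requested one more
    for line in lines {
        let join = buf.is_some()
            && match mode {
                Mode::ContinueThrough => cond(line), // matching lines extend
                Mode::ContinuePast => carry,         // previous line matched
                Mode::HaltBefore => !cond(line),     // matching line starts anew
                Mode::HaltWith => true,              // extend until halt below
            };
        if join {
            let b = buf.as_mut().unwrap();
            b.push('\n');
            b.push_str(line);
        } else {
            if let Some(done) = buf.take() {
                out.push(done);
            }
            buf = Some(line.to_string());
        }
        carry = cond(line);
        if matches!(mode, Mode::HaltWith) && cond(line) {
            out.push(buf.take().unwrap()); // matching line terminates the group
        }
    }
    if let Some(rest) = buf {
        out.push(rest);
    }
    out
}

fn main() {
    // continueThrough: indented stack-trace frames extend the group.
    let trace = ["java.lang.Exception", " at com.foo.bar", " at com.foo.baz"];
    let grouped = group(&trace, |l| l.starts_with(' '), Mode::ContinueThrough);
    assert_eq!(grouped.len(), 1);

    // continuePast: a trailing backslash pulls in one more line.
    let cmd = ["run \\", "--flag \\", "--done", "next"];
    let grouped = group(&cmd, |l| l.ends_with('\\'), Mode::ContinuePast);
    assert_eq!(grouped, vec!["run \\\n--flag \\\n--done", "next"]);
}
```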
That makes sense to me. While we're adding these options, would it be possible to nest them under a new `multiline` table? I would like to keep the current `message_start_indicator` option, but simply remove it from the docs. Ex:
```toml
[sources.in]
type = "file"

[sources.in.multiline]
start = "..."
continue_through = "..."
continue_past = "..."
halt_before = "..."
halt_with = "..."
```
What do you think? Also, notice the name is snake case.
I like the nesting! I think it'll map to the internal implementation better than the way we're doing it right now. Regarding the exact set of config options and their names: definitely snake case, but having it as Scalyr does is a bit confusing, since you have to specify a start pattern plus exactly one of the continuation patterns. It'd be easier for everyone if we just had three fields:

- `start`
- `continuation`
- `mode`: an enum of `continue_through` / `continue_past` / `halt_before` / `halt_with`
Do you like this?
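For illustration, the three-field proposal would read something like this (hypothetical config; the field values are just examples, not defaults):

```toml
[sources.in.multiline]
start = "^[^\\s]"         # a new message begins at a non-whitespace column
continuation = "^\\s+at"  # e.g. Java stack-trace frames
mode = "continue_through" # or continue_past / halt_before / halt_with
```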
I’m a fan. The ‘mode’ option looks good to me.
But for the `halt` modes it is not a `continuation`.

Don't just copy their config; you can make it much more intuitive. `continue` and `halt` could be expressed by mutually exclusive pattern definitions, without an explicit mode.
For `continueThrough`:

```toml
start = "^Error"
through = "^ at"
```

For `haltWith`:

```toml
start = "^Error"
stop = "Summary line"
```
The `haltBefore` mode sounds to me like the `start` pattern of the next message. If you meet `start` again, just start a new event.
The `continuePast` mode for the backslash example could be `stop="[^\\]$"`.
Thanks, @anton-ryzhov!
I've started working on the implementation, and I too noticed that we can probably do even better than with just `mode`. I'm working with the `mode` field for now, and I think better configuration parameters will emerge in the process.
Your point is definitely valid; it was an oversight in the initial design pass. What you're proposing looks better, and I'll consider modeling the configuration like that, as it seems to be a better fit for the implementation needs as well.
By the way, I've added a couple of real-world test cases, and I'm looking to add more to make our implementation easier to maintain.
For example, for Java stack traces I have this:
```rust
let lines = vec![
    "java.lang.Exception",
    " at com.foo.bar(bar.java:123)",
    " at com.foo.baz(baz.java:456)",
];
let config = Config {
    start_pattern: Regex::new("^[^\\s]").unwrap(),
    condition_pattern: Regex::new("^[\\s]+at").unwrap(),
    mode: Mode::ContinueThrough,
    // ...
};
let expected = vec![
    concat!(
        "java.lang.Exception\n",
        " at com.foo.bar(bar.java:123)\n",
        " at com.foo.baz(baz.java:456)",
    ),
];
run_and_assert(&lines, config, &expected);
```
I also have a use case test for Ruby stack traces.
If you'd like to, now would be a good time to share the real-world usage examples that are important to you, and we'll add them as test cases too!
Hey @anton-ryzhov, a follow-up after we merged the PR! We ended up going with the `start_pattern`, `condition_pattern`, `mode`, and `timeout_ms` fields.

I'll continue with a follow-up issue on improving the configuration fields.
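Putting the shipped field names together, a config using them would look roughly like this (the patterns and the timeout value are illustrative; `timeout_ms` bounds how long a partial group is held before being flushed):

```toml
[sources.in]
type = "file"
include = ["/var/log/app/*.log"]

[sources.in.multiline]
start_pattern = "^[^\\s]"
condition_pattern = "^[\\s]+at"
mode = "continue_through"
timeout_ms = 1000
```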
In addition to `message_start_indicator`, an opposite parameter could be very useful.

For example, the docker json log driver splits long lines into 16k chunks, and only the very last chunk contains `\r\n`. By defining `message_start_indicator = '\\n"'` I'm getting "Second..." merged into the first line rather than into its continuation.

So the only difference is how to combine unmatched lines: with the previous or with the next matched line.
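That last sentence can be sketched as toy code (not Vector's; the helper names are invented, and a trailing `'\n'` stands in for the `\r\n` carried only by the final 16k chunk):

```rust
// A chunk "matches" when it carries the line terminator. Attaching unmatched
// chunks to the *previous* matched line reproduces the reported bug; attaching
// them forward to the *next* matched line reassembles split docker lines.
fn attach_unmatched_to_previous(chunks: &[&str], matches: fn(&str) -> bool) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for chunk in chunks {
        if matches(chunk) || out.is_empty() {
            out.push(chunk.to_string());
        } else {
            out.last_mut().unwrap().push_str(chunk); // glued backward
        }
    }
    out
}

fn attach_unmatched_to_next(chunks: &[&str], matches: fn(&str) -> bool) -> Vec<String> {
    let mut out = Vec::new();
    let mut buf = String::new();
    for chunk in chunks {
        buf.push_str(chunk);
        if matches(chunk) {
            out.push(std::mem::take(&mut buf)); // this chunk ends the message
        }
    }
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}

fn main() {
    // "Second" was split into two chunks; only the last one ends with '\n'.
    let chunks = ["First\n", "Second-a", "-b\n", "Third\n"];
    // Backward attachment reproduces the reported bug:
    let wrong = attach_unmatched_to_previous(&chunks, |c| c.ends_with('\n'));
    assert_eq!(wrong, vec!["First\nSecond-a", "-b\n", "Third\n"]);
    // Forward attachment merges the chunk with its continuation:
    let right = attach_unmatched_to_next(&chunks, |c| c.ends_with('\n'));
    assert_eq!(right, vec!["First\n", "Second-a-b\n", "Third\n"]);
}
```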