multiline is catching the one-line logs

Zava2012 commented 2 years ago

Describe the bug multiline block is catching all of the logs including the one-line log. I'll provide an example in the next section to clarify all of it.

To Reproduce There are two blocks of logs structure. They were taken from the real application. It means all timestamps and strings exactly repeat logs from the application. The first block of logs (this is one-line log):

[2022-02-08 18:49:07,664] INFO ...
[2022-02-08 18:50:07,664] ERROR ...
[2022-02-08 18:51:07,664] INFO ...

The second block of logs (stacktrace for Python):

[2022-02-08 18:49:07,664] INFO ...
[2022-02-08 18:50:07,664] ERROR ...
Traceback ...
...
[2022-02-08 18:51:07,664] INFO ...

On the Promtail side I have this simple configuration to catch stacktraces of Python applications. You can find almost the same configuration in official documentation

- docker: {}
- multiline:
    firstline: '^\[\d{2,4}\-\d{2,4}\-\d{2,4} \d{2,4}\:\d{2,4}\:\d{2,4}\,\d{1,6}\]'
    max_lines: 256
- regex:
    expression: '^\[(?P<time>\d{2,4}\-\d{2,4}\-\d{2,4} \d{2,4}\:\d{2,4}\:\d{2,4}\,\d{1,6})\] (?P<severity>[A-Z]*) in (?P<component>[^ ]*): (?P<error>(?s:.*))$'
- labels:
    severity:
    error:

And all INFO and ERROR logs are written to the "error" label.

Expected behavior As far as I understand, "firstline" field in the multiline block says if the next line of the log doesn't match a regex in "firstline" field then "multiline" block concatenate the next log line with a previous line of the log. The next lines of logs work in the same way and form an extracted map that can be caught by the next "regex" block without catching all of the logs via "regex" block.

How can I write only stacktraces to the label without one-line ERROR and INFO logs?

Environment:

Infrastructure: Kubernetes (AWS EKS)
Deployment tool: Helm
Loki mode: Microservice
Version of Promtail and Loki: 2.4.2

jortkoopmans commented 2 years ago

Hi @Zava2012 , just stumbled across your ticket while looking for multiline processing myself (somewhat similar situation). Not an expert, but my guess is that you're observing your regex not matching the multiline logs.

To clarify, expecting these steps of processing;

Convert your docker logs, which are json, extracting the timestamp and stream labels. See also locally in your promtail containers under /var/log/pods/*.
Multiline stage does as you describe, but all messages are passed on to further stages (whether touched by multiline or not).
in regex stage, loglines which are single-line match this statement (as you have ^ and $), while multiline does not.
Only single-line loglines then have the extracted map which you can convert into labels.

You could try to modify your regex to also work on multiline logs with a robust extracted map, but it might prove tricky.

For reference, I'm trying something similar with a regex, but then using the &ZeroWidthSpace character, using a match block and a custom extracted label to match on.

      - docker: {}
      - multiline:
        # Use this special stage to combine specifically tagged lines into one multiline message
          firstline: '^&ZeroWidthSpace;'
          max_wait_time: 3s
          max_lines: 192
      - regex:
        # This regex adds the 'multiline' content to the extracted map
          expression: '^(?s)(?P<multiline>&ZeroWidthSpace;)'
      - labels:
        # Now label messages that have a 'multiline' in their extracted map
          multiline:
      - match:
        # Only filter messages that are multiline labelled
          selector: '{multiline="&ZeroWidthSpace;"}'
          stages:
            - replace:
              # Clean up zerospacewidth
                expression: '^(?s)(?P<zerowidthspace>&ZeroWidthSpace;)'
                replace: ''
            - replace:
              # Remove empty lines
                expression: '(?m)(?P<emptyline>^\s*\n)'
                replace: ''

Hope it helps! 🤞

jortkoopmans commented 2 years ago

To clarify, without a full example logline (which you have not posted), I can't be sure your regex would actually work (but you can check yourself). Either way, the multiline stage reviews all your logs including single line and everything is passed to your regex stage. So this is probably what needs your attention.

Zava2012 commented 2 years ago

Hello @jortkoopmans! Thanks for your detailed feedback with implementation!

I thought about a similar approach but unfortunately, this is not an option for me.

I checked my regex many times and as I described in an issue: And all INFO and ERROR logs are written to the "error" label. But I want to write only extracted map from multiline block and not every line of log. So, regex is working correctly if we are talking about matching.

And want to describe a little bit why I've created an issue. To do that I will quote some sentences from documentation: 1) A new block is identified by the firstline regular expression. Any line that does not match the expression is considered to be part of the block of the previous match. 2) We would like to collapse all lines of the traceback into one multiline block. In this example (from documentation), all blocks start with a timestamp in brackets. Thus, we configure a multiline stage with the firstline regular expression ... . This will match the start of the traceback, but not the following lines until Exception: Sorry, this route always breaks. These will be part of a multiline block and one log entry in Loki. 3) Flag (?s:.*) needs to be set for regex stage to capture full traceback log in the extracted map.

All of these sentences an expected behavior described in this issue

jortkoopmans commented 2 years ago

Right, I agree that the documentation is somewhat ambiguous on this. The main takeaway is that a single line log still starts with the timestamp that initiates the multiline stage. How I would loosely describe the process;

multiline starts upon hitting the firstline marker
multiline stops aggregating upon hitting the next firstline marker, merging everything before it into 1 message (with all caveats of max lines and timeouts omitted for simplicity)

That is different from an interpretation of doc "These will be part of a multiline block and one log entry in Loki" as some sort of multiline evaluation (meaning ONLY IF it is multiline, it will merge into one entry).

If we're talking about solutions (I'm having a similar challenge link), one solutions might be to have the multiline stage itself populate an extracted map and/or label to indicate it indeed is multiline and merged multiple lines. Then it is easy to perform subsequent matching/regex (etc) stages just on those.

jeschkies commented 2 years ago

@Zava2012, I wrote the multiline stage. I'm not sure yet if I understand it correctly. Do yo expect single lines to be excluded from multiline blocks?

jortkoopmans commented 2 years ago

@jeschkies ; Can't speak for @Zava2012 , but this is also how I interpreted it. While I have your attention 😇 , do you have any ideas on how to address the artefacts of using your multiline pipeline stage here?

stale[bot] commented 2 years ago

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.

jeschkies commented 2 years ago

@jortkoopmans sorry for the late reply. Did you get multiline support working for you?

jortkoopmans commented 2 years ago

@jeschkies ; Thanks for your response anyway :smiley: . I sort of got it working, but not in full. Basically I'm left with these issues;

Long messages (>192 lines) are split, with all messages except the first having empty lines, every other line.
I lose original empty lines logged by applications, as I can’t distinguish between original ones and the empty lines produced by the multiline stage.

See details of what I did here; https://community.grafana.com/t/promtail-stages-docker-and-multiline/60653

Zava2012 commented 2 years ago

Hello everyone! Excuse me that I went missing.

@Zava2012, I wrote the multiline stage. I'm not sure yet if I understand it correctly. Do yo expect single lines to be excluded from multiline blocks?

Yes, exactly! Actually, the stage name consists of the word "multi". It's confusing when this stage catches single lines.

@jeschkies ; Can't speak for @Zava2012 , but this is also how I interpreted it. While I have your attention 😇 , do you have any ideas on how to address the artifacts of using your multiline pipeline stage here?

I just added the following stage for my applications and it works for Docker CRI:

- replace:
    expression: '(\n)'
    replace: ''

Fortunately, I have one programming language. Then trying to catch the severity level of logs and filter them based on unique word of multilne errors like this for Python:

- match:
    selector: '{container=~"app-.*", severity="ERROR"} |= "Traceback"'
    stages:
      - regex:
          expression: '^(?P<error>(?s:.*))$'

This is totally not a good solution but it works in my case.

cstyan commented 11 months ago

@jeschkies is there something we should change with the multiline pipeline here?

jeschkies commented 8 months ago

@cstyan it's been a while but I think so. Well, it should change in the Grafana Agent.

grafana / loki

multiline is catching the one-line logs #5365