sematext / sematext-agent-docker

Sematext Docker Agent - host + container metrics, logs & event collector
https://sematext.com/docker
Apache License 2.0

Full stack trace not being ingested into sematext when using sda #46

Closed: MGD1981 closed this issue 6 years ago

MGD1981 commented 6 years ago

Only the first two lines of the following log (up to and including "java.lang.NullPointerException: null", but nothing after that) were ingested into Sematext by the Docker agent, without the rest of the stack trace, and those two lines were ingested as two separate log entries:

18/07/18 20:57:49,248 ERROR [ProcThread-11] EventProcessingThread - Exception occured while processing rule:5ae0fe5711048b00073572d1 on event:Event [id=null, companyID=97b47e4d-578a-41cd-a396-ce8fc94c869f, environment=production, entityType=businessDocument, bodID=Amazon_856_000007195_856_40116, systemID=null, timestamp=Wed Jul 18 20:57:44 GMT 2018, type=advance_ship_notice_received, description=Incoming document is processed, data=null, laVersion=4, meta={}]
java.lang.NullPointerException: null
        at com.coenterprise.syncrofy.rules.ParameterEvaluationUtils.evaluate(ParameterEvaluationUtils.java:577) ~[app.jar:?]
        at com.coenterprise.syncrofy.rules.EventProcessingThread.evaluateParameters(EventProcessingThread.java:286) ~[app.jar:?]
        at com.coenterprise.syncrofy.rules.EventProcessingThread.applyRule(EventProcessingThread.java:190) ~[app.jar:?]
        at com.coenterprise.syncrofy.rules.EventProcessingThread.run(EventProcessingThread.java:132) [app.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_151]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]

I have attached my patterns.yml (compressed into a zip for GitHub). It contains a custom pattern for our backend service, derived for us by Radu Gheorghe a few months ago, which allows that service's log severity to be parsed (lines 12-20 in the file).

patterns.zip

megastef commented 6 years ago

Hi, I could imagine that writing the log was very slow. Is this a one-off case, or can you reproduce the issue with a log file? Could you please set the environment variable LOGAGENT_MULTILINE_TIMEOUT_MS=350 (the default value is 250 ms)? LOGAGENT_MULTILINE_TIMEOUT_MS controls how long, in milliseconds, the multiline parser waits for new input.
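A minimal sketch of how that could be set if you run the agent via docker-compose (tokens and the other required settings are omitted, and the service name is just illustrative):

# hypothetical docker-compose service; only the multiline timeout matters here
sematext-agent:
  image: 'sematext/sematext-agent-docker:latest'
  environment:
    - LOGAGENT_MULTILINE_TIMEOUT_MS=350
  volumes:
    - '/var/run/docker.sock:/var/run/docker.sock'
  restart: always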

MGD1981 commented 6 years ago

Thanks for the suggestion -- unfortunately we have not yet been able to replicate the issue consistently, so we have not been able to test it. But I do have a question about LOGAGENT_MULTILINE_TIMEOUT_MS: what are the negative consequences of setting this value too high? What might happen if, for example, I set it to a full 1 second?

megastef commented 6 years ago

The agent might consume a bit more memory because more messages are buffered for a longer time. As long as you don't have a very high log volume, it should not be a problem. You might notice a delay of one second before a message is processed. If you don't parse the timestamp from the log, you might also see a difference in the timestamp (but I'm not 100% sure about that, as we have several places dealing with timestamps).

megastef commented 6 years ago

Could you share a full stack trace? As far as I can see, the patterns for "backend" have no blockStart definition for the multi-line parser. Therefore the default separator for multi-line parsing is used, which only covers indented log messages.

multiline:
  defaultSeparator:  ^\S{2,}

If some line within the stack trace does not have a tab or space among its first two characters, the trace might be broken into two messages.

For example, you could add a blockStart definition at line 13 of the file to split log messages on the date at the start of each log line:

# split messages in blocks starting with a date like "18/07/18 20:57:49"
blockStart: !!js/regexp \d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}
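Roughly how that might sit in the file, as a sketch only -- everything besides blockStart (the sourceName and the placeholder match section) is an assumption, since the attached patterns.yml is not reproduced here:

patterns:
  # hypothetical entry; the real "backend" pattern (lines 12-20 of the file) goes here
  - sourceName: !!js/regexp backend
    # start a new block whenever a line begins with a date like "18/07/18 20:57:49"
    blockStart: !!js/regexp \d{2}/\d{2}/\d{2}\s\d{2}:\d{2}:\d{2}
    match:
      # ... the existing severity-parsing match rules stay as they are ...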

MGD1981 commented 6 years ago

The example in the first message was the full stack trace, including the multiple spaces preceding each line after the first one. The default separator should pick that up, yes? The blockStart definition you shared may be very valuable, though. Once we are able to replicate consistently, we will test it!

MGD1981 commented 6 years ago

I am still not sure what happened in my original case. I have tried to fake a reproduction by creating a container that prints the contents of the original stack trace every 30 seconds (roughly as sketched below) and setting LOGAGENT_MULTILINE_TIMEOUT_MS to 1 (can it go lower?), but it still does not cut off the multiline messages the way we saw in our production instance. In the meantime I suppose we should close the ticket, and I'll let you know if I am ever able to replicate it. Thanks for your help!
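For what it's worth, such a reproduction container can be as simple as the following docker-compose service (a sketch under assumptions -- the image, mounted file name, and interval are illustrative, not the exact setup used here):

# hypothetical service that replays the saved stack trace every 30 seconds
stacktrace-repro:
  image: 'alpine:latest'
  volumes:
    - './trace.log:/trace.log:ro'
  command: sh -c 'while true; do cat /trace.log; sleep 30; done'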

otisg commented 6 years ago

Alright, let us know then :)

MGD1981 commented 6 years ago

Just FYI, the blockStart config you gave me has definitely fixed the issue, and we are now getting full stack traces all in the same log line -- we just had to tweak it a little (adding escape characters for the slashes and colons). Thanks again!
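Presumably the tweak amounts to something like this (a guess at the escaping, since the final regex isn't shown in the thread):

blockStart: !!js/regexp \d{2}\/\d{2}\/\d{2}\s\d{2}\:\d{2}\:\d{2}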

otisg commented 6 years ago

Awesome, thanks for following up!