grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Feature: multi-line logs #74

Closed ventris closed 3 years ago

ventris commented 5 years ago

Currently, when a multi-line event is written to a logfile, promtail treats each row as its own entry and sends them separately to Loki.

It would be great if Loki could handle multi-line events such as stacktraces.

This might also be a bug, if promtail is already supposed to handle multi-line events.

tomwilkie commented 5 years ago

Agreed, thanks @ventris. Any idea how to do this though? I guess we could have the option to parse the log entries in promtail, and define each entry as starting with a timestamp?

ventris commented 5 years ago

Some applications require a "special" character before the multi-line event. I know that our current Wildfly setup puts #### before the multiline event.

But I think that parsing the log entries and splitting each entry on the timestamp is the better solution.

ventris commented 5 years ago

Splunk handles this in a smart way.

Source: extract-timestamps in Splunk

Maybe something like that?

One thing to take into account when only looking for timestamps is that they can also occur within a stacktrace. That is why Splunk has a setting that specifies how many characters into a line it should look for the timestamp.

sandstrom commented 5 years ago

Here is another multi-line scenario.

Our log files are JSONL (http://jsonlines.org/) and look like this:

{ "timestamp" : "2018-12-24T18:00:00", "user-id" : "abc", "request-id" : "123", "messages" : ["Parsing new user data", "Saving user data, valid model", "Storing user data in db"] }
{ "timestamp" : "2018-12-24T18:01:00", "user-id: " "def", "request-id" : "456", "messages" : ["Updating user record", "Updating search index"] }
{ "timestamp" : "2018-12-24T18:02:00", "user-id: " "abc", "request-id" : "789", "messages" : ["Keep alive ping from client", "Updated client with cluster state details"] }

We'd like to (1) associate the key-value metadata with each log line, (2) ingest the array of log lines from the "messages" key and (3) ideally group the associated log lines (so that a log reader client could easily show them together).

If it makes ingestion easier, we could easily keep a timestamp per message in the array instead. Something like this:

{ "timestamp" : "2018-12-24T18:02", "user-id: " "abc", "request-id" : "789", "messages" : [{ "t" : 2018-12-24T18:02:10", "m" : "Keep alive ping from client" }, { "t" : 2018-12-24T18:02:10", "m" : "Updated client with cluster state details" }] }

Also, Loki sounds awesome! 🏆

We have been looking for a lightweight alternative to the ELK stack for logging. We could easily live without full-text search inside log messages (distributed grep would be enough).

However, we'd need filtering by keywords and multi-line log handling to make it work.

tomwilkie commented 5 years ago

We recently merged support for having multiple entries with the same timestamp; when running the docker log parser, multi-line logs usually show up as entries with the same timestamp and are therefore grouped together. Not a perfect solution, but it helps.

m0wfo commented 5 years ago

When I was at Logentries we delimited on \r, \n, etc., but allowed loggers to concatenate multiple lines using the Unicode line separator. The indexer would ignore it and treat the entire payload as one entry (it was then up to the UI to optionally reformat the code onto multiple lines). That's one version of an imperfect solution anyway :)

DanCech commented 5 years ago

My understanding is that Loki doesn't treat the content of log "lines" specially, so it should be up to the process tailing the logfile to decide when a single Loki log "line" should actually contain multiple newline-delimited "lines" of content (maybe it would have made more sense to call it a log entry, but such is life).

Of course I may be misunderstanding the intent here, but it seems like something like a backtrace would be more useful as a single entry in loki rather than being split up.

davidhao3300 commented 5 years ago

From what I've seen in other similar log tailers (the fluentd concat plugin), the tailer provides options for a multi-line start regexp (lines matching the regex indicate the start of a new log entry, so flush the old one out), with a timeout/max buffer size (if x lines end up in the buffer without a new log entry appearing, flush the logs out anyway).

I implemented support for the multi-line start regex for a hackathon, and it seems to work decently. I'll try to clean up the code over the weekend and open a PR to show as an example. I didn't implement any sort of timeout/max buffer size, though.
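
For reference, the multiline stage that eventually landed in promtail follows exactly this shape. A minimal sketch, assuming that stage's options; the firstline regex for a leading ISO timestamp is an assumption:

pipeline_stages:
  - multiline:
      # a line matching firstline starts a new entry; anything else is appended
      firstline: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
      # flush the buffered entry after this long, or once it reaches this many lines
      max_wait_time: 3s
      max_lines: 128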

davidhao3300 commented 5 years ago

Opened a PR for this issue

steven-sheehy commented 5 years ago

Note that multi-line handling differs depending on the EntryParser. With CRI, it will send multiple entries with different flags to indicate whether each is a partial log or a full log. See this document.
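
For illustration, CRI-formatted lines tag every write as P (partial) or F (full/final), so a long line split by the runtime looks roughly like this (timestamps and content are made up):

2019-08-13T06:58:20.588594202Z stdout P java.lang.NumberFormatException: For input
2019-08-13T06:58:20.588594202Z stdout F  string: "-db error"

A tailer that understands the flags can buffer P entries until the closing F line arrives.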

cyriltovena commented 5 years ago

Currently investigating this.

cyriltovena commented 5 years ago

I looked at it in #399. There would be a lot of refactoring involved to get this feature into the current pipeline, so I would love to hear more about how it would be useful before getting started. How does merging lines together help: is it a visual problem, or is it to improve storage? What are good examples of workloads emitting multiline logs?

I want to know a bit more.

ventris commented 5 years ago

Java and C# stacktraces are multiline.

One of the problems with not having access to the full event is that it becomes much harder to search for similar events, because the line breaks can cut Java class names in half and thus do not paint a real picture of what happened in the logs on the server.

Another thing: if the example below is ingested as multiple entries instead, it becomes really hard to search.

If I search for keywords without having the stacktrace in one entry, I will just get the rows that matched my keywords and not the whole stacktrace. This makes troubleshooting Java much harder with Loki if multiline events are handled as-is, because I first need to search for the keywords and then try to find the first and last entries of the stacktrace so I can read it as a whole. And stacktraces can be several hundred lines.
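
Concretely, a LogQL line-filter query like the one below (the stream selector is hypothetical) returns only the individual rows that contain the keyword, stripped of the surrounding stack frames:

{job="my-java-app"} |= "ConstraintViolationException"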

I think there are other multiline events that would benefit from having it all in the same entry, but I have only worked with Java, which behaves like that, so I cannot speak for other languages.

Here is an example stacktrace from Java:

javax.servlet.ServletException: Something bad happened
    at com.example.myproject.OpenSessionInViewFilter.doFilter(OpenSessionInViewFilter.java:60)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.example.myproject.ExceptionHandlerFilter.doFilter(ExceptionHandlerFilter.java:28)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at com.example.myproject.OutputBufferFilter.doFilter(OutputBufferFilter.java:33)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:943)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: com.example.myproject.MyProjectServletException
    at com.example.myproject.MyServlet.doPost(MyServlet.java:169)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at com.example.myproject.OpenSessionInViewFilter.doFilter(OpenSessionInViewFilter.java:30)
    ... 27 more
Caused by: org.hibernate.exception.ConstraintViolationException: could not insert: [com.example.myproject.MyEntity]
    at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:96)
    at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:66)
    at org.hibernate.id.insert.AbstractSelectingDelegate.performInsert(AbstractSelectingDelegate.java:64)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2329)
    at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2822)
    at org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:71)
    at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:268)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:321)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:204)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:130)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:210)
    at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:56)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:195)
    at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:50)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:93)
    at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:705)
    at org.hibernate.impl.SessionImpl.save(SessionImpl.java:693)
    at org.hibernate.impl.SessionImpl.save(SessionImpl.java:689)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.hibernate.context.ThreadLocalSessionContext$TransactionProtectionWrapper.invoke(ThreadLocalSessionContext.java:344)
    at $Proxy19.save(Unknown Source)
    at com.example.myproject.MyEntityService.save(MyEntityService.java:59) <-- relevant call (see notes below)
    at com.example.myproject.MyServlet.doPost(MyServlet.java:164)
    ... 32 more
Caused by: java.sql.SQLException: Violation of unique constraint MY_ENTITY_UK_1: duplicate value(s) for column(s) MY_COLUMN in statement [...]
    at org.hsqldb.jdbc.Util.throwError(Unknown Source)
    at org.hsqldb.jdbc.jdbcPreparedStatement.executeUpdate(Unknown Source)
    at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeUpdate(NewProxyPreparedStatement.java:105)
    at org.hibernate.id.insert.AbstractSelectingDelegate.performInsert(AbstractSelectingDelegate.java:57)
    ... 54 more

steven-sheehy commented 5 years ago

Above, plus Grafana only shows them in descending order by time, so the stacktrace is upside down from how it should appear. It also makes the emoji art that a lot of Java apps print on startup impossible to see. :)

tlinhart commented 5 years ago

Python stack traces are the same:

[2019-08-13 06:58:20,588] ERROR in app: Exception on /graphql [POST]
Traceback (most recent call last):
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/srv/fzapi/venv/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./app/graphql/views.py", line 66, in service
    field_path_to_str(error.path, False),
AttributeError: 'Exception' object has no attribute 'path'

It's all about the context. Only the complete log record holds the whole context, not any single line.

khauser commented 5 years ago

Could you please consider the following scenario, or at least give your opinion:

Because of Loki we changed the log format of our appenders to use the logstash JSON layout formatter in our logback configuration. Now each log entry contains all the information, including the whole stacktrace (with the already-noted \n and \t escapes), so there is no need to join lines afterwards. That sounds good to me...

Here is a sample entry:

{"@timestamp":"2019-08-13T23:57:20.971+00:00","@version":"1","message":"Error in job timer thread occurred!","logger_name":"somecode.core.internal.job.Timer","thread_name":"JobTimer759282143","level":"ERROR","level_value":40000,"stack_trace":"somecode.core.capi.common.SystemException: java.sql.SQLException: Exception occurred while getting connection: oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource: java.sql.SQLException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat somecode.core.internal.jdbc.JDBCDataSourceMgrImpl.getConnection(JDBCDataSourceMgrImpl.java:783)\n\tat somecode.core.internal.jdbc.JDBCDataSourceMgrImpl.getConnection(JDBCDataSourceMgrImpl.java:796)\n\tat somecode.core.internal.jdbc.JDBCUtilsImpl.getConnection(JDBCUtilsImpl.java:925)\n\tat somecode.core.capi.jdbc.JDBCUtils.getConnection(JDBCUtils.java:560)\n\tat somecode.core.internal.jdbc.DBDate.currentDBDate(DBDate.java:75)\n\tat somecode.core.internal.job.Timer.run(Timer.java:102)\nCaused by: java.sql.SQLException: Exception occurred while getting connection: oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource: java.sql.SQLException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat oracle.ucp.util.UCPErrorHandler.newSQLException(UCPErrorHandler.java:479)\n\tat oracle.ucp.util.UCPErrorHandler.throwSQLException(UCPErrorHandler.java:154)\n\tat oracle.ucp.jdbc.PoolDataSourceImpl.getConnection(PoolDataSourceImpl.java:1127)\n\tat oracle.ucp.jdbc.PoolDataSourceImpl.getConnection(PoolDataSourceImpl.java:1031)\n\tat oracle.ucp.jdbc.PoolDataSourceImpl.getConnection(PoolDataSourceImpl.java:1006)\n\tat somecode.core.internal.jdbc.JDBCDataSource.getConnection(JDBCDataSource.java:101)\n\tat somecode.core.internal.jdbc.JDBCDataSource.getConnection(JDBCDataSource.java:73)\n\tat somecode.core.internal.jdbc.JDBCDataSourceMgrImpl.getConnection(JDBCDataSourceMgrImpl.java:779)\n\t... 
5 common frames omitted\nCaused by: oracle.ucp.UniversalConnectionPoolException: Cannot get Connection from Datasource: java.sql.SQLException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat oracle.ucp.util.UCPErrorHandler.newUniversalConnectionPoolException(UCPErrorHandler.java:359)\n\tat oracle.ucp.util.UCPErrorHandler.throwUniversalConnectionPoolException(UCPErrorHandler.java:58)\n\tat oracle.ucp.jdbc.DataSourceConnectionFactoryAdapter.createConnection(DataSourceConnectionFactoryAdapter.java:102)\n\tat oracle.ucp.jdbc.oracle.OracleDataSourceConnectionFactoryAdapter.createConnection(OracleDataSourceConnectionFactoryAdapter.java:82)\n\tat oracle.ucp.common.Database.createPooledConnection(Database.java:268)\n\tat oracle.ucp.common.Cluster.create(Cluster.java:234)\n\tat oracle.ucp.common.Core.growBorrowed(Core.java:634)\n\tat oracle.ucp.common.UniversalConnectionPoolImpl.borrowConnectionWithoutCountingRequests(UniversalConnectionPoolImpl.java:230)\n\tat oracle.ucp.common.UniversalConnectionPoolImpl.borrowConnectionAndValidate(UniversalConnectionPoolImpl.java:139)\n\tat oracle.ucp.common.UniversalConnectionPoolImpl.borrowConnection(UniversalConnectionPoolImpl.java:120)\n\tat oracle.ucp.jdbc.JDBCConnectionPool.borrowConnection(JDBCConnectionPool.java:170)\n\tat oracle.ucp.jdbc.oracle.OracleJDBCConnectionPool.borrowConnection(OracleJDBCConnectionPool.java:849)\n\tat oracle.ucp.jdbc.oracle.OracleConnectionConnectionPool.borrowConnection(OracleConnectionConnectionPool.java:82)\n\tat oracle.ucp.jdbc.PoolDataSourceImpl.getConnection(PoolDataSourceImpl.java:1103)\n\t... 10 common frames omitted\nCaused by: java.sql.SQLException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:743)\n\tat oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:666)\n\tat oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:32)\n\tat oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:566)\n\tat oracle.jdbc.pool.OracleDataSource.getPhysicalConnection(OracleDataSource.java:317)\n\tat oracle.jdbc.pool.OracleDataSource.getConnection(OracleDataSource.java:241)\n\tat oracle.ucp.jdbc.DataSourceConnectionFactoryAdapter.createConnection(DataSourceConnectionFactoryAdapter.java:96)\n\t... 21 common frames omitted\nCaused by: oracle.net.ns.NetException: Listener refused the connection with the following error:\nORA-12505, TNS:listener does not currently know of SID given in connect descriptor\n \n\tat oracle.net.ns.NSProtocolStream.negotiateConnection(NSProtocolStream.java:275)\n\tat oracle.net.ns.NSProtocol.connect(NSProtocol.java:264)\n\tat oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:1452)\n\tat oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:496)\n\t... 27 common frames omitted\n","executionsite":"root"}

But... in Loki, and in the log file itself, the stacktrace isn't readable anymore. That's not good. At least on the Loki side, maybe something like a configuration possibility (fluentd plugin plus promtail) for fields would be good (e.g. multiline_fields: [stack_trace]). Then in Grafana each multiline field could be represented in a well-formatted column, maybe (just a simple thought).

I don't know if the one-JSON-line approach is the best one, but I just wanted to note it here. In theory it might also be displayed well. For us this would be the most needed Loki+Grafana feature, since I wouldn't let other engineers work with a one-line stacktrace.

On the other side, we also need to handle logs from older environments, where multi-line events in the form of stacktraces play a role:

[2019-08-13 20:36:44 GMT] - [Thread-138270] INFO  c.i.b.w.w.DeleteExpiredFilesProcess: finished deletion of expired files. Total files removed: 0 pages from 0/4 sites (enabled/disabled for indexing)
[2019-08-13 21:06:50 GMT] - [Thread-138713] INFO  c.i.b.w.w.DeleteExpiredFilesProcess: finished deletion of expired files. Total files removed: 0 pages from 0/4 sites (enabled/disabled for indexing)
[2019-08-13 21:36:56 GMT] - [Thread-139154] INFO  c.i.b.w.w.DeleteExpiredFilesProcess: finished deletion of expired files. Total files removed: 0 pages from 0/4 sites (enabled/disabled for indexing)
[2019-08-13 22:00:12 GMT] - [main] ERROR c.i.b.w.w.WebAdapterAgent: cycle failed:
java.lang.NumberFormatException: For input string: "-db error"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at somecode.webinterface.webadapteragent.PageCache.expirePages(PageCache.java:292)
    at somecode.webinterface.webadapteragent.WebAdapterAgent.loop(WebAdapterAgent.java:144)
    at somecode.webinterface.webadapteragent.WebAdapterAgent.main(WebAdapterAgent.java:40)
[2019-08-13 22:00:13 GMT] - [main] INFO  c.i.b.w.w.WebAdapterAgent: All services are now up and running

PS: This issue https://github.com/grafana/loki/issues/616 also seems to be related to multilines.

khauser commented 5 years ago

We checked with Azure Log Analytics and with Kibana.

If a json formatted log entry stacktrace arrives in

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

pstibrany commented 5 years ago

Using the timestamp stage in promtail with logs that contain lines without a timestamp (typically stack traces) results in a broken log stream.

If we use the example from https://github.com/grafana/loki/issues/74#issuecomment-523942493:

[2019-08-13 22:00:12 GMT] - [main] ERROR c.i.b.w.w.WebAdapterAgent: cycle failed:
java.lang.NumberFormatException: For input string: "-db error"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at somecode.webinterface.webadapteragent.PageCache.expirePages(PageCache.java:292)
    at somecode.webinterface.webadapteragent.WebAdapterAgent.loop(WebAdapterAgent.java:144)
    at somecode.webinterface.webadapteragent.WebAdapterAgent.main(WebAdapterAgent.java:40)
[2019-08-13 22:00:13 GMT] - [main] INFO  c.i.b.w.w.WebAdapterAgent: All services are now up and running

Then the timestamp stage will extract the correct timestamp from the first and last lines, but the other lines will be assigned the 'current time', which will then cause subsequent lines to be out of order and rejected by Loki.
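
For illustration, a pipeline of roughly this shape triggers the problem; a sketch, where the regex and timestamp format are assumptions matching the sample above:

pipeline_stages:
  - regex:
      # stack-trace lines never match, so no ts is extracted for them
      expression: '^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} GMT)\]'
  - timestamp:
      source: ts
      format: '2006-01-02 15:04:05 MST'

Every line that fails the regex keeps the scrape time, which is what breaks the ordering.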

Now tracked in #1093

eNettMattD commented 5 years ago

Any news on this? We're facing the same issue with C# stacktraces. It's a deal-breaker for us.

llmgo commented 4 years ago

I have this problem too

cyriltovena commented 4 years ago

Do you all use promtail or fluentd/fluent-bit? I think this feature is available in fluentd/fluent-bit; it would be great to test it and see if we can mimic this behaviour.

eNettMattD commented 4 years ago

We're now using Fluentd/fluent-bit via the Banzai Logging Operator, which now includes the detect-exceptions plugin that wraps multi-line stack traces. The benefit of the Banzai Logging Operator is that it has the same label propagation as promtail. It is a major drawback of promtail that it doesn't have this feature.

alexmayerlab commented 4 years ago

This is a useful feature; it would allow us to get rid of ES.

kimminger commented 4 years ago

We are still using promtail and hope this feature will be on your roadmap :)

olivierboudet commented 4 years ago

Do you all use promtail or fluentd/fluent-bit? I think this feature is available in fluentd/fluent-bit; it would be great to test it and see if we can mimic this behaviour.

I am trying with fluent-bit, but I can't get it to work correctly for Java applications and multiline stacktraces. When using a fluent-bit regex parser, the multiline join is impossible (as stated in their documentation). If using the multiline configuration on the tail plugin, I need to set a parser for the first line, and for all the different lines that can appear below it ("at " lines and "Caused by" lines for a Java app). But if I have multiple applications writing different multiline formats, I don't see how to configure it (I do not want a different tail configuration for each format, as I cannot write a valid path selector corresponding to the log format, and I prefer to use an annotation like fluentbit.io/parser to select the correct format).

I also tried logging JSON directly from my application, so line breaks are replaced by \n. It works, but the logs are not easily readable in Grafana: \n patterns are not displayed as line breaks, and rows in the table are displayed as pure JSON.

Examples: rows displayed as pure JSON, and a row with a stacktrace on one line, which is unreadable (screenshots omitted).

In my opinion, the lack of correct multiline support is the only remaining showstopper to replacing elasticsearch, filebeat and kibana with loki, fluentbit/promtail and grafana.

flixr commented 4 years ago

I also wanted to change my applications (logging libs) to directly output JSON in order to handle multiline logs, etc. But this of course means that Grafana/Loki should properly handle and display newline characters \n in the log messages...

tmikaeld commented 4 years ago

Any updates on this? It's a constant show-stopper.

cyriltovena commented 4 years ago

Yes there’s a PR, we’ll review it!

elvis-cai commented 4 years ago

Same issue here: with fluent-bit and multiline, all the lines are combined into the same line, which is not readable for us (screenshot omitted).

ghost commented 4 years ago

When I was at Logentries we delimited on \r, \n, etc., but allowed loggers to concatenate multiple lines using the Unicode line separator. The indexer would ignore it and treat the entire payload as one entry (it was then up to the UI to optionally reformat the code onto multiple lines). That's one version of an imperfect solution anyway :)

We are using the same solution on Splunk. Our applications are in Java and it's easy to configure loggers (Log4j2 and logback) to replace the newline characters by \u2028.

ligiacentea commented 4 years ago

After getting log entries onto a single line (either by logging as JSON or by using the Unicode line separator), I managed to render line endings in Grafana by setting up a promtail pipeline that replaces the line separator used in my log entries with a newline character. I used JSON-format logs, where lines in a stacktrace are separated by escaped \r\n. I replaced the escaped line-ending characters with the following pipeline:

pipeline_stages:
  - docker: {}
  - regex:
      # capture the whole log line into "entry"
      expression: "^(?P<entry>.*)$"
  - template:
      source: entry
      template: '{{ Replace .Value "\\r" "\r" -1 }}'
  - template:
      source: entry
      template: '{{ Replace .Value "\\n" "\n" -1 }}'
  - template:
      source: entry
      template: '{{ Replace .Value "\\t" "\t" -1 }}'
  - output:
      source: entry

ahmed-lotfy commented 4 years ago

I have this problem too, and after lots of research I decided to develop a plugin for pushing logs with proper Java stacktraces:

https://github.com/ahmed-lotfy/loki-logback-appender

landmaj commented 4 years ago

I encountered the issue when pushing logs from a Python application to Loki using the Loki Docker logging driver. Since @ligiacentea's comment was very helpful, I will share my solution, should anyone end up here when faced with a similar problem.

In Python, I use a custom formatter which replaces line feeds:

import logging

LINE_SEPARATOR = "\u000d\u000b"

class LokiFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord):
        log = super().format(record)
        # replace CRLF first, then bare LF, so the whole record stays one entry
        return log.replace("\r\n", LINE_SEPARATOR).replace("\n", LINE_SEPARATOR)

And in Loki Docker Driver I added a stage which changes back to line feeds:

pipeline_stages:
  - regex:
      expression: "^(?P<log>.*)$"
  - template:
      source: log
      template: '{{ Replace .Value "\u000d\u000b" "\u000a" -1 }}'
  - output:
      source: log

You may wonder why I use \u000d\u000b. It's a CARRIAGE RETURN followed by a LINE TABULATION. While it's not exactly a newline, it looks exactly the same when viewed in docker logs.

pgassmann commented 4 years ago

apt history logs are multi-line entries separated by an empty line. Is there a way to parse these as a single entry?

/var/log/apt/history.log

Start-Date: 2020-05-15  14:46:48
Commandline: /usr/bin/apt-get -y -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold install docker-ce
Install: containerd.io:amd64 (1.2.13-2, automatic), docker-ce:amd64 (5:19.03.8~3-0~ubuntu-bionic), docker-ce-cli:amd64 (5:19.03.8~3-0~ubuntu-bionic, automatic)
End-Date: 2020-05-15  14:47:04

Start-Date: 2020-05-16  06:06:29
Commandline: /usr/bin/unattended-upgrade
Upgrade: apt-transport-https:amd64 (1.6.12, 1.6.12ubuntu0.1)
End-Date: 2020-05-16  06:06:30

Start-Date: 2020-05-16  06:06:32
Commandline: /usr/bin/unattended-upgrade
Upgrade: iproute2:amd64 (4.15.0-2ubuntu1, 4.15.0-2ubuntu1.1)
End-Date: 2020-05-16  06:06:34
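
For reference, the multiline stage that later landed in promtail can anchor on the stanza header; a minimal sketch (the firstline regex is an assumption):

pipeline_stages:
  - multiline:
      # each apt history stanza starts with a Start-Date: header
      firstline: '^Start-Date: '
      max_wait_time: 3s
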
allout58 commented 4 years ago

@pgassmann the PR #1380 seems to cover that use case, so it seems like it's just a waiting game, unfortunately.

yacut commented 4 years ago

Any progress on this?

icyxp commented 4 years ago

any update on this?

pstibrany commented 4 years ago

No, not yet, as far as I know. I have it on my TODO list to finish #1380, but it's low priority at the moment. (If anyone wants to continue and finish, please go ahead)

yacut commented 4 years ago

@pstibrany I'm surprised and sad that the number-one issue here has low priority 😞 Could you please explain why that is so?

d4c7 commented 4 years ago

Hi guys, I was playing around with loki/promtail and ran into the multiline issue too. I'll leave you my approach in case it could be of any use. I have tried to have the least possible impact on the project; otherwise the pipeline stage would have to be modified to support chaining or something similar:

multilinelog branch, some doc

I will work on it to check performance impact, etc. in the next weeks. I'm concerned about the potential loss of logs if the position is synced before all lines of a multiline log are processed. Only trying to collaborate with the community, don't shoot me :)

Very promising project btw

Fr33maan commented 4 years ago

@d4c7 Have you made a PR? This is really needed when working with real-world deployments and logging JSON.

d4c7 commented 4 years ago

@Fr33maan no PR yet. Anyway, could you provide a log sample? I'm not sure my approach will work as you intend...

Fr33maan commented 4 years ago

@d4c7 sure. https://controlc.com/8883f610

If this is the output, with the line breaks and tabs, it will be separated into different entries while it should be a single entry.

d4c7 commented 4 years ago

A simplified version of your log is:

{
    "_id": "1",
    "name": {
      "first": "One"
    }
  },
  {
    "_id": "2",
    "name": {
      "first": "Two"
    }
  }

You could parse this using the "newline" mode of the proof-of-concept code here:

Config{
    Mode:       "newline",
    Expression: `^\s*\{\s*$`,
    Separator:  "\n",
}

but you must be sure the JSON start-line format is always the same. And you should remove the comma at the end of the JSON using the Next regular expression. It's serendipity, I think.

You will run into problems with JSON like this:

{
    "_id": "1",
    "name": 
    {   <------ problem here with previous handling mode
      "first": "One",
    }
}

Inspired by your use case, a "json" mode was drafted for the multiline parser to better handle those cases. View the doc:

Config{
    Mode:      "json",
    Separator: "\n",
}

It's not regular-expression-dependent, and the text between docs (like the commas) is ignored.

Warning: do not use it in production, please!! A lot of review is required.

Fr33maan commented 4 years ago

@d4c7 will try it, thanks!

dorsany commented 4 years ago

@Fr33maan did it work for you? @d4c7 do you plan to open a PR any time soon?

Fr33maan commented 4 years ago

@dorsany I haven't had time to try the solution yet, but it's still on my roadmap.

VectorWen commented 4 years ago

Version 2.0 has been released; is parsing multi-line logs (Java stack errors) still unavailable?

andrusstrockiy commented 4 years ago

Loki 2.0 came without multiline log support in promtail, so we will have to enjoy the new Loki features by parsing single-line logs only.