grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Support for JSON log lines #137

Closed sandstrom closed 5 years ago

sandstrom commented 5 years ago

Loki looks very promising! šŸ†

Are there any plans to support ingestion of JSON log lines?

It seems to be a pretty common structure for logs these days. Here are some examples (can add more):

davkal commented 5 years ago

What sort of support are you looking for?

Loki is format-agnostic and ingests log lines as strings, whether they are access logs, logfmt key/value pairs, or JSON.

Grafana's Explore UI shows you the log lines and, if they are JSON, offers some in-browser parsing to plot distributions of values. Notice how the fields in the log line have an orange underline; that means they were parsed successfully:

[screenshot: 2018-12-18 at 11:15:46]
r-moiseev commented 5 years ago

Does it have the ability to search against JSON fields?

draeron commented 5 years ago

It would be great if, for example, I could filter my logs by a level field contained in the log line.

yubozhao commented 5 years ago

@r-moiseev You can use a regex as part of the query to "search" against your JSON fields.

@draeron I don't think this feature will be available soon. In the meantime, you can add the key you want to filter on as a label sent to the Loki server, or you could add an additional parser to promtail that parses your logs in JSON format.
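To sketch the second suggestion: a promtail pipeline can parse a field out of each JSON line and promote it to a label. This is only a hypothetical example; the `level` field name is an assumption for illustration:

```yaml
# Hypothetical promtail pipeline sketch: extract a top-level "level"
# key from each JSON log line and promote it to a Loki label, so that
# streams can be filtered with {level="error"}.
pipeline_stages:
  - json:
      expressions:
        level: level   # assumes each log line has a top-level "level" key
  - labels:
      level:
```

Keep in mind that promoting high-cardinality fields to labels is discouraged; a bounded field like a log level is a reasonable candidate.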

dano0b commented 5 years ago

Just to share an example: in GKE/Stackdriver I can search for key/values in the jsonPayload, e.g. jsonPayload.@l="Warning". It would be awesome to have some direct support for structured logging.

4nte commented 5 years ago

Here's a reference on how Filebeat handles structured/json logs.

slim-bean commented 5 years ago

The 0.1.0 release includes a pipeline with a json stage that allows extracting JSON log data for use in labels and/or metrics, using JMESPath expressions.
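A minimal sketch of such a stage, assuming a nested JSON payload (the key names here are illustrative, not from the release notes):

```yaml
# Sketch: JMESPath expressions let the json stage reach nested keys.
pipeline_stages:
  - json:
      expressions:
        level: level                 # top-level key
        pod: kubernetes.pod_name     # nested key via JMESPath dot notation
  - labels:
      level:
```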

sandstrom commented 5 years ago

Great! šŸ„‡

minhdanh commented 5 years ago

I have a pod that emits logs in JSON format, but the logs are not displayed as nested objects in Loki; instead they appear as one long string (the content of the log field):

{"log":"{\"verb\":\"UPDATED\",\"event\":{\"metadata\":{\"name\":\"minio-backup.15c16e910ad17555\",\"namespace\":\"minio\",\"selfLink\":\"/api/v1/namespaces/minio/events/minio-backup.15c16e910ad17555\",\"uid\":\"75b2664e-d08a-11e9-aaf1-42010aa40066\",\"resourceVersion\":\"805300\",\"creationTimestamp\":\"2019-09-06T09:41:11Z\"},\"involvedObject\":{\"kind\":\"CronJob\",\"namespace\":\"minio\",\"name\":\"minio-backup\",\"uid\":\"a355d0fa-cf90-11e9-aaf1-42010aa40066\",\"apiVersion\":\"batch/v1beta1\",\"resourceVersion\":\"35722856\"},\"reason\":\"UnexpectedJob\",\"message\":\"Saw a job that the controller did not create or forgot: test-minio-backup\",\"source\":{\"component\":\"cronjob-controller\"},\"firstTimestamp\":\"2019-09-05T03:55:14Z\",\"lastTimestamp\":\"2019-09-06T10:46:18Z\",\"count\":1373,\"type\":\"Warning\"},\"old_event\":{\"metadata\":{\"name\":\"minio-backup.15c16e910ad17555\",\"namespace\":\"minio\",\"selfLink\":\"/api/v1/namespaces/minio/events/minio-backup.15c16e910ad17555\",\"uid\":\"75b2664e-d08a-11e9-aaf1-42010aa40066\",\"resourceVersion\":\"805295\",\"creationTimestamp\":\"2019-09-06T09:41:11Z\"},\"involvedObject\":{\"kind\":\"CronJob\",\"namespace\":\"minio\",\"name\":\"minio-backup\",\"uid\":\"a355d0fa-cf90-11e9-aaf1-42010aa40066\",\"apiVersion\":\"batch/v1beta1\",\"resourceVersion\":\"35722856\"},\"reason\":\"UnexpectedJob\",\"message\":\"Saw a job that the controller did not create or forgot: test-minio-backup\",\"source\":{\"component\":\"cronjob-controller\"},\"firstTimestamp\":\"2019-09-05T03:55:14Z\",\"lastTimestamp\":\"2019-09-06T10:41:14Z\",\"count\":1355,\"type\":\"Warning\"}}\n","stream":"stdout","time":"2019-09-06T10:46:18.681193448Z"}

Does Loki automatically handle JSON format, or is something else still missing?

Lucaber commented 5 years ago

Hi @minhdanh, I just found a solution for eventrouter :)

    - match: 
        selector: '{app="eventrouter"}'
        stages:
          - json:
              expressions:
                log:
          - json:
              source: log
              expressions:
                event_verb: verb
                event:
          - json:
              source: event
              expressions:
                event_reason: reason
                involvedObject:
                source:
          - json:
              source: involvedObject
              expressions:
                event_kind: kind
                event_namespace: namespace
                event_name: name
          - json:
              source: source
              expressions:
                event_source_host: host
                event_source_component: component
          - labels:
              event_verb:
              event_kind:
              event_reason:
              event_namespace:
              event_name:
              event_source_host:
              event_source_component:
minhdanh commented 5 years ago

Hi @Lucaber, thanks for the solution, but it doesn't seem to work for me. I added your snippet to promtail's pipelineStages config: https://github.com/grafana/loki/blob/master/production/helm/promtail/values.yaml#L29

promtail:
  pipelineStages:
    - match:
        selector: '{app="eventrouter"}'
        stages:
          - json:
              expressions:
                log:
          - json:
              source: log
              expressions:
                event_verb: verb
                event:
          - json:
              source: event
              expressions:
                event_reason: reason
                involvedObject:
                source:
          - json:
              source: involvedObject
              expressions:
                event_kind: kind
                event_namespace: namespace
                event_name: name
          - json:
              source: source
              expressions:
                event_source_host: host
                event_source_component: component
          - labels:
              event_verb:
              event_kind:
              event_reason:
              event_namespace:
              event_name:
              event_source_host:
              event_source_component:

Then I deployed promtail again, but the result is still the same in Loki.

slim-bean commented 5 years ago

@minhdanh Loki only knows logs as byte arrays for storage; everything is basically a string.

Your log example looks like the output of a docker log line, which has json nested inside json.

I'm not quite sure what you are ultimately looking for in Grafana, but the simplest pipeline config would just include the docker stage, which will unroll the docker JSON and set the log JSON as the log line; it should then be un-escaped and appear like normal JSON.

The config @Lucaber pasted sets a series of labels from the log but does not manipulate the output sent to Loki; you must use an output pipeline stage for that (the docker stage is internally just a json, timestamp, labels, and output stage).
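Based on that description, the docker stage is roughly equivalent to this expanded pipeline (a sketch, not the exact internal config):

```yaml
# Approximate expansion of `- docker: {}` (sketch): parse the docker
# JSON wrapper, keep the stream as a label, take the timestamp from
# the wrapper, and replace the log line with the un-escaped inner
# `log` field.
- json:
    expressions:
      output: log
      stream: stream
      timestamp: time
- labels:
    stream:
- timestamp:
    source: timestamp
    format: RFC3339Nano
- output:
    source: output
```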

Also @Lucaber I believe you could make your config a little more concise and probably a little faster:

promtail:
  pipelineStages:
    - match:
        selector: '{app="eventrouter"}'
        stages:
          - docker:
          - json:
              expressions:
                event_verb: verb
                event_kind: event.involvedObject.kind
                event_reason: event.reason
                event_namespace: event.involvedObject.namespace
                event_name: event.metadata.name
                event_source_host: event.source.host
                event_source_component: event.source.component
          - labels:
              event_verb:
              event_kind:
              event_reason:
              event_namespace:
              event_name:
              event_source_host:
              event_source_component:

If all your logs are docker, you could also move that outside the match:

promtail:
  pipelineStages:
    - docker:
    - match:
        selector: '{app="eventrouter"}'
        stages:
          - json:
...

The advantage of using the docker stage is that it will set the timestamp from the log line and set the output to the un-escaped JSON of the actual log message.

Lucaber commented 5 years ago

Ah yes, I was looking for Loki labels to easily filter the logs. I previously tried something similar:

- match:
    selector: '{app="eventrouter"}'
    stages:
      - json:
          expressions:
            event_verb: log.verb
      - labels:
          event_verb:

I also tried verb instead of log.verb, but my label was still empty (null). Maybe the docker stage does the trick; I will try this again later.

minhdanh commented 5 years ago

@slim-bean Thank you. Apparently I had removed docker: {} from the pipeline stages, which is why it didn't work. I added it back and now it's working, with correct JSON formatting in Grafana.

I'm not quite sure what you are ultimately looking for in Grafana?

With JSON supported by Loki, I was expecting to search/query the logs using something like object.property=value. This is possible, right?

slim-bean commented 5 years ago

Currently no; neither Grafana nor LogQL has any higher-level support for JSON. If you are using logcli you can use -o raw and pipe into something like jq to manipulate the JSON directly. In Grafana your only option currently is regex (but this will just match against the entire log line).

There are plans to include better handling of JSON in the future but for now all logs are stored and treated the same.

mwennrich commented 5 years ago

@Lucaber

i just found a solution for eventrouter :)

    - match: 

Hi. Could you provide your full promtail.yaml (or helm values.yaml) for your eventrouter-promtail-loki solution? That would be great :-)

sirajshe commented 4 years ago

If we do not have nested JSON objects, can I expect this JSON log line

{"log":"database hrdb is not running\n","loglevel":"error","time":"2020-01-12T01:11:11.870000000-07.00"}

to be converted to the following format?

  ts                                       output                            loglevel
  ===================================================================================
  2020-01-12T01:11:11.870000000-07.00      database hrdb is not running\n    error

I am using the following config and expecting it to parse the json log line.


- job_name: logjson
  static_configs:
  - targets:
      - localhost
    labels:
      job: jsonlogs
      __path__: /tmp/log.json
  pipeline_stages:
  - json:
      expressions:
        output: log
        loglevel: loglevel
        timestamp: time
  - labels:
      loglevel:
  - timestamp:
      source: time
      format: RFC3339Nano
DenisBiondic commented 4 years ago

This is kind of a deal breaker for us because:

alexvaut commented 4 years ago

I love the Loki design but, same as @DenisBiondic, without dynamic structured logging (which is what JSON would bring) it's tough for us to use Loki.

We are using Serilog in our C# stack to log lots of fields: not labels, just fields here and there. Using labels wouldn't work, since there are plenty of fields with high cardinality. Each team is responsible for keeping the field/log generation in our code in sync with the queries (basically Grafana dashboards with variables). Using regex only would be a huge step backwards from the structured-logging path we took (and are very happy with).

cyriltovena commented 4 years ago

@DenisBiondic check https://github.com/grafana/loki/pull/1848 and leave us some feedback like @alexvaut did; this will help our internal discussion with the team.

ghostsquad commented 4 years ago

I stumbled upon this issue while looking to introduce Loki to my team as well. As with @DenisBiondic, we are mostly running structured logging, and I'm not looking forward to doing any sort of regex to find things; that seems like a step backwards.

Other resources I've found were:

https://stackoverflow.com/questions/58564836/how-to-promtail-parse-json-to-label-and-timestamp
https://github.com/grafana/loki/blob/master/docs/clients/promtail/pipelines.md?ts=4
https://grafana.com/blog/2019/07/25/lokis-path-to-ga-adding-structure-to-unstructured-logs/

All of which point to what feels like needing to know json fields ahead of time, and even converting a json log line back into a "structured text" line.

cyriltovena commented 4 years ago

We're working on solving this via LogQL. You'll be able to select at query time which properties you want to show, if not all of them (but that's hard to read).

zrosenbauer commented 4 years ago

Any update here? We are attempting to roll this out company-wide, but JSON logging seems to be a blocker.

cyriltovena commented 4 years ago

We're reviewing the final design doc. It's coming!

josephmilla commented 4 years ago

@cyriltovena any update?

vmrm commented 4 years ago

@cyriltovena why is it closed, any resolution here?

mirfilip commented 4 years ago

@slim-bean any chance to have this reopened? This issue is about full JSON support (both ingesting logs and querying them with JSON awareness). Only the first part is done, via the json stage in the pipeline, and more and more people need this.

MOZGIII commented 4 years ago

Details: https://github.com/grafana/loki/pull/1848

cyriltovena commented 4 years ago

Yep, I'm working on the implementation. ETA: ObservabilityCON.

fseiftsasapp commented 4 years ago

any update on this ?

bkcsfi commented 4 years ago

Yes, there was a short demo yesterday at ObservabilityCON.

Jump to the 36:15 offset in this video to see a short Loki 2.0 overview, including the new 'json' parser stage.

I think on Wed Oct 28th at 12 pm EST there will be a deep dive into Loki improvements. See this page.

JessedeJonge commented 4 years ago

Really looking forward to the demo'd feature. Any ETA on release?

MOZGIII commented 4 years ago

It's in loki 2.0.0.

joeky888 commented 3 years ago

For those who end up here, the syntax is

{job="mysql"} | json | line_format "{{.message}}"

Here, message is a JSON field.
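Building on that, the fields extracted by the json parser can also be filtered before formatting. A sketch, where the level field is an assumption for illustration:

```logql
{job="mysql"} | json | level="error" | line_format "{{.message}}"
```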

alfianabdi commented 3 years ago

@joeky888 Can we parse further after that?

I have shipped the logs using fluentbit. A sample log shipped by fluentbit is as follows:

{
    "log": "2021-04-16 10:01:29.2037 [INFO] [00000000-0000-0000-0000-000000000000] The very long message",
    "stream": "stdout",
    "time": "2021-04-16T10:01:29.204350751Z",
    "kubernetes": {
        "pod_name": "mypodname",
        "pod_id": "mypodid",
        "host": "mynode",
        "container_name": "mycontainer",
        "docker_id": "mydockerid",
        "container_hash": "mycontainerhash",
        "container_image": "mycontainerimage"
    }
}

Can I parse the log field using a regexp, so that I can later filter by, for example, correlationId? I tried the following:

{namespace=mynamespace} | json log="log" | line_format "{{.log}}" | regexp "^(?P<time>\\S+\\s+\\S+)\\s+\\[(?P<logLevel>\\S+)\\]\\s+\\[(?P<correlationId>\\S*)\\]\\s+(?P<message>.*)$"

But that does not seem to work.

I am expecting the following labels.

time
logLevel
correlationId
message
joeky888 commented 3 years ago

Hello @alfianabdi, your question is out of scope here; you should open a new issue. Also, your regex group looks wrong.

Also, I am NOT a maintainer here :)

alfianabdi commented 3 years ago

@joeky888 Thanks. I also found the mistake: the ^ and $ anchors. With them removed, it works.
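For reference, a working version of the query above (the same regexp, minus the ^ and $ anchors, and with the namespace label value quoted) would look roughly like this:

```logql
{namespace="mynamespace"}
  | json log="log"
  | line_format "{{.log}}"
  | regexp "(?P<time>\\S+\\s+\\S+)\\s+\\[(?P<logLevel>\\S+)\\]\\s+\\[(?P<correlationId>\\S*)\\]\\s+(?P<message>.*)"
```

The extracted labels can then be used in label filters, e.g. appending a filter on correlationId.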