metrico / qryn

⭐️ All-in-One Polyglot Observability with OLAP Storage for Logs, Metrics, Traces & Profiles. Drop-in Grafana Cloud replacement compatible with Loki, Prometheus, Tempo, Pyroscope, Opentelemetry, Datadog and beyond :rocket:
https://qryn.dev
GNU Affero General Public License v3.0

Fail to store incoming logs into DB #264

Closed MohammadrezaNasrabadi closed 1 year ago

MohammadrezaNasrabadi commented 1 year ago

In our log management stack, a central rsyslog service receives logs from rsyslog agents on other nodes and forwards them to promtail. Promtail applies about 20 labels to the incoming logs and sends them to qryn.

The push requests from promtail are visible in the qryn container logs, but qryn does not appear to store the logs in the DB, since status code 204 is never logged for those request IDs.

I set the NODE_OPTIONS environment variable to --max-old-space-size=4096 as mentioned in #263, but a similar error still appears in the qryn logs.

[Screenshot: qryn error log]

I have separated read requests from write requests.
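To check whether a push is accepted independently of promtail, one option is to send a single test line by hand and look at the status code. The sketch below builds a Loki-style push body and posts it. The URL is an assumption (qryn on localhost, port 3100, Loki-compatible push path) and the helper names `build_push_payload` and `push` are hypothetical, so adjust them to the actual deployment.

```python
import json
import time
import urllib.request

# Assumption: qryn is reachable here and exposes the Loki-compatible push path.
PUSH_URL = "http://localhost:3100/loki/api/v1/push"


def build_push_payload(labels, line, ts_ns=None):
    """Build a Loki-style push body with one stream and one log line."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    return {
        "streams": [
            {
                "stream": dict(labels),
                # The timestamp is sent as a string of nanoseconds since the epoch.
                "values": [[str(ts_ns), line]],
            }
        ]
    }


def push(payload, url=PUSH_URL):
    """POST the payload and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses,
    so a returned value is always a success status (204 is expected).
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `push(build_push_payload({"job": "syslog", "level": "info"}, "test line"))` with a small label set first, and then with the full label set, may help narrow down whether the number of labels is what triggers the failure.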

akvlad commented 1 year ago

@MohammadrezaNasrabadi please try increasing the stack size as described here https://stackoverflow.com/questions/71237313/how-to-change-stack-size-limit-in-nodejs

MohammadrezaNasrabadi commented 1 year ago

It seems that passing --stack-size inside the NODE_OPTIONS variable is not supported when starting the container.

NODE_OPTIONS="--max-old-space-size=4096 --stack-size=4096"

node: --stack-size= is not allowed in NODE_OPTIONS
node: --stack-size= is not allowed in NODE_OPTIONS
node: --stack-size= is not allowed in NODE_OPTIONS
node: --stack-size= is not allowed in NODE_OPTIONS

akvlad commented 1 year ago

Can you please provide a sample of the labels set? With all the specific data masked of course.

MohammadrezaNasrabadi commented 1 year ago

This is the configuration of promtail which sets labels on received syslog logs.

# SCRAPE CONFIGURATION
scrape_configs:
  - job_name: syslog

    syslog:
      listen_address: 0.0.0.0:514
      idle_timeout: 60s
      label_structured_data: no
      labels:
        job: "syslog"

    relabel_configs:
      - source_labels: ["__syslog_message_hostname"]
        target_label: "host"
      - source_labels: ["__syslog_connection_hostname"]
        target_label: "pod_hostname"
      - source_labels: ["__syslog_connection_ip_address"]
        target_label: "container_ip"
      - source_labels: ["__syslog_message_severity"]
        target_label: "level"
      - source_labels: ["__syslog_message_facility"]
        target_label: "facility"
      - source_labels: ["__syslog_message_app_name"]
        target_label: "appname"
      - source_labels: ["__syslog_message_proc_id"]
        target_label: "procid"
      - source_labels: ["__syslog_message_msg_id"]
        target_label: "msgid"

    # PARSER CONFIGURATION

    pipeline_stages:
      - json:
          expressions:
            timestamp: timestamp
            message: message
            access_level: access_level
            appname: appname
            uid: uid
            iid: iid
            payload: payload
            user_id: user_id
            event: event
            instance_id: instance_id
            component: component
            msg: msg
            body: body
            in: in
            level: level
            line: line
            pid: pid
      - json:
          expressions:
            user_ip: user_ip
          source: payload
      - labels:
          user_id:
          event:
          instance_id:
          component:
          msg:
          in:
          level:
          line:
          pid:
          id:
          state:
          timestamp:
          message:
          access_level:
          appname:
          uid:
          iid:
          user_ip:
          payload:
          body:

The version of qryn I'm using is 2.1.46. If I find any useful logs or data, I will let you know.

MohammadrezaNasrabadi commented 1 year ago

I have tested qryn 2.1.2, and the POST requests are passed to the DB successfully.

lmangani commented 1 year ago

@MohammadrezaNasrabadi do you mean this does not work in any version higher than 2.1.2?

MohammadrezaNasrabadi commented 1 year ago

@MohammadrezaNasrabadi do you mean this does not work in any version higher than 2.1.2?

Actually, I didn't test versions higher than 2.1.2. I chose this version based on what I found while searching for qryn on Docker Hub.

MohammadrezaNasrabadi commented 1 year ago

@lmangani

Is the number of labels I have set in the promtail configuration above reasonable, based on our discussion in #255?

akvlad commented 1 year ago

The number of labels you use is too large, and the cardinality of the label sets will become enormous very fast. I suggest removing most of them and keeping only:

  • instance_id (if you have less than 100000 instances)
  • component
  • in
  • level
  • state
  • appname

Please consider removing all other fields from the label set and storing them in the log message only.

Of course, I'll also do some research and optimize the storage process.
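To make the cardinality concern concrete: the number of distinct label sets is bounded above by the product of the number of distinct values each label can take. The counts below are purely hypothetical and not measured from this setup, and `max_label_sets` is an illustrative helper, not part of qryn.

```python
from math import prod


def max_label_sets(distinct_values):
    """Upper bound on distinct label sets: product of per-label value counts."""
    return prod(distinct_values.values())


# Hypothetical counts for a few low-cardinality labels.
low = {"instance_id": 1000, "component": 10, "level": 5}

# Same labels plus two hypothetical high-cardinality ones.
high = dict(low, user_id=10_000, pid=500)

low_bound = max_label_sets(low)    # 50_000
high_bound = max_label_sets(high)  # 250_000_000_000
```

In practice the real number is also capped by the number of log lines and by which combinations actually occur, but the bound shows how quickly each additional high-cardinality label multiplies the possible label sets.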

MohammadrezaNasrabadi commented 1 year ago

I understand that we applied some useless and overly extensive labels (such as timestamp or message). Thanks for pointing that out.

Interestingly, though, the older version still worked with this large number of labels.

The number of labels you use is too large, and the cardinality of the label sets will become enormous very fast. I suggest removing most of them and keeping only:

  • instance_id (if you have less than 100000 instances)
  • component
  • in
  • level
  • state
  • appname

Please consider removing all other fields from the label set and storing them in the log message only.

Of course, I'll also do some research and optimize the storage process.
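As a rough illustration of the suggested split, the sketch below takes one parsed JSON log record, keeps only the six suggested fields as labels, and leaves the full record in the log line so nothing is lost. In promtail itself this would correspond to listing only those fields in the `labels` stage; the function name `split_record` and the sample record are hypothetical.

```python
import json

# Labels suggested in the thread; every other field stays in the log line only.
KEEP_AS_LABELS = ("instance_id", "component", "in", "level", "state", "appname")


def split_record(record):
    """Return (labels, line) for one parsed log record.

    labels: only the low-cardinality fields listed above, as strings.
    line:   the complete record serialized as JSON, so fields such as
            user_id or timestamp remain searchable in the message.
    """
    labels = {key: str(record[key]) for key in KEEP_AS_LABELS if key in record}
    line = json.dumps(record, sort_keys=True)
    return labels, line


# Hypothetical record for illustration.
sample = {"level": "info", "component": "api", "user_id": 42, "msg": "login ok"}
labels, line = split_record(sample)
```

Here `labels` contains only `component` and `level`, while `user_id` and `msg` are still available inside `line` for filtering at query time.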