metrico / qryn

⭐️ All-in-One Polyglot Observability with OLAP Storage for Logs, Metrics, Traces & Profiles. Drop-in Grafana Cloud replacement compatible with Loki, Prometheus, Tempo, Pyroscope, OpenTelemetry, Datadog and beyond 🚀
https://qryn.dev
GNU Affero General Public License v3.0
1.24k stars, 68 forks

slow calculation with json parser #359

Closed: arnitolog closed this issue 11 months ago

arnitolog commented 1 year ago

Hello, I see that the JSON parser puts a heavy CPU load on the qryn instance. I'm trying to calculate count_over_time for status codes from the ingress controller. Here is the query:

sum by(status) (count_over_time({ProductComponents="ingress-nginx"} | json [10m]))

The query seems to be dropped at the 30s timeout, so it returns no results: [image]

Here is what qryn's CPU usage looks like during this execution: [image]

Here is an example of a log message:

{
  "@timestamp": "2023-10-17T23:34:33.000Z",
  "Product": "DevOps",
  "ProductComponents": "ingress-nginx",
  "body_bytes_sent": "425",
  "bytes_sent": "660",
  "cloud": {
    "machine": {
      "type": "Standard_D4ads_v5"
    },
    "provider": "azure",
    "region": "eastus",
    "service": {
      "name": "Virtual Machines"
    }
  },
  "http_referrer": "",
  "kubernetes": {
    "container": {
      "name": "controller"
    },
    "namespace": "kube-system",
    "pod": {
      "name": "internal-dev-ingress-nginx-controller-6b7cf98d86-k2w2t"
    }
  },
  "level": "INFO",
  "remote_addr": "127.0.0.1",
  "request_host": "os-cluster-eastus.aks.demo.com",
  "request_id": "2b8f17d66eca7204599f64fc3b270667",
  "request_length": "472",
  "request_method": "GET",
  "request_protocol": "HTTP/1.1",
  "request_query": "GET /_cluster/health HTTP/1.1",
  "request_time": "0.006",
  "request_uri": "/_cluster/health",
  "status": "200",
  "time": "2023-10-17T23:34:33+00:00",
  "upstream_addr": "10.61.89.198:9200",
  "upstream_name": "os-cluster-9200",
  "upstream_response_length": "425",
  "upstream_response_time": "0.006",
  "upstream_status": "200",
  "x-forward-for": "127.0.0.1"
} 

The available labels are: [image]

Am I doing it wrong? Or are there any ways to improve the performance?

akvlad commented 1 year ago

@arnitolog please try using | json status="status" instead of just | json
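Applied to the query from the original post, this would presumably look like the sketch below. It assumes everything else in the query stays unchanged and that only the status field is needed for the aggregation:

```logql
sum by(status) (count_over_time({ProductComponents="ingress-nginx"} | json status="status" [10m]))
```

With an explicit parameter the parser only has to extract the one named field instead of turning every key of the log line into a label.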

akvlad commented 1 year ago

| json without parameters is the slowest parser in the stack. Please try to avoid it. We're currently thinking about how to replace it. Any help is appreciated.

akvlad commented 1 year ago

@arnitolog for your case something like

| regexp `"status": ?"(?P<status>[0-9]+)"`

can go even faster than | json status... As a rule, prefer | regexp over | json a=..., and either of those over a bare | json.
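Put into the full query from the original post, the regexp variant might look like the sketch below. It assumes the log lines keep the `"status": "200"` shape shown in the sample message. Because the pattern starts with a double quote, it should not match the `"upstream_status"` key in that sample:

```logql
sum by(status) (count_over_time({ProductComponents="ingress-nginx"} | regexp `"status": ?"(?P<status>[0-9]+)"` [10m]))
```

The named capture group `status` becomes the label that `sum by(status)` groups on.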

arnitolog commented 1 year ago

Got it, thanks @akvlad. But I'm not sure regexp will be easy or practical for developers to use during incidents and troubleshooting. Anyway, thanks for the answer.