sivasamyk / logtrail

Kibana plugin to view, search & live tail log events

Data nodes CPU consumption when scrolling back upwards #344

Open sergeyarl opened 5 years ago

sergeyarl commented 5 years ago

Hi!

I'm still in the process of investigating all the details, but to put it briefly: when we scroll logs upwards (back in time) in the Logtrail UI with an empty query in the search field (i.e. one that matches all records), then at the point where Logtrail starts showing a progress bar (while it loads the less recent records), all data nodes in our cluster start consuming 100% CPU for 1-2 minutes.

Nothing like this happens when we scroll downwards to load the most recent records. In the Discover tab everything also works well. ES itself is rather fast: records for a whole day can be fetched within 10-20 seconds.
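
To make the comparison concrete, this is roughly the kind of manual query I mean (the index pattern and timestamp field are taken from the logtrail.json below; the size and time range are only an example, not necessarily what Logtrail itself sends):

GET fluentd-*/_search
{
  "size": 500,
  "sort": [
    { "@timestamp": "desc" }
  ],
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-24h",
        "lte": "now"
      }
    }
  }
}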

Our current configuration

logtrail.json:

{
  "version" : 2,
  "index_patterns" : [
    {
      "es": {
        "default_index": "fluentd-*"
      },
      "tail_interval_in_seconds": 10,
      "es_index_time_offset_in_seconds": 0,
      "display_timezone": "local",
      "display_timestamp_format": "MMM DD HH:mm:ss",
      "max_buckets": 500,
      "default_time_range_in_minutes" : 60,
      "max_hosts": 100,
      "max_events_to_keep_in_viewer": 5000,
      "default_search": "@log_name:nonexistent",
      "fields" : {
        "mapping" : {
            "timestamp" : "@timestamp",
            "display_timestamp" : "@timestamp",
            "hostname" : "container_id",
            "program": "@log_name",
            "message": "log"
        },
        "message_format": "{{{log}}}"
      },
      "color_mapping" : {
      }
    }
  ]
}
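
For reference, a single document in these indices looks roughly like the following (the field names are the ones referenced in the mapping above; the values are made up for illustration):

{
  "@timestamp": "2019-05-14T09:21:37.000Z",
  "container_id": "a1b2c3d4e5f6",
  "@log_name": "app.nginx.access",
  "log": "GET /healthz 200 3ms"
}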

ES/Kibana 6.6.1, Java 11, logtrail@0.1.31

We have 2 daily indices of about 200 and 100 GB, which in total makes roughly 3.5-4 TB, distributed among 5 data nodes with SSD drives attached.

Thank you!

Regards, Sergey

sergeyarl commented 5 years ago

I also enabled slow search logging with:

PUT /index_name/_settings
{
  "index.search.slowlog.threshold.query.warn": "20s",
  "index.search.slowlog.threshold.fetch.warn": "20s"
}

However, I don't see any records there.
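
In case the settings simply did not apply, the thresholds can be double-checked (and lowered, so that shorter queries are logged as well) with something like this; the fluentd-* pattern is from the config above:

GET /fluentd-*/_settings/index.search.slowlog.*

PUT /fluentd-*/_settings
{
  "index.search.slowlog.threshold.query.info": "5s"
}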

But when I execute long-running queries in the Discover tab, I do see them being logged in elasticsearch_index_search_slowlog.log, and none of them consume 100% CPU for more than a couple of seconds in a row.
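
To see what the data nodes are actually busy with during such a spike, capturing hot threads while scrolling back might narrow it down; this is just the standard nodes hot threads API, nothing Logtrail-specific:

GET /_nodes/hot_threads?type=cpu&threads=5&interval=1s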