Performance problems on larger clusters

jasonish / evebox

Web Based Event Viewer (GUI) for Suricata EVE Events in Elastic Search

https://evebox.org/

MIT License

Performance problems on larger clusters #115

Closed myrinx closed 1 year ago

myrinx commented 5 years ago

Hi,

On Elasticsearch Cloud with a hot/warm architecture, showing the full flowid stream is very slow. It takes 50-60 seconds for any data to show up, and sometimes the request just times out.

We have roughly 600 GB of data across the two nodes.

I believe the reason is that the search runs cluster-wide rather than only on the index that matches the time of the alert.

myrinx commented 5 years ago

I'm running the latest dev version, by the way: 0.11.0dev (rev: e6c59f0), available at https://evebox.org/files/development/

jasonish commented 5 years ago

I'm not that familiar with Elasticsearch deployment options. I had thought that if you send the query to one machine in the cluster, it would sort out the optimal search for you.

Is this not the case? Do you know how one would restructure the query so the search is not done cluster wide?

myrinx commented 5 years ago

So did I, but it runs that query across the whole cluster.

The best idea is to specify the index to search in the query, like logstash-2019-08-, instead of just logstash-

Since it's a bad idea to make an index per day (a large number of shards is really bad for performance), we now make an index per month. It's best to have shards that are 20-50 GB each, and with 5 shards per index by default, filling a daily index would take a lot of daily data :)
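
To make the sizing argument above concrete, here is a rough back-of-the-envelope sketch. It assumes the 20-50 figure means gigabytes per shard, and the helper name `index_size_range_gb` is made up for illustration.

```python
def index_size_range_gb(shards_per_index, min_shard_gb, max_shard_gb):
    """Return the (low, high) amount of data, in GB, that one index
    must hold for each of its shards to land in the target size range."""
    return (shards_per_index * min_shard_gb, shards_per_index * max_shard_gb)


# 5 shards per index (the default mentioned above), 20-50 GB per shard (assumed unit).
low, high = index_size_range_gb(5, 20, 50)
print(f"One index should hold roughly {low}-{high} GB")
```

Under those assumptions a daily index only makes sense when ingesting on the order of 100 GB or more per day, which is why a monthly index is the better fit for smaller volumes.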

What I did in some custom programs is limit the search to the logstash index for that month only. This can miss related events when the alert occurs in the first hours of the first day of the month, but we've seen that this is very rarely a major roadblock.
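
A minimal sketch of that approach, not EveBox's actual code: derive the monthly index pattern from the alert timestamp, and also include the neighbouring month when the alert falls close to a month boundary. The function names, the 6-hour window and the `logstash-YYYY-MM*` naming are assumptions for illustration; adjust them to the real index names.

```python
from datetime import datetime, timedelta, timezone


def monthly_index_patterns(event_time, prefix="logstash-", window=timedelta(hours=6)):
    """Return the monthly index patterns that cover event_time +/- window.

    Normally this is a single pattern. Near a month boundary it returns two,
    so flow events just before or after midnight on the 1st are not missed.
    """
    patterns = []
    for t in (event_time - window, event_time, event_time + window):
        pattern = f"{prefix}{t:%Y-%m}*"  # hypothetical naming, e.g. logstash-2019-08*
        if pattern not in patterns:
            patterns.append(pattern)
    return patterns


def search_path(event_time):
    """Build the search path restricted to the relevant monthly indices."""
    return "/" + ",".join(monthly_index_patterns(event_time)) + "/_search"


mid_month = datetime(2019, 8, 15, 12, 0, tzinfo=timezone.utc)
month_start = datetime(2019, 9, 1, 0, 30, tzinfo=timezone.utc)
print(search_path(mid_month))    # one monthly pattern
print(search_path(month_start))  # previous and current month
```

Elasticsearch accepts a comma-separated list of index names or wildcard patterns in the search path, so the query body itself would not need to change.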

jasonish commented 1 year ago

Closing as stale. Re-open if needed, but with datastreams and such this would be hard to do.