tsaikd / gogstash

Logstash like, written in golang
MIT License

Performance / efficiency numbers #124

Open kleijnweb opened 4 years ago

kleijnweb commented 4 years ago

Getting frustrated with Logstash's memory hogging (and its occasional crashes on invalid JSON input), I set out to create a simpler, more efficient alternative. Then I found this project, which already seems to be a far more complete alternative than what I had in mind. A couple of questions:

  1. Are there any real (-ish) world performance numbers available, specifically compared to Logstash?
  2. Is there any guidance for migrating from Logstash?
  3. Do you need help with anything? Specifically anything efficiency/performance related.
tsaikd commented 4 years ago
  1. I have no numbers for Logstash because I haven't used it in a long time. I can share a few numbers from my production servers running Gogstash: 13M documents (4GB) recorded in 2019, avg. CPU 0.5%, avg. memory <50MB.
  2. I think it's better to migrate your rules one by one to make sure everything is under control.
  3. You could share your numbers for Logstash (current) and Gogstash (after migrating) as a reference, if possible. ^_^
kleijnweb commented 4 years ago

Ok then. When I have some time I'll start by migrating a single service to gogstash and see what happens.

tengattack commented 4 years ago

https://github.com/tsaikd/gogstash/issues/80 shows some useful rules for migrating from Filebeat; you can give it a try :)

tengattack commented 4 years ago

I have used gogstash for a long time, but I haven't compared it directly to Logstash. I have also built my own version by replacing the regexp library with github.com/ungerik/gonigmo, a C-binding library that works as a drop-in replacement in Go. It speeds up user-agent matching significantly. On top of that, I made a custom filter that strips sensitive data from the logs, which matters for audit requirements.

BTW, it costs more memory if you want lower CPU usage (IP and user-agent caches). I currently run 4 child processes with 4GB of memory in total to achieve 5000 qps (and it could be more) in our production system.

kleijnweb commented 4 years ago

OK, yeah, avoiding the stdlib regexp is pretty much required for something like this; its performance issues still haven't been fixed, and that ticket has been open since 2015 or so. I also saw in the source that you're using jsoniter, so it certainly looks like you've made an effort to make it performant.

But welp, that's some clunky YAML. I think I'll look into defining inputs and filters programmatically and building them into the binary, since I'll be building Docker images for it anyway. If I have time, maybe I'll create a PR to add plugin support, if you're open to something like that.