zensqlmonitor / influxdb-zabbix

Gather data from Zabbix back-end and load to InfluxDB in near real-time for enhanced performance and easier usage with Grafana.
MIT License

Memory usage and Garbage Collector #7

Closed · menardorama closed this issue 7 years ago

menardorama commented 7 years ago

Hi, after fixing the long historical data issue (even if I wouldn't have done it that way), I am facing another issue.

The process consumes all the memory on the server while waiting for the result from the DB.

It's as if the GC is not working at all.

The result for me is that I can't transfer the data, as it consumes the whole 32 GB of RAM and the OOM killer kills the process.

zensqlmonitor commented 7 years ago

I ran different loads during the day and the memory footprint of the process was less than 2 GB for the 4 tables, with 100k per batch and 200k per batch. It looks like the GC works properly. Again, please create the indexes to avoid scanning the complete table, and think about separation of concerns: the ETL process should run on a separate server to avoid resource contention.
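For illustration, creating such an index could look something like the following minimal sketch, assuming a PostgreSQL backend and the lib/pq driver; the index name and statement are illustrative, not the project's documented DDL (the stock Zabbix schema typically indexes (itemid, clock), not clock alone):

```go
// Sketch: ensure an index on history.clock exists so the extraction query
// does not have to scan the whole table. Connection string is a placeholder.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "host=localhost dbname=zabbix sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// CREATE INDEX IF NOT EXISTS requires PostgreSQL 9.5+.
	if _, err := db.Exec(
		"CREATE INDEX IF NOT EXISTS history_clock_idx ON history (clock)"); err != nil {
		log.Fatal(err)
	}
	log.Println("index in place")
}
```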

zensqlmonitor commented 7 years ago

Here are some stats for a 12-hour run:

influxdb-zabbix process (screenshot)

PostgreSQL backend (screenshot)

menardorama commented 7 years ago

Hi

Here it is after 3 minutes: (screenshot: capture d'écran 2017-06-02 à 11:49:51)

zensqlmonitor commented 7 years ago

What's your configuration?

menardorama commented 7 years ago

Basically the server has the same specs.

Latest version of InfluxDB, on CentOS 7.

zensqlmonitor commented 7 years ago

Which Go version? Try updating to the latest version. About your config file: how many input rows per batch? Have you created the indexes?

menardorama commented 7 years ago

I am using Go 1.7.4, and the indexes have not been created.

Regarding the config:
inputrowsperbatch=50000 outputrowsperbatch=50000 interval=60

But regarding the indexes, they are just nice to have and should not have any impact on memory consumption on the client side.

The thing is, I have a 500 GB Zabbix DB (most of the data is in the history table) and I don't want to add more indexing weight.

Having a limit on the rows to return is a workaround, but not the real solution (from a DBA point of view) for me. A moving window based on the clock would be lighter on the DB side (since the ORDER BY forces all the results into memory, or worse, into a temp file).
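For illustration, the clock-bounded extraction this refers to might look like the following sketch; the window bounds and connection string are placeholders, and the column names follow the stock Zabbix history schema:

```go
// Sketch of a clock-bounded probe: a half-open time window instead of
// ORDER BY ... LIMIT, so PostgreSQL can use an index range scan and never
// needs to sort the full table into memory or a temp file.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	db, err := sql.Open("postgres", "host=localhost dbname=zabbix sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var n int64
	err = db.QueryRow(
		"SELECT count(*) FROM history WHERE clock >= $1 AND clock < $2",
		1496361600, 1496448000, // e.g. 2017-06-02 00:00 .. 2017-06-03 00:00 UTC
	).Scan(&n)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("rows in the window: %d", n)
}
```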

For one year of historical data, the overhead on the DB is just too much.

zensqlmonitor commented 7 years ago

@menardorama a moving window based on a number of days is now implemented.

menardorama commented 7 years ago

Hi

Thanks for your feedback, it's much better now on the DB side.

But there is still something wrong, which I think I pointed out, but I am not good enough in Go to propose a patch.

I'll try to explain my observation.

From what I understand, your app works in two steps: it first fetches the whole result set from the database into memory, then loads it into InfluxDB.

My concern is that I have 57 million rows per week, and that does not fit in memory.

Another approach could be to process a batch of rows (at the fetch level) and insert them into InfluxDB.

This would be more scalable than waiting for the full fetch.
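Something along these lines is what I mean; this is only a rough sketch, not the project's actual code, assuming lib/pq and a hypothetical writeToInflux helper standing in for the real InfluxDB write:

```go
// Sketch of batch-at-fetch processing: stream rows with database/sql and
// flush every batchSize rows, so memory stays bounded regardless of how
// many rows the window contains. Query bounds and names are placeholders.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq"
)

const batchSize = 50000

// writeToInflux is a hypothetical stand-in: the real tool would POST
// InfluxDB line protocol over HTTP here.
func writeToInflux(lines []string) error {
	log.Printf("flushing %d points", len(lines))
	return nil
}

func main() {
	db, err := sql.Open("postgres", "host=localhost dbname=zabbix sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	rows, err := db.Query(
		"SELECT itemid, clock, value FROM history WHERE clock >= $1 AND clock < $2",
		1496361600, 1496448000) // placeholder one-day window
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	batch := make([]string, 0, batchSize)
	for rows.Next() {
		var itemid, clock int64
		var value float64
		if err := rows.Scan(&itemid, &clock, &value); err != nil {
			log.Fatal(err)
		}
		batch = append(batch, fmt.Sprintf("history,itemid=%d value=%g %d", itemid, value, clock))
		if len(batch) == batchSize {
			if err := writeToInflux(batch); err != nil {
				log.Fatal(err)
			}
			batch = batch[:0] // reuse the backing array; memory stays bounded
		}
	}
	if len(batch) > 0 {
		if err := writeToInflux(batch); err != nil {
			log.Fatal(err)
		}
	}
	if err := rows.Err(); err != nil {
		log.Fatal(err)
	}
}
```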

Another idea would be to spool the result to a temp file at the fetch level and pass the filename to the InfluxDB processor.

Once again I'm sorry I am not good enough in Go to do it.

What do you think?

zensqlmonitor commented 7 years ago

> My concern is that I have 57 million rows per week, and that does not fit in memory.

You can now split the dataset into multiple datasets with the config parameter daysperbatch. For example, in the configuration file, you set:

startdate="2017-01-01T00:00:00" daysperbatch=15

=> the process will start with data whose timestamps fall between 2017-01-01 (inclusive) and 2017-01-16 (exclusive), then continue in 15-day increments: 2017-01-16 to 2017-01-31, and so on. This way, depending on the number of rows you get per day, you can adjust the batch size.
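A small illustration of that window arithmetic (not the tool's own code; it just prints the first few windows for the example above):

```go
// Illustration of how daysperbatch splits the extraction into successive
// half-open windows starting at startdate.
package main

import (
	"fmt"
	"time"
)

func main() {
	start, err := time.Parse("2006-01-02T15:04:05", "2017-01-01T00:00:00")
	if err != nil {
		panic(err)
	}
	daysPerBatch := 15

	lo := start
	for i := 0; i < 3; i++ {
		hi := lo.AddDate(0, 0, daysPerBatch)
		fmt.Printf("[%s, %s)\n", lo.Format("2006-01-02"), hi.Format("2006-01-02"))
		lo = hi
	}
	// Output:
	// [2017-01-01, 2017-01-16)
	// [2017-01-16, 2017-01-31)
	// [2017-01-31, 2017-02-15)
}
```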

Have you tested the latest version?

menardorama commented 7 years ago

Yes, my comment was regarding the latest version.

And I can't put more RAM in my server (32 GB already).

Setting daysperbatch=1 helps a bit, but it's already 57 million rows and it consumes all the memory.

zensqlmonitor commented 7 years ago

57 million for 1 day? You just said it was for 1 week. Anyway, that's huge... I can't do better, and sorry, but I won't spool the result to disk.

menardorama commented 7 years ago

Sorry, you're right, 51 million is per week... But a batch per day doesn't fit in memory either...

Thanks anyway.

zensqlmonitor commented 7 years ago

Let's make it more granular. I've just committed a moving window based on hours -> new parameter: hours per batch. @menardorama could you please have a look?