howardjones / network-weathermap

Network Weathermap draws diagrams from data
http://www.network-weathermap.com/
MIT License

Weathermap plugin 1.0.0dev with 0 maps enabled causing master poller latency/timeouts #161

Closed ikorzha closed 5 years ago

ikorzha commented 5 years ago

Howie, please help, I am absolutely desperate. I have been fighting this issue for the last year and only recently realized that if I disable the weathermap plugin, all the polling problems and thold checkup problems that I am experiencing disappear right away. Attached is a screenshot taken when weathermap is installed and enabled, but 0 maps are enabled. Notice how weathermap is running twice in the same minute and the master server polling times are 44 to 47 sec. I think this "double weathermap run in the same minute" is somehow causing the problem, but I have no idea what the root cause is: image (enabled)

In this screenshot, taken just 2 minutes later, weathermap is still installed but disabled. Please notice how consistent the master server polling is: 20-23 sec polling times, and none of the duplicate polling in the same minute that I often see with the weathermap plugin enabled.

image

image

As you suggested, I checked the "weathermap_data" table. It is not huge: only 2600 items in it.

image

Please help Howie, you are my only hope to get this problem resolved once and for all.

howardjones commented 5 years ago

That's (at least) 2600 database queries per poll that you don't need to be running. Truncate the weathermap_data table and see how that affects it.

As you can see from your logs, the actual poller_bottom hook (the one that actually creates maps) takes 0.02 seconds, so this time must be somewhere else. The only other place is poller_output - where the 'Boost support' stuff happens. That works by looking up EVERY piece of data collected against the weathermap_data table. When TARGETs are removed from maps, the entries in weathermap_data are not cleaned up, so this can build up over time. Anything in weathermap_data is updated on each poll, on the assumption that weathermap still needs it.
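The lookup pattern described above can be sketched in a few lines of Python, using an in-memory SQLite database as a stand-in for MySQL. This is only an illustration, not the plugin's code (the plugin is PHP): the table name weathermap_data comes from this thread, while the column names, the function process_poller_output, and the sample data are assumptions.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for weathermap_data (column names are assumed)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE weathermap_data ("
        " local_data_id INTEGER,"
        " ds_name TEXT,"
        " last_value REAL,"
        " PRIMARY KEY (local_data_id, ds_name))"
    )
    return db


def process_poller_output(db, poller_items):
    """Look up every polled item in weathermap_data and update the rows that match.

    Returns (lookups, updates) so the per-poll database work is visible.
    """
    lookups = 0
    updates = 0
    for local_data_id, ds_name, value in poller_items:
        lookups += 1
        row = db.execute(
            "SELECT 1 FROM weathermap_data"
            " WHERE local_data_id = ? AND ds_name = ?",
            (local_data_id, ds_name),
        ).fetchone()
        if row is not None:
            # The row is refreshed whether or not any map still uses it.
            db.execute(
                "UPDATE weathermap_data SET last_value = ?"
                " WHERE local_data_id = ? AND ds_name = ?",
                (value, local_data_id, ds_name),
            )
            updates += 1
    return lookups, updates


db = make_db()
# Three registered rows; imagine only id 1 is still referenced by a map TARGET.
db.executemany(
    "INSERT INTO weathermap_data (local_data_id, ds_name, last_value)"
    " VALUES (?, ?, NULL)",
    [(1, "traffic_in"), (2, "traffic_in"), (3, "traffic_in")],
)
# Five data sources were polled this cycle (hypothetical values).
poll = [(i, "traffic_in", float(i * 100)) for i in range(1, 6)]
lookups, updates = process_poller_output(db, poll)
print(lookups, updates)
```

In this toy run all five polled items cost a lookup and all three registered rows cost an update, including the two that no map would read any more. That is the kind of per-poll overhead that grows as stale rows accumulate.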

If that does resolve it, then I guess we need to look at database indexes and maybe type=Memory for the weathermap_data table.

(The reason it doesn't clean up weathermap_data is that doing so would require another database update for each item to keep track of when it was last read, although I've never actually tried this to see what the performance hit would be.)
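For illustration, the untried cleanup idea might look roughly like the following Python sketch, again with SQLite standing in for MySQL. The last_read column, the functions read_for_map and prune_stale, and the timestamps are all hypothetical; nothing in this thread says the plugin has or will get such a column.

```python
import sqlite3


def make_db():
    """In-memory stand-in for weathermap_data with an assumed last_read column."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE weathermap_data ("
        " local_data_id INTEGER PRIMARY KEY,"
        " last_value REAL,"
        " last_read INTEGER NOT NULL)"
    )
    return db


def read_for_map(db, local_data_id, now):
    """Return the stored value and record when it was read.

    The UPDATE here is the extra per-item write mentioned above.
    """
    db.execute(
        "UPDATE weathermap_data SET last_read = ? WHERE local_data_id = ?",
        (now, local_data_id),
    )
    row = db.execute(
        "SELECT last_value FROM weathermap_data WHERE local_data_id = ?",
        (local_data_id,),
    ).fetchone()
    return None if row is None else row[0]


def prune_stale(db, now, max_age):
    """Delete rows that no map has read within max_age seconds; return the count."""
    cur = db.execute(
        "DELETE FROM weathermap_data WHERE last_read < ?",
        (now - max_age,),
    )
    return cur.rowcount


db = make_db()
db.executemany(
    "INSERT INTO weathermap_data (local_data_id, last_value, last_read)"
    " VALUES (?, ?, ?)",
    [(1, 10.0, 0), (2, 20.0, 0), (3, 30.0, 0)],
)
value = read_for_map(db, 1, now=300)              # only id 1 is still drawn on a map
removed = prune_stale(db, now=600, max_age=400)   # drops rows last read before t=200
remaining = [r[0] for r in db.execute(
    "SELECT local_data_id FROM weathermap_data ORDER BY local_data_id")]
print(value, removed, remaining)
```

The trade-off is visible in read_for_map: every read becomes a read plus a write, which is the performance cost that has not been measured.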

netniV commented 5 years ago

The memory table would improve speed, as long as it doesn't need any historic data. The other alternative is that when you hit the boost update process, you spawn a background process to do this rather than hold up the poller. This is the way that several plugins of ours work because they'd run too long otherwise for a 1 minute period.

howardjones commented 5 years ago

But then the map will show old data... the reason I was thinking memory table is that Boost is also using one, so in the event of a mysql restart you would have no data anyway.

netniV commented 5 years ago

Yeah, memory table does seem to be the way to go when you put it that way.

ikorzha commented 5 years ago

I have followed your recommendation: I have disabled all of the maps and truncated the table: image

Please see the visualized screenshot I have created. As you can see, YES, the 'weathermap_data' data is a root cause of the problem. After I truncated it and re-enabled only 10 maps, the poller already took a 17 sec hit, and I haven't even enabled all of the maps.

Howie, please help to resolve this bottleneck in a new release going forward. This is a critical scalability problem that I can't get around. The worst part is that it is victimizing the Cacti poller: every few minutes I get a message that the poller table is not empty, and the thold daemon reports processing errors every few minutes. image

Info-graph Results: image

howardjones commented 5 years ago

I see your point, but you did also choose to make your life 5 times harder.

So from the above discussion, one thing to try would be:

alter table weathermap_data engine = memory

I'd be interested to know if that makes any difference... after that, I guess it would be indexes.
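As a rough illustration of why indexes would be the next thing to look at, the Python sketch below uses SQLite's EXPLAIN QUERY PLAN to show a lookup changing from a full scan to an index search. SQLite is only a stand-in here: MySQL's planner and its MEMORY engine behave differently, and the column names and the index name idx_wm_local_data_id are assumptions, not the plugin's actual schema.

```python
import sqlite3


def query_plan(db, sql, params=()):
    """Return SQLite's plan description for a statement as one string."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


db = sqlite3.connect(":memory:")
# Assumed columns; deliberately created without any key or index.
db.execute(
    "CREATE TABLE weathermap_data ("
    " local_data_id INTEGER, ds_name TEXT, last_value REAL)"
)

lookup = "SELECT last_value FROM weathermap_data WHERE local_data_id = ?"

plan_before = query_plan(db, lookup, (42,))
db.execute(
    "CREATE INDEX idx_wm_local_data_id ON weathermap_data (local_data_id)"
)
plan_after = query_plan(db, lookup, (42,))

print(plan_before)  # a full table scan
print(plan_after)   # a search that uses idx_wm_local_data_id
```

On a real Cacti server the equivalent check would be done with MySQL's own tools against the actual table definition.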

netniV commented 5 years ago

People are eager ;-)

ikorzha commented 5 years ago

Sorry guys, I was getting into the office; it is 8am after all on the US east coast :) I have to say this is truly a miracle: image

I now have 10 maps enabled and running with 0 sec master poller performance impact. Howie, please make the memory table change permanent in the upcoming release. It is a life saver!!! image image

ikorzha commented 5 years ago

Howie, I did see that you updated the new code with the memory table, so I am going to close this issue.