jasonacox / Powerwall-Dashboard

Grafana Monitoring Dashboard for Tesla Solar and Powerwall Systems
MIT License
270 stars 57 forks

Default influxdb.conf caused high CPU usage #407

Open youzer-name opened 6 months ago

youzer-name commented 6 months ago

The default influxdb.conf file contains this section:

[monitor]
  store-enabled = true
  store-database = "_internal"
  store-interval = "10s"

This enables statistics logging to the _internal database, which in turn results in much higher CPU usage. I had this turned off on my system, but it was re-enabled when I ran upgrade.sh yesterday on my Raspberry Pi 4B / 4GB. The image below shows the CPU utilization on that machine, where you can clearly see the point where I did the upgrade and the point where I set it back to store-enabled = false.

[Image: CPU utilization graph for the Raspberry Pi around the upgrade]

The InfluxDB docs say that having the statistics disabled will 'make it substantially more difficult to diagnose issues' (https://docs.influxdata.com/influxdb/v1/administration/config/#monitoring-settings), but has anyone involved in this project used the internal database to diagnose anything? If so, would it have been sufficient to turn on the logging once an issue was noticed?

Unless there is a compelling reason to keep the statistics logging turned on, I think it would make sense to have it disabled by default.
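For reference, a minimal sketch of what the disabled section could look like in influxdb.conf. This assumes only store-enabled changes and the other two keys stay exactly as in the default file quoted above; the comment line is added here for illustration.

```toml
[monitor]
  # Assumption: only this key differs from the default file.
  # false stops InfluxDB from recording its own statistics in "_internal".
  store-enabled = false
  store-database = "_internal"
  store-interval = "10s"
```

To troubleshoot later, flipping store-enabled back to true and restarting InfluxDB should resume the statistics logging.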

mcbirse commented 6 months ago

Great find!

Given this, I agree it makes sense to have the statistics logging turned off by default.

I will wait for some further opinions and see what @jasonacox thinks. It does make sense: statistics logging would generally only be needed to diagnose an observed fault, in which case it could be turned on at that point.

> has anyone involved in this project used the internal database to diagnose anything?

As it happens, yes, I did and you can see the details in the below post.

https://github.com/jasonacox/Powerwall-Dashboard/issues/12#issuecomment-1296394464

We were investigating an issue and querying the internal database to see if any CQs (continuous queries) were failing. In this case it did not help anyway, since the fault had occurred more than a week earlier and the statistics only have a retention of 1 week.

The fact that the statistics logging increases CPU usage could have even been the cause of the issue in the first place now that I think about it! 😆

jasonacox commented 6 months ago

I agree. Disable by default, enable only when needing to troubleshoot a systemic issue.