plugwise / plugwise_usb-beta

Custom-component / development version of the USB Plugwise Home Assistant platform
MIT License
4 stars 4 forks source link

[BUG] Problem using Plugwise USB Stick: Rpi hangs due to running out of memory #10

Closed MagnusThome closed 8 months ago

MagnusThome commented 1 year ago

Describe the bug.

Plugwise might be totally innocent but here's my scenario:

Two fully separate HA installations in different houses. Both are Rpi4 with new original Rpi OEM PSUs. Both with Plugwise USB Stick plugged in. Both regularly crash and need power cycling to get back up again.

In addition to the Plugwise Beta plus Sticks both also have Tasmota installed. Other than that they have a few things installed but not the same on both.

Just curious if the Stick can be part of the problem if anyone else have had issues? And if so if Plugwise beta is something to dig out logs from and so on.

What version of Home Assistant Core has the issue?

All (both Rpis set up from scratch this spring)

What was the last working version of Home Assistant Core?

None

What type of installation are you running?

Home Assistant OS

How did you install plugwise-beta?

HACS

What kind of Plugwise device is having issues?

USB: Stick

What firmware version is your Plugwise product at?

Circles are firmware: 2015-09-18 10:53:15

Logging

No response

Diagnostics information

No response

Additional information

No response

bouwew commented 1 year ago

Do you use SD-cards as the storage-medium on these Pi's? If yes, that's most likely the reason for the crashes. Your SD-card will become full in no time if you don't take special measures in limiting the growth of the HA database.

MagnusThome commented 1 year ago

Thanks for the reply. Both use same brand Sandisk USB sticks to run on. One has 32GB and is just now reporting 20GB free, the other one runs on 16GB and reports 5,7GB free.

bouwew commented 1 year ago

Ok, so you're saying, that should not be the cause of your problem?

Next question: is there any operation-system log that you can check for clues? Like the journal-log? This also depends on how you have installed Home Assistant?

Also, what is crashing? The Pi operating system or HA? If HA is crashing, is the operating system still running?

MagnusThome commented 1 year ago

It doesn't look like the USB sticks should be a problem I think?

I've been looking at the different available logs but honestly don't know what to search for. I do see some errors from the plugwise addon but not sure what is okay noise and what is an actual problem.

On one of the installs I see this reoccuring:

2022-11-02 13:02:05.026 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000037B255 because max retries 3 reached
2022-11-02 13:02:05.173 WARNING (serial_reader_thread) [plugwise.controller] Drop CirclePowerUsageRequest to 000D6F000037B255 because max retries 3 reached

On the other install I see many of these too:

2022-11-02 13:41:59.901 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 13:42:14.697 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 13:42:16.500 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 13:42:17.676 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 13:42:25.295 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 13:43:04.013 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 13:43:32.335 WARNING (serial_reader_thread) [plugwise.controller] Drop CirclePowerUsageRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 13:43:57.782 WARNING (serial_reader_thread) [plugwise.controller] Drop CirclePowerUsageRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 13:43:59.583 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached

When they have "crashed" and I ping I seem to get replies and port 22 and 8123 are also open according to my port scanner, but trying with ssh or web browser there is nothing coming back from either.

bouwew commented 1 year ago

The messages you show are indeed okay noise :) I see many of them too in my system.

Ok, so your Pi OS crashes? With which method have you installed HA on your Pi?

Also, have you searched for "Pi crashes" or similar in https://community.home-assistant.io/?

MagnusThome commented 1 year ago

I also found two of these watchdog and timeout events but only on one of the Pis, not the other one

2022-11-02 14:15:28.055 ERROR (receive_timeout_thread) [root] Uncaught thread exception

2022-11-02 14:15:46.436 WARNING (watchdog_thread) [plugwise.controller] Unexpected halt of receive thread, restart thread

But it continued to run at these two occasions though, no crash

2022-11-02 14:13:17.392 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:13:27.878 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:13:27.887 WARNING (receive_timeout_thread) [plugwise.controller] Drop CirclePowerUsageRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:13:30.904 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:13:33.955 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:13:35.716 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:13:49.010 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:13:49.157 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:14:02.240 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:14:19.867 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:14:21.099 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:14:21.701 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:14:33.192 WARNING (MainThread) [homeassistant.helpers.template_entity] Template loop detected while processing event: <Event state_changed[L]: entity_id=sensor.all_effekt, old_state=<state sensor.all_effekt=267.65; unit_of_measurement=W, device_class=power, friendly_name=All effekt @ 2022-11-02T14:14:31.184424+01:00>, new_state=<state sensor.all_effekt=265.45; unit_of_measurement=W, device_class=power, friendly_name=All effekt @ 2022-11-02T14:14:32.192254+01:00>>, skipping template render for Template[{% set ns = namespace(states=[]) %} {% for s in states.sensor %}
{% if s.object_id.startswith('power_usage_') %}
{% set ns.states = ns.states + [ s.state | float ] %}
{% endif %}
{% endfor %} {{ ns.states | sum | round(2) }}]
2022-11-02 14:14:50.474 WARNING (serial_reader_thread) [plugwise.controller] Drop CirclePowerUsageRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:14:50.632 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:14:50.769 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:14:50.919 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:14:52.157 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:15:14.514 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:15:22.295 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:15:22.424 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:15:22.667 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:15:28.042 WARNING (receive_timeout_thread) [plugwise.controller] Drop CirclePowerUsageRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:15:28.055 ERROR (receive_timeout_thread) [root] Uncaught thread exception
Traceback (most recent call last):
File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/site-packages/plugwise/controller.py", line 329, in _receive_timeout_loop
if self.expected_responses[seq_id][3] is not None:
KeyError: b'DE3A'
2022-11-02 14:15:42.937 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:15:46.436 WARNING (watchdog_thread) [plugwise.controller] Unexpected halt of receive thread, restart thread
2022-11-02 14:15:47.465 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:15:47.613 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:15:56.837 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:16:02.835 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:16:10.827 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:16:10.987 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:16:16.486 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:16:31.642 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:16:32.699 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:16:32.881 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:16:33.160 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:16:46.528 WARNING (receive_timeout_thread) [plugwise.controller] Drop CircleEnergyCountersRequest to 000D6F000356273F because max retries 3 reached
2022-11-02 14:16:46.529 WARNING (receive_timeout_thread) [plugwise.controller] Drop CircleEnergyCountersRequest to 000D6F000356273F because max retries 3 reached
2022-11-02 14:16:46.532 WARNING (receive_timeout_thread) [plugwise.controller] Drop CircleEnergyCountersRequest to 000D6F000356273F because max retries 3 reached
2022-11-02 14:38:43.390 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:38:43.545 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:38:48.695 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:38:48.915 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:38:48.917 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:38:48.920 WARNING (receive_timeout_thread) [plugwise.controller] Drop CircleEnergyCountersRequest to 000D6F0003562646 because max retries 3 reached
2022-11-02 14:38:48.923 WARNING (receive_timeout_thread) [plugwise.controller] Drop CircleEnergyCountersRequest to 000D6F00035626FE because max retries 3 reached
2022-11-02 14:38:48.925 WARNING (receive_timeout_thread) [plugwise.controller] Drop CircleEnergyCountersRequest to 000D6F00035626FE because max retries 3 reached
2022-11-02 14:39:02.373 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:39:03.963 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:39:03.971 ERROR (receive_timeout_thread) [root] Uncaught thread exception
Traceback (most recent call last):
File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/site-packages/plugwise/controller.py", line 329, in _receive_timeout_loop
if self.expected_responses[seq_id][3] is not None:
KeyError: b'02EF'
2022-11-02 14:39:06.904 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:39:07.042 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:39:31.037 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:39:31.174 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:39:31.333 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:39:31.471 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:39:48.063 WARNING (watchdog_thread) [plugwise.controller] Unexpected halt of receive thread, restart thread
2022-11-02 14:39:50.175 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:39:59.336 WARNING (MainThread) [homeassistant.helpers.template_entity] Template loop detected while processing event: <Event state_changed[L]: entity_id=sensor.all_effekt, old_state=<state sensor.all_effekt=293.34; unit_of_measurement=W, device_class=power, friendly_name=All effekt @ 2022-11-02T14:39:57.312549+01:00>, new_state=<state sensor.all_effekt=291.19; unit_of_measurement=W, device_class=power, friendly_name=All effekt @ 2022-11-02T14:39:58.326518+01:00>>, skipping template render for Template[{% set ns = namespace(states=[]) %} {% for s in states.sensor %}
{% if s.object_id.startswith('power_usage_') %}
{% set ns.states = ns.states + [ s.state | float ] %}
{% endif %}
{% endfor %} {{ ns.states | sum | round(2) }}]
2022-11-02 14:40:21.082 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:40:21.684 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F000038C184 because max retries 3 reached
2022-11-02 14:40:23.334 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:40:23.492 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:40:33.137 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
2022-11-02 14:40:33.143 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768C91 because max retries 3 reached
2022-11-02 14:40:33.146 WARNING (receive_timeout_thread) [plugwise.controller] Drop NodePingRequest to 000D6F0000768CAD because max retries 3 reached
2022-11-02 14:40:45.802 WARNING (serial_reader_thread) [plugwise.controller] Drop NodePingRequest to 000D6F00004BF6F1 because max retries 3 reached
MagnusThome commented 1 year ago

Ok, so your Pi OS crashes?

Sort of half crashes I guess since it still has the ports open but they are sort of dead anyway.

With which method have you installed HA on your Pi?

Used the downloadable image for the Pi and made the USB sticks with it

Also, have you searched for "Pi crashes" or similar in https://community.home-assistant.io/?

Ah, will do!

bouwew commented 1 year ago

Very similar to you case? https://community.home-assistant.io/t/unexpected-crashes-how-to-find-the-cause/361822

MagnusThome commented 1 year ago

Very similar to you case? https://community.home-assistant.io/t/unexpected-crashes-how-to-find-the-cause/361822

Ok, worth a try to add a fan. Will give it a go :-)

MagnusThome commented 1 year ago

I found HA isn't crashing, its under super heavy load, these are the two HA I run and what it looks like when they don't reply at all and I have to power cycle them. Is there any way I can check if these loads are from the plugwise addon or if its something else?

image

image

bouwew commented 1 year ago

There is clearly a memory-leak. You can disable the plugwise (Stick) integration and then monitor your system again.

I must say, I use the Stick integration myself, next to an Adam and P1 integration, I never have any problems with the memory getting full or similar. I'm running HA in a virtual machine under Proxmox.

MagnusThome commented 1 year ago

So far no memory peaks like that after disabling. But I'll let it run a couple of days more.

MagnusThome commented 1 year ago

Interesting, it's way more responsive and fast in the web interface also. It's a Rpi4 .

MagnusThome commented 1 year ago

Sorry to say but HA runs flawlessly with Plugwise beta disabled.

Something with using an USB stick instead of the other options I guess then 😟

Or could it be something with old data stats still in the circles? They have all been reset though.

bouwew commented 1 year ago

@brefra any idea about the above data & conclusion?

MagnusThome commented 1 year ago

Some logs with/without plugwise beta running with USB stick

Screenshot 2022-12-01 164717

MagnusThome commented 1 year ago

What can I do to help pinpoint the issue?

herwinjan commented 1 year ago

Hi,

There is a memory leak.. but where, if i enable the plugwise integration, my systemen (i5 NUC ESXi with 8 GB memory assigned) if full in 1 to 2 days.

CleanShot 2023-01-13 at 08 25 39@2x

See the graph with Plugwise enabled and disabled.

My system:

MagnusThome commented 1 year ago

I know you've spent time on this and I'm very grateful. Is there anything I can do to help?

I noticed this odd thing with both a reoccuring same growing rate of memory usage but also a suddenly much faster rise in memory usage. Surprised me a bit

image

MagnusThome commented 1 year ago

Very different behaviour lately with latest updates Screenshot_20230405_222052

MagnusThome commented 1 year ago

@brefra @bouwew

An update of my logging :-)

I've marked with red arrows where homeassistant totally stopped responding and I had to power cycle it to get it going again.

Then I disabled plugwise beta and that is where the spikes totally stop coming:

Untitled-1

MagnusThome commented 1 year ago

Screenshot_20230414_154126_com android chrome

MagnusThome commented 1 year ago

Screenshot_20230418_195800_com android chrome

dirixmjm commented 8 months ago

@MagnusThome can you please try with release 0.31.1b1? see https://github.com/plugwise/plugwise_usb-beta/discussions/110

MagnusThome commented 8 months ago

@MagnusThome can you please try with release 0.31.1b1? see #110

Sorry can't, I've sold off all Plugwise gear.

dirixmjm commented 8 months ago

@MagnusThome ok I understand. @CoMPaTech can you please close this one.