jens-maus / RaspberryMatic

:house: A feature-rich but lightweight, buildroot-based Linux operating system alternative for your CloudFree CCU3/ELV-Charly 'homematicIP CCU' IoT smarthome central. Running as a pure virtual appliance (ProxmoxVE, Home Assistant, LXC, Docker/OCI, Kubernetes/K8s, etc.) on a dedicated embedded device (RaspberryPi, etc.) or generic x86/ARM hardware.
https://raspberrymatic.de
Apache License 2.0
1.55k stars 192 forks source link

Not enough space in buffers. #2074

Closed Jacuo closed 1 year ago

Jacuo commented 2 years ago

Describe the issue you are experiencing

HMIPServer cant be restarting.
UI is not responding Device ( Charly ) resatart helps for some time

Describe the behavior you expected

No errors

Steps to reproduce the issue

  1. Run rabermatic
  2. Wait some time
  3. ...

What is the version this bug report is based on?

3.65.8.20220831

Which base platform are you running?

rpi3 (RaspberryPi3)

Which HomeMatic/homematicIP radio module are you using?

RPI-RF-MOD

Anything in the logs that might be useful for us?

messages
Nov 26 16:53:19 homematic-raspi user.info monit[1887]: 'HMIPServer' restart: '/etc/init.d/S62HMServer restart'
Nov 26 16:53:35 homematic-raspi kern.info kernel: [2788101.566702] eq3loop: eq3loop_write_master() return error: -14
Nov 26 16:53:35 homematic-raspi kern.err kernel: [2788126.254604] eq3loop: eq3loop_write_master() mmd_hmip: not enough space in buffers. free space = 5, required space = 34
Nov 26 16:53:49 homematic-raspi user.err monit[1887]: 'HMIPServer' failed to restart (exit status -1) -- '/etc/init.d/S62HMServer restart': Program timed out -- Stopping HMIPServer: OK Starting HMIPServer: ...............
Nov 26 16:54:07 homematic-raspi user.err monit[1887]: 'HMIPServer' service restarted 1 times within 1 cycles(s) - exec

hmserver.log

022-11-26 16:58:06,076 de.eq3.cbcs.server.core.vertx.KeyServerWorker ERROR [vert.x-worker-thread-3] Missing key server configuration parameter (Network.Key) for  mode: KEYSERVER_LOCAL 
2022-11-26 16:58:10,057 de.eq3.cbcs.server.local.base.internal.LocalServerAdapterInitialization ERROR [vert.x-eventloop-thread-4] Error 'LOCAL_ADAPTER_NO_SUCH_PORT' while trying to open port '/dev/mmd_hmip': 
de.eq3.cbcs.lib.commdevice.CommDeviceException: Exception while trying to open serial port. Check configured port '/dev/mmd_hmip'
        at de.eq3.cbcs.lib.nrjavaserialdevice.SerialCommDevice.open(SerialCommDevice.java:103) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.lib.hmiptrxcommadapter.HomeMaticIPTRXCommAdapter.open(HomeMaticIPTRXCommAdapter.java:894) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.lib.hmiptrxcommadapter.HomeMaticIPTRXCommAdapter.<init>(HomeMaticIPTRXCommAdapter.java:103) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.server.local.base.internal.ShareableHomeMaticIPTRXCommAdapter.<init>(ShareableHomeMaticIPTRXCommAdapter.java:34) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.server.local.base.internal.LocalServerAdapterInitialization.start(LocalServerAdapterInitialization.java:109) ~[HMIPServer.jar:?]
        at io.vertx.core.AbstractVerticle.start(AbstractVerticle.java:106) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.DeploymentManager.lambda$doDeploy$5(DeploymentManager.java:196) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:100) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.AbstractContext.dispatch(AbstractContext.java:63) ~[HMIPServer.jar:?]
        at io.vertx.core.impl.EventLoopContext.lambda$runOnContext$0(EventLoopContext.java:38) ~[HMIPServer.jar:?]
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[HMIPServer.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) ~[HMIPServer.jar:?]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) ~[HMIPServer.jar:?]
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) ~[HMIPServer.jar:?]
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[HMIPServer.jar:?]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[HMIPServer.jar:?]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_345]
Caused by: gnu.io.NoSuchPortException
        at gnu.io.CommPortIdentifier.getPortIdentifier(CommPortIdentifier.java:274) ~[HMIPServer.jar:?]
        at de.eq3.cbcs.lib.nrjavaserialdevice.SerialCommDevice.open(SerialCommDevice.java:95) ~[HMIPServer.jar:?]
        ... 16 more

lighttpd-error.log

2022-11-26 16:57:46: (../src/gw_backend.c.360) gw-server re-enabled: tcp:127.0.0.1:32010 127.0.0.1 32010 
2022-11-26 16:58:28: (../src/gw_backend.c.283) establishing connection failed: socket: tcp:127.0.0.1:32010: Connection refused
2022-11-26 16:58:28: (../src/gw_backend.c.993) all handlers for /? on  are down.
2022-11-26 16:58:31: (../src/gw_backend.c.360) gw-server re-enabled: tcp:127.0.0.1:32010 127.0.0.1 32010 
2022-11-26 16:59:14: (../src/gw_backend.c.283) establishing connection failed: socket: tcp:127.0.0.1:32010: Connection refused
2022-11-26 16:59:14: (../src/gw_backend.c.993) all handlers for /? on  are down.

Additional information

No response

sotatech commented 1 year ago

I seem to be having the same problem on Home Assistant 2022.12 and RaspberryMatic add-in 3.65.11.20221005. It completely locks up the HA server, I can only power it off and on again. Problem reoccurs after an hour or two. IMG_0898

jens-maus commented 1 year ago

Sorry, but the rational regarding these not enough space in buffers. kernel messages is still the same: these are just the symptoms of an overwhelming busy message communication between the used homematic RF module (e.g. HmIP-RFUSB) and the corresponding processes consuming these messages (e.g. HMIPServer or rfd).

That means, for some reason the RF module is not able to either forward all incoming messages to HMIPServer or rfd or vice versa. And something like this could be related to a too high carrier sense level or because the repetition of communication with the RF module is too high (e.g. because some logic is trying to send out too many messages at a certain time interval) Ergo: Your environment seems to be too busy with having to deal with Homematic communication...

sotatech commented 1 year ago

Thank you. Please bear with me a little while I try to understand this problem, as our heating and all automations for lights, etc keep stopping and my family is not happy! I am using a HB-RF-ETH with firmware 1.3.0 and this problem seemed to start after I upgraded HA to 2022.12.0 this morning. Nothing has changed in my setup apart from this. I did notice that the CS was climbing to 10% at times, but I have seen that before without any problems. Where do you suggest I start? I have attached log files in case they are of any use in diagnosing this.

Thanks. messages.txt hmserver.log boot.log

Jacuo commented 1 year ago

That means, for some reason the RF module is not able to either forward all incoming messages to HMIPServer or rfd or vice versa. And something like this could be related to a too high carrier sense level or because the repetition of communication with the RF module is too high (e.g. because some logic is trying to send out too many messages at a certain time interval) Ergo: Your environment seems to be too busy with having to deal with Homematic communication...

But, at least in my case only rasberrymatic is producing messages - it is a charly device, so it looks that this problem is caused by rasberrymatic ?

jens-maus commented 1 year ago

That means, for some reason the RF module is not able to either forward all incoming messages to HMIPServer or rfd or vice versa. And something like this could be related to a too high carrier sense level or because the repetition of communication with the RF module is too high (e.g. because some logic is trying to send out too many messages at a certain time interval) Ergo: Your environment seems to be too busy with having to deal with Homematic communication...

But, at least in my case only rasberrymatic is producing messages - it is a charly device, so it looks that this problem is caused by rasberrymatic ?

No, the same would happen if you would use the original CCU3 software on it. This issue (Not enough space in buffers) is – as I said – only showing the symptoms of either a too busy RF module due to either not being able to transmit fast enough to its target RF devices or due to the interface processes (HMIPServer, rfd, etc.) not dealing with the load fast enough. Thus, reduce your automations and make sure you reduce the load on the RF interface.

Jacuo commented 1 year ago
  1. So why ii starts after some time after restart of device ? When I reboot server during 2-3 days everything works OK and then it starts
  2. I have 12 phisical devices connected + some virtual created + 2 connected via lan gateway. Is it to much - Could I find something about limits of device ?
  3. If I add HmIP-HAP as kind of repeater - will it help ?
sotatech commented 1 year ago

I went around and removed the batteries from devices until I saw the Carrier Sense drop, and it seems to have improved. It's been up for a couple of hours now. Looking at the CS values in HA is a lot different that what is displayed on RaspberryMatic. I will experiment more tomorrow. However two issues remain:

  1. The fact that RaspberryMatic add-in can bring down the entire Home Assistant installation.
  2. That the Homematic wireless network is so sensitive to interference and is limited in how much traffic it can handle for devices and automations.
Jacuo commented 1 year ago

I have 0 in RasberryMatic image

sotatech commented 1 year ago

Just to update this, I decided to move my RaspberryMatic onto a spare TinkerBoard S to keep Home Assistant running. However, Home Assistant continued to run out of memory and crash intermittently. It was caused by a memory leak in a HA module and is fixed in 2022.12.3 this morning. So the memory utilisation errors that the RM add-in was showing were caused by HA, not by Carrier Sense errors. However I still see intermittent CS peaks. I found that the CS indication on RM is not very accurate and using the history in HA gives a better indication. I suspect that I have always had this problem, but was not aware of it. The peaks in the graph below are 3 to 4 minutes apart, but often I do not see them on the RaspberryMatic home page. Because they are so intermittent, they are going to be hard to track down. image

jens-maus commented 1 year ago

However, Home Assistant continued to run out of memory and crash intermittently. It was caused by a memory leak in a HA module and is fixed in 2022.12.3 this morning. So the memory utilisation errors that the RM add-in was showing were caused by HA, not by Carrier Sense errors.

Thanks for coming back here and reporting this. This is actually good news since it really wondered me how the RaspberryMatic HA Add-on could bring down the HA host system since it is a sealed docker container.

However I still see intermittent CS peaks. I found that the CS indication on RM is not very accurate and using the history in HA gives a better indication.

The CarrierSense display on the main WebUI page is only updated once per minute due to performance reasons. Thus, if you need a detailed CS history you will of course have to use external methods to monitor the change in CarrierSense or have a look into the logfiles to get notified by a too high CS value at a time. Nevertheless, if your CS values regularly raise higher than 10% then this is a sign of an RF interference in the near vicinity of the antenna of your homematic RF module. Thus, bring it more far away from any possible RF interferences and also shield it, e.g. by using a proper high-quality USB cable (if you are using a HmIP-RFUSB) and potentially also using a USB RF filter like https://de.elv.com/elv-usb-entstoerfilter-usb-ef1-komplettbausatz-152745

sotatech commented 1 year ago

Yes, having an add-in bring down a HAOS server would not be a good feature, which was why I moved it to the TinkerBoard to exclude it. So definitely good news there. Looking at the CS graph again, I'm beginning to suspect something like the refrigerator. The gap between cycles is consistently 3 to 4 minutes so that might be a clue. I'll experiment over the weekend when I have time and report back. Thanks for your help.

Jacuo commented 1 year ago

monitor the change in CarrierSense

What tools are good to do it ?

sotatech commented 1 year ago

https://smartkram.de/hardware-shop/produkte/homematic-hardware/homematic-ip-funk/zubehoer/elv-funk-analyser-eq3-rfa-868mhz-fuer-homematic-ip-homematic-und-max/

I haven't tried this, but it might be worth investigating.

Jacuo commented 1 year ago

Tahbk you, but I was rather thinking about software to install on Charly to get this info from RPI-RF-MOD and send it somewhere