openhab / openhab-core

Core framework of openHAB
https://www.openhab.org/
Eclipse Public License 2.0
925 stars 428 forks source link

Cron trigger in rules is more than 20 minutes delayed #3998

Open sternegugger opened 10 months ago

sternegugger commented 10 months ago

Using openHAB 4.1.0. A Rule is triggered by cron cronExpression: 0 0 9 ? * MON,TUE,WED,THU,FRI *. But the rule action runs allways more than 20 minutes delayed.

I already tried this, without success:

In the community I found a similar problem: https://community.openhab.org/t/everything-happens-51-minutes-too-late/132112 But there they solved the problem by a restart, which dit not work for me.

Does OH run an own clock?

Expected Behavior

The rule actions should be triggered in the same minute of the cron trigger

Current Behavior

Example rule:

...
triggers:
  - id: "1"
    configuration:
      cronExpression: 0 0 9 ? * MON,TUE,WED,THU,FRI *
    type: timer.GenericCronTrigger
conditions: []
actions:
  - inputs: {}
    id: "2"
    configuration:
      command: ON
      itemName: MQTTLED03_LED03onoff
    type: core.ItemCommandAction
...

Eventlog:

2023-12-14 09:24:34.188 [INFO ] [openhab.event.ItemCommandEvent      ] - Item 'MQTTLED03_LED03onoff' received command ON

Expected: The event should be triggered at 9:00, not at 9:24

Possible Solution

If OH uses an own clock, it should be synchonized with the OS clock.

Steps to Reproduce (for Bugs)

On my system I can reproduce it with different cron and time triggers. On a clean new OH installation, the bug would probably be not reprducable.

Context

I am using OH since version 3.0.0 and updated it several times by sudo apt-get upgrade. In former times I did not have the time delay.

Your Environment

rkoshak commented 10 months ago

If OH uses an own clock, it should be synchonized with the OS clock.

It doesn't use it's own clock and does use the system clock. It does it's own timezone configuration but it uses the OS clock.

Something else is going on.

OH 4 is known to require more RAM than OH 3. Have you reached the limit of your machine? Check the system settings, in particular the system load, available memory, and swap. If the load is high, swap is in use and/or available memory is low, you might have outgrown your machine.

sternegugger commented 10 months ago

OH 4 is known to require more RAM than OH 3. Have you reached the limit of your machine? Check the system settings, in particular the system load, available memory, and swap. If the load is high, swap is in use and/or available memory is low, you might have outgrown your machine.

Here the result of the "top" command:

top - 08:59:52 up 8 days, 15:51,  1 user,  load average: 1,04, 1,03, 1,00
Tasks: 112 total,   1 running, 111 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2,7 us,  1,0 sy,  0,0 ni, 96,3 id,  0,0 wa,  0,0 hi,  0,0 si,  0,0 st
MiB Mem :   1964,1 total,     71,8 free,   1254,2 used,    638,2 buff/cache
MiB Swap:   2048,0 total,   1766,8 free,    281,2 used.    547,3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2134 openhab   20   0 3407660   1,0g   8824 S   2,3  52,3 136:31.87 java
    668 mosquit+  20   0   42284  30412   4440 S   0,3   1,5   7:41.19 mosquitto
 597007 michael   20   0   17320   7952   5532 S   0,3   0,4   0:00.22 sshd
      1 root      20   0  166504   8660   5900 S   0,0   0,4   0:39.55 systemd
      2 root      20   0       0      0      0 S   0,0   0,0   0:00.07 kthreadd

I do not think the load is the problem:

Question:

rkoshak commented 10 months ago

A load average above 1 is concerning. It means at any given time over the past 15 minutes you've always had at least one process stuck waiting around for resources.

The fact that you are suing any swap at all and your free memory being really low points to this machine being over allocated and it's run out of RAM.

Other triggers (e.g. switches) are working directly.

If those are still in RAM and haven't been swapped out that's to be expected. The parts of RAM that have been accessed the longest ago are the parts that get swapped out first. Cron triggered rules don't need to subscribe to the event bus so those parts of RAM get accessed less frequently and therefore are more likely to be swapped out.

During this long time is allways a slot with CPU load less than 5%.

It's not CPU constrained. It's RAM constrained. The load shows you always have at least one process waiting around for some bit of data to be restored to RAM from SWAP (which means taking some other chunk of data out of RAM and writing it to the file system at slow USB speeds, and then reading the needed chunk of data from the file system at USB speeds back into RAM) so your overall CPU utilization is going to be very low. It's stuck waiting around for two file system operations to complete before it can process any instructions for that process.

The greater your system load, the less your CPU will be able to run.

Based on that top command, it's very clear to me that the problem is RAM. You need to move something to another machine, or move everything to a machine with more RAM.

I've helped many people though this (high load, swap in use, rules triggering gets delayed). Every time they argue "it can't be the machine" and spend hours and hours looking for some other cause. Every time freeing up RAM or moving to a machine with more RAM has fixed the problem.

You can put OH into trace logging and see if there is anything there, but the problem is clear to me.

olili commented 10 months ago

I#m not sure whether my issue is related to yours, but I'm facing also problems with different times and deviations since introduction of oh4. [https://github.com/openhab/openhab-js/issues/299].

As a workaround i reload all rules/scripts with time constraints at night by script.

o.

sternegugger commented 10 months ago

A load average above 1 is concerning. It means at any given time over the past 15 minutes you've always had at least one process stuck waiting around for resources.

@rkoshak Thank you for this. I have now increased the CPU of the virtual machine from 1 to 2 kernels. Now the constant delay is fixed (actually the delay is now 1 minute, which is OK for me).

The fact that you are suing any swap at all and your free memory being really low points to this machine being over allocated and it's run out of RAM.

Additionally I increased the RAM size from 2GB to 4GB and the delay is now less than 1 second.

It would be nice to find the minimum system requirement somewhere in the documentation. The only section I found is on the welcome page While You Are Getting Started:

"If you have no strong preference, get a Raspberry Pi 4 with 2 GB and a 16 GB SD card and install openHABian for the best experience."

I would expect the system requirement here: openHAB on Linux: Meeting the Requirements

It would be nice to have an update in the documentation, for all the others who run into the same problem. Thank you again for your help.

rkoshak commented 10 months ago

It's still under investigation. And there is debate. an average OH instance by itself runs just fine on an RPi 3. It's only when you add third party stuff like Mosquitto, InfluxDB, Frontail, NodeRed, zigbee2mqtt, etc. that tracks arise.

Does that mean an RPi 3 isn't enough to run OH?