ironsheep / RPi-Reporter-MQTT2HA-Daemon

Linux service to collect and transfer Raspberry Pi data via MQTT to Home Assistant (for RPi Monitoring)
GNU General Public License v3.0
460 stars 64 forks source link

No updates when running "in background" #98

Closed aebgit closed 1 year ago

aebgit commented 1 year ago

Checklist:

Release with the issue: [2023-04-28 14:00:16] - ISP-RPi-mqtt-daemon.py v1.8.4

Last working release (if known): n/a

Hardware, Operating System, Python version:

cat /etc/debian_version

11.6

cat /etc/os-release

PRETTY_NAME="Raspbian GNU/Linux 11 (bullseye)" NAME="Raspbian GNU/Linux" VERSION_ID="11" VERSION="11 (bullseye)" VERSION_CODENAME=bullseye ID=raspbian ID_LIKE=debian HOME_URL="http://www.raspbian.org/" SUPPORT_URL="http://www.raspbian.org/RaspbianForums" BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"


<!--
Provide details about what you are seeing. Also which operating system this is on and what Python version you are using. If possible try to replicate the issue in other installations and include your findings here.
-->

**Description of problem:**

<!--
Explain what the issue is, and how things should look/behave. If possible provide a screenshot with a description.
-->

I run 5 Raspis with RPi-Reporter-MQTT2HA-Daemon in a plain network connected to Mosquitto broker and then to Homeassistant. 
I followed the installation guidelines straight forward and experienced no problems at all.

Four of them work fine, one has an issue. When I start the script manually via shell as unprivileged user or via sudo , it runs without problems:
`# python3 /opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py`.

However, the moment I start the script via systemctl, the following happens:
- the first set of telemetry is sent to the MQTT broker
- after that, there is NO other refresh of the information.
- The process is still alive (`#ps -fax`)

As a workaround I disabled it in systemctl and have added it to roots crontab, but the issue remained.
`@reboot /usr/bin/python3 -u /opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py`

I have also started it on a shell with debugging and verbose logging enabled, but could not spot any issues.
`sudo python3 /opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py -d -v`
...the only curious line I could find is this one, which appears occasionally in a different color:
`[2023-04-28 14:00:19] - << INTR(0) >> Time to report! (14:00:19 - 2023/04/28)` 

**Run our report script 'genBugInfo' on your failing device and include the output here:**

<!--
# run our script
$ ./genBugInfo 
genBugInfo: Wrote genBugInfo-200726-191034.lst
genBugInfo: Done

copy the contents of the genBugInfo-*.lst file here (or just attach this file to this report)
-->

SCRIPT genBugInfo v1.1 run 23/04/28-14:16:54

----------------------------------------------------------------------

/bin/cat /etc/apt/sources.list | /bin/egrep -v '#'

deb http://raspbian.raspberrypi.org/raspbian/ bullseye main contrib non-free rpi


/bin/cat /etc/apt/sources.list | /bin/egrep -v '#' | /usr/bin/awk '{ print $3 }' | /bin/grep . | /usr/bin/sort -u | head -1

bullseye


/bin/uname -r

6.1.21-v8+


/bin/hostname -f

display1


/usr/bin/uptime

14:16:54 up 11:16, 3 users, load average: 0.19, 0.25, 0.21


/sbin/ifconfig

br-e454e7933844: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet netmask 255.255.0.0 broadcast inet6 prefixlen 64 scopeid 0x20 ether txqueuelen 0 (Ethernet) RX packets 124251 bytes 11953095 (11.3 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 260768 bytes 347705178 (331.5 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 inet netmask 255.255.0.0 broadcast ether txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 ether txqueuelen 1000 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 6508 bytes 631412 (616.6 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 6508 bytes 631412 (616.6 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

veth4c9b962: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet netmask 255.255.0.0 broadcast inet6 prefixlen 64 scopeid 0x20 ether txqueuelen 0 (Ethernet) RX packets 124213 bytes 13686696 (13.0 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 261413 bytes 347910389 (331.7 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet netmask 255.255.255.0 broadcast inet6 prefixlen 64 scopeid 0x0 inet6 prefixlen 64 scopeid 0x20 ether txqueuelen 1000 (Ethernet) RX packets 573845 bytes 441732653 (421.2 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 142838 bytes 17020364 (16.2 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0


/sbin/ifconfig | /bin/egrep 'Link|flags|inet|ether' | /bin/egrep -v -i 'lo:|loopback|inet6|\:\:1|127.0.0.1'

br-e454e7933844: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet netmask 255.255.0.0 broadcast ether txqueuelen 0 (Ethernet) docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 inet netmask 255.255.0.0 broadcast ether txqueuelen 0 (Ethernet) eth0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 ether txqueuelen 1000 (Ethernet) veth4c9b962: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet netmask 255.255.0.0 broadcast ether ea:af:32:f6:f8:e4 txqueuelen 0 (Ethernet) wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet netmask 255.255.255.0 broadcast ether e4:5f:01:f3:71:30 txqueuelen 1000 (Ethernet)


/sbin/route

Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface default 0.0.0.0 UG 303 0 0 wlan0 link-local 0.0.0.0 255.255.0.0 U 209 0 0 veth4c9b962 172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0 172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 br-e454e7933844 192.168.178.0 0.0.0.0 255.255.255.0 U 303 0 0 wlan0


/bin/ls -l /var/log/dpkg.log /var/log/dpkg.log.1 2>/dev/null

-rw-r--r-- 1 root root 45433 Apr 16 13:01 /var/log/dpkg.log -rw-r--r-- 1 root root 6847 Mar 2 19:36 /var/log/dpkg.log.1


/bin/grep 'status installed' /var/log/dpkg.log /var/log/dpkg.log.1 2>/dev/null | sort | tail -1

/var/log/dpkg.log:2023-04-16 13:01:02 status installed mailcap:all 3.69


/bin/df -m

Filesystem 1M-blocks Used Available Use% Mounted on /dev/root 29644 6969 21418 25% / devtmpfs 1640 0 1640 0% /dev tmpfs 1898 0 1898 0% /dev/shm tmpfs 759 2 758 1% /run tmpfs 5 1 5 1% /run/lock /dev/mmcblk0p1 255 52 204 21% /boot tmpfs 380 1 380 1% /run/user/1000


/bin/df -m | /usr/bin/tail -n +2 | /bin/egrep -v 'tmpfs|boot'

/dev/root 29644 6969 21418 25% /


ls -l /opt/vc/bin/vcgencmd /usr/bin/vcgencmd

ls: cannot access '/opt/vc/bin/vcgencmd': No such file or directory -rwxr-xr-x 1 root root 13948 Mar 22 17:33 /usr/bin/vcgencmd



**Python errors shown in the logs (if applicable):**


**Additional information:**
aebgit commented 1 year ago

...should have done this earlier. Here is some additional detail from journalctl -b --no-pager -u isp-rpi-reporter.service , but I have difficulties interpreting it:

Apr 28 13:38:16 display1 systemd[1]: isp-rpi-reporter.service: Succeeded.
Apr 28 13:38:16 display1 systemd[1]: isp-rpi-reporter.service: Consumed 15.902s CPU time.
Apr 28 13:38:19 display1 systemd[1]: isp-rpi-reporter.service: Scheduled restart job, restart counter is at 4.
Apr 28 13:38:19 display1 systemd[1]: Stopped RPi Reporter MQTT Client/Daemon.
Apr 28 13:38:19 display1 systemd[1]: isp-rpi-reporter.service: Consumed 15.902s CPU time.
Apr 28 13:38:19 display1 systemd[1]: Starting RPi Reporter MQTT Client/Daemon...
Apr 28 13:38:23 display1 systemd[1]: Started RPi Reporter MQTT Client/Daemon.
Apr 28 13:38:24 display1 python3[16402]: Traceback (most recent call last):
Apr 28 13:38:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 1809, in <module>
Apr 28 13:38:24 display1 python3[16402]:     afterMQTTConnect()  # now instead of after?
Apr 28 13:38:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 1799, in afterMQTTConnect
Apr 28 13:38:24 display1 python3[16402]:     handle_interrupt(0)
Apr 28 13:38:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 1782, in handle_interrupt
Apr 28 13:38:24 display1 python3[16402]:     update_values()
Apr 28 13:38:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 1763, in update_values
Apr 28 13:38:24 display1 python3[16402]:     getSystemTemperature()
Apr 28 13:38:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 968, in getSystemTemperature
Apr 28 13:38:24 display1 python3[16402]:     interpretedTemp = float(rpi_gpu_temp_raw)
Apr 28 13:38:24 display1 python3[16402]: ValueError: could not convert string to float: "Can't open device file: /dev/vcio\nTry creating a device file with: sudo mknod /dev/vcio c 100 0"
Apr 28 13:43:24 display1 python3[16402]: Exception in thread Thread-5:
Apr 28 13:43:24 display1 python3[16402]: Traceback (most recent call last):
Apr 28 13:43:24 display1 python3[16402]:   File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
Apr 28 13:43:24 display1 python3[16402]:     self.run()
Apr 28 13:43:24 display1 python3[16402]:   File "/usr/lib/python3.9/threading.py", line 1266, in run
Apr 28 13:43:24 display1 python3[16402]:     self.function(*self.args, **self.kwargs)
Apr 28 13:43:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 1490, in periodTimeoutHandler
Apr 28 13:43:24 display1 python3[16402]:     handle_interrupt(TIMER_INTERRUPT)  # '0' means we have a timer interrupt!!!
Apr 28 13:43:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 1782, in handle_interrupt
Apr 28 13:43:24 display1 python3[16402]:     update_values()
Apr 28 13:43:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 1763, in update_values
Apr 28 13:43:24 display1 python3[16402]:     getSystemTemperature()
Apr 28 13:43:24 display1 python3[16402]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 968, in getSystemTemperature
Apr 28 13:43:24 display1 python3[16402]:     interpretedTemp = float(rpi_gpu_temp_raw)
Apr 28 13:43:24 display1 python3[16402]: ValueError: could not convert string to float: "Can't open device file: /dev/vcio\nTry creating a device file with: sudo mknod /dev/vcio c 100 0"
jandamf commented 1 year ago

I should have checked the list before upgrading. I am now having the same problem after updating to latest version.

I was 3 versions back and worked ok.

ornative commented 1 year ago

I am having the same issue on the same platform on both .83 and .84. Runs fine when started interactively, but has the same problem shown in the journalctl shown above.

ironsheep commented 1 year ago

@jandamf and @ornative something we don't understand but seems to be happening is that the daemon user appears to be having permissions reset when the OS update happened. As a result, the fix is to apply once again the permissions needed to run the script. (please let me know if this solves your problem.)

Here's my response to duplicate issue #94:

Since the errors you show appear to be access to temps can you please recheck that your daemon account has the correct permissions? https://github.com/ironsheep/RPi-Reporter-MQTT2HA-Daemon#set-up-daemon-account-to-allow-access-to-temperature-values

(Since you can run interactively, it seems to be an issue with the Daemon account.)

jandamf commented 1 year ago

@ironsheep

Thanks for the suggestion. I re-applied the permissions and that solved the problem.

ornative commented 1 year ago

That did correct the problem. Interesting, would be nice to know what element in the update is messing with the permissions.

ironsheep commented 1 year ago

Yep, if we ever find out I'll post more in these closed issues...

aebgit commented 1 year ago

Solved. Thank you!!

xstrex commented 8 months ago

I know this issue is closed, but could someone please tell me what their permissions are on /dev/vcio I'm running this in dietpi, and /dev/vcio exists, but has perms of crw------- 1 root root 10, 125 Jan 16 14:20 /dev/vcio. I suspect others are seeing root:video. Before I chase this down with dietpi, what's the output of ls -hl /dev/vcio?