ironsheep / RPi-Reporter-MQTT2HA-Daemon

Linux service to collect and transfer Raspberry Pi data via MQTT to Home Assistant (for RPi Monitoring)
GNU General Public License v3.0
441 stars 62 forks

systemctl stopped being able to start process #139

Open dustball62 opened 4 months ago

dustball62 commented 4 months ago

I believe I've exhausted everything I can think of.

I have 4 RPis, and one stopped working in the middle of the night last night; I have yet to find a cause. The other 3 are still reporting as normal.

In my troubleshooting I've done the following.

  1. Confirmed group membership: "groups daemon" returns "daemon : daemon video".

  2. Rebooted several times.

  3. Stopped and started the service several times; it always fails with:

**Job for isp-rpi-reporter.service failed because a timeout was exceeded.
See "systemctl status isp-rpi-reporter.service" and "journalctl -xe" for details.**

Reloading the daemon works fine.

  4. The journal shows:

Feb 17 12:44:58 weewxstation systemd[1]: Starting RPi Reporter MQTT Client/Daemon...
Feb 17 12:46:28 weewxstation systemd[1]: isp-rpi-reporter.service: start operation timed out. Terminating.
Feb 17 12:46:28 weewxstation systemd[1]: isp-rpi-reporter.service: Failed with result 'timeout'.
Feb 17 12:46:28 weewxstation systemd[1]: Failed to start RPi Reporter MQTT Client/Daemon.
Feb 17 12:46:28 weewxstation systemd[1]: isp-rpi-reporter.service: Consumed 50.569s CPU time.
Feb 17 12:46:31 weewxstation systemd[1]: isp-rpi-reporter.service: Scheduled restart job, restart counter is at 1.

  5. Ran the Python file directly with debug and verbose enabled: it runs and updates the broker as you would expect, i.e. it runs fine!

  6. Confirmed the file is owned by daemon: -rwxr-xr-x 1 daemon root 70646 Feb 5 12:13 ISP-RPi-mqtt-daemon.py

  7. I haven't updated to the new paho-mqtt, but I made the update in the requirements.txt file just in case.

  8. Ran git pull just to be sure, but all 4 are running the same code version.

  9. Rechecked the config.ini file for any changes.

  10. This unit didn't reboot or anything in the middle of the night; it just stopped reporting out of the blue and hasn't come back.
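The journal excerpt in item 4 can be checked with a little arithmetic. A minimal Python sketch, assuming the year is 2024 (taken from the debug file's "24/02/17" run date) and that the unit file does not set its own TimeoutStartSec: the gap between "Starting" and "start operation timed out" is exactly 90 seconds, which matches systemd's default start timeout. One reading is that systemd gave up waiting for the service to finish starting, rather than the script crashing outright. The names elapsed_seconds and DEFAULT_TIMEOUT_START_SEC are hypothetical.

```python
from datetime import datetime

# Journal timestamps omit the year; 2024 is assumed from the debug file's run date.
JOURNAL_FMT = "%Y %b %d %H:%M:%S"
DEFAULT_TIMEOUT_START_SEC = 90  # systemd's stock default; a unit can override it


def elapsed_seconds(start: str, end: str, year: int = 2024) -> float:
    """Seconds between two journal timestamps such as 'Feb 17 12:44:58'."""
    t0 = datetime.strptime(f"{year} {start}", JOURNAL_FMT)
    t1 = datetime.strptime(f"{year} {end}", JOURNAL_FMT)
    return (t1 - t0).total_seconds()


# "Starting ..." to "start operation timed out" in the log above.
gap = elapsed_seconds("Feb 17 12:44:58", "Feb 17 12:46:28")
print(gap)                               # 90.0
print(gap == DEFAULT_TIMEOUT_START_SEC)  # True
```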


    Side note: I was trying to get the remote commands working the other day, following your instructions in RMTECTRL.md, but that has been unsuccessful for every node on my network. I haven't touched it in about 5 days.


Per your instructions: "rpi_model": "RPi 1 ModelB r2", OS: bullseye, python3

Contents of the debug file:


SCRIPT genBugInfo v1.1 run 24/02/17-12:57:39

----------------------------------------------------------------------

/bin/cat /etc/apt/sources.list | /bin/egrep -v '#'

deb http://raspbian.raspberrypi.org/raspbian/ bullseye main contrib non-free rpi


/bin/cat /etc/apt/sources.list | /bin/egrep -v '#' | /usr/bin/awk '{ print $3 }' | /bin/grep . | /usr/bin/sort -u | head -1

bullseye


/bin/uname -r

6.1.21+


/bin/hostname -f

weewxstation


/usr/bin/uptime

12:57:40 up 2:28, 3 users, load average: 1.57, 1.56, 1.55


/sbin/ifconfig

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.108  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::a5f8:19bd:c691:fdbf  prefixlen 64  scopeid 0x20
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 558  bytes 30759 (30.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 558  bytes 30759 (30.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


/sbin/ifconfig | /bin/egrep 'Link|flags|inet|ether' | /bin/egrep -v -i 'lo:|loopback|inet6|\:\:1|127.0.0.1'

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.108  netmask 255.255.255.0  broadcast 192.168.0.255
        ether b8:27:eb:01:e8:71  txqueuelen 1000  (Ethernet)


/sbin/route

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         192.168.0.1     0.0.0.0         UG    202    0        0 eth0
192.168.0.0     0.0.0.0         255.255.255.0   U     202    0        0 eth0


/bin/ls -l /var/log/dpkg.log /var/log/dpkg.log.1 2>/dev/null

-rw-r--r-- 1 root root    0 Feb  6 00:00 /var/log/dpkg.log
-rw-r--r-- 1 root root 2394 Feb  5 12:10 /var/log/dpkg.log.1


/bin/grep 'status installed' /var/log/dpkg.log /var/log/dpkg.log.1 2>/dev/null | sort | tail -1

/var/log/dpkg.log.1:2024-02-05 12:10:42 status installed python3-unidecode:all 1.2.0-1


/bin/df -m

Filesystem     1M-blocks  Used Available Use% Mounted on
/dev/root          29643  4692     23691  17% /
devtmpfs              87     0        87   0% /dev
tmpfs                215     0       215   0% /dev/shm
tmpfs                 86     2        85   2% /run
tmpfs                  5     1         5   1% /run/lock
/dev/mmcblk0p1       255    51       205  20% /boot
tmpfs                 43     1        43   1% /run/user/1000
/dev/sda2          14215  4667      8919  35% /media/pi/rootfs
/dev/sda1            253    28       226  11% /media/pi/boot


/bin/df -m | /usr/bin/tail -n +2 | /bin/egrep -v 'tmpfs|boot'

/dev/root          29643  4692     23691  17% /
/dev/sda2          14215  4667      8919  35% /media/pi/rootfs


ls -l /opt/vc/bin/vcgencmd /usr/bin/vcgencmd

ls: cannot access '/opt/vc/bin/vcgencmd': No such file or directory
-rwxr-xr-x 1 root root 13948 Mar 22  2023 /usr/bin/vcgencmd


Thanks in advance for looking at this.

Dan1jel commented 3 months ago

I also have this issue. I have two RPi 4s (4 GB) with the newest OS and updates. On my RPi running HA it works with no issues at all, but on my other RPi I can't get it to work.

At first it didn't work with paho-mqtt 2.0.0, so I downgraded to 1.6.1. Now it works when I run the command by itself (python3 /opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py) but not via the daemon, which gives this error:

Mar 26 16:22:39 server python3[495779]: Exception ignored in: <function Client.__del__ at 0x7f800f45e0>
Mar 26 16:22:39 server python3[495779]: Traceback (most recent call last):
Mar 26 16:22:39 server python3[495779]:   File "/usr/local/lib/python3.11/dist-packages/paho/mqtt/client.py", line 874, in __del__
Mar 26 16:22:39 server python3[495779]:     self._reset_sockets()
Mar 26 16:22:39 server python3[495779]:   File "/usr/local/lib/python3.11/dist-packages/paho/mqtt/client.py", line 1133, in _reset_sockets
Mar 26 16:22:39 server python3[495779]:     self._sock_close()
Mar 26 16:22:39 server python3[495779]:   File "/usr/local/lib/python3.11/dist-packages/paho/mqtt/client.py", line 1119, in _sock_close
Mar 26 16:22:39 server python3[495779]:     if not self._sock:
Mar 26 16:22:39 server python3[495779]:            ^^^^^^^^^^
Mar 26 16:22:39 server python3[495779]: AttributeError: 'Client' object has no attribute '_sock'
Mar 26 16:22:39 server python3[495779]: Traceback (most recent call last):
Mar 26 16:22:39 server python3[495779]:   File "/opt/RPi-Reporter-MQTT2HA-Daemon/ISP-RPi-mqtt-daemon.py", line 1286, in
Mar 26 16:22:39 server python3[495779]:     mqtt_client = mqtt.Client()
Mar 26 16:22:39 server python3[495779]:                   ^^^^^^^^^^^^^
Mar 26 16:22:39 server python3[495779]: TypeError: Client.__init__() missing 1 required positional argument: 'callback_api_version'
Mar 26 16:22:40 server systemd[1]: isp-rpi-reporter.service: Main process exited, code=exited, status=1/FAILURE
Mar 26 16:22:40 server systemd[1]: isp-rpi-reporter.service: Failed with result 'exit-code'.
Mar 26 16:22:40 server systemd[1]: Failed to start isp-rpi-reporter.service - RPi Reporter MQTT Client/Daemon.
Mar 26 16:22:40 server systemd[1]: isp-rpi-reporter.service: Consumed 3.881s CPU time.
Mar 26 16:22:41 server systemd[1]: Stopped isp-rpi-reporter.service - RPi Reporter MQTT Client/Daemon.
Mar 26 16:22:41 server systemd[1]: isp-rpi-reporter.service: Consumed 3.881s CPU time.
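The second traceback in that log names the cause: with paho-mqtt 2.x, Client() requires a callback_api_version argument, so the bare mqtt.Client() call fails before the object is initialized (the earlier AttributeError about _sock comes from __del__ running on that never-initialized object). Pinning 1.6.1, as done in this thread, avoids it. As an alternative, here is a sketch (not the project's code; client_args and make_client are hypothetical names) that picks the constructor arguments by checking whether the imported paho module exposes CallbackAPIVersion:

```python
def client_args(mqtt_module) -> tuple:
    """Positional arguments for mqtt_module.Client() on paho-mqtt 1.x or 2.x.

    paho-mqtt 2.x introduced the CallbackAPIVersion enum and made it a required
    first argument to Client(); 1.x has no such enum and needs no argument.
    """
    api = getattr(mqtt_module, "CallbackAPIVersion", None)
    if api is None:
        return ()  # paho-mqtt 1.x: the original bare Client() call works
    # VERSION1 keeps the 1.x-style callback signatures (on_connect, on_message, ...).
    return (api.VERSION1,)


def make_client():
    """Hypothetical drop-in for the bare mqtt.Client() call shown in the traceback."""
    import paho.mqtt.client as mqtt  # needs paho-mqtt installed; imported lazily

    return mqtt.Client(*client_args(mqtt))
```

Detecting the enum, rather than parsing version strings, lets the same call work with the apt-packaged 1.6.1 and a pip-installed 2.x. Expect paho-mqtt 2.x to possibly warn that callback API version 1 is deprecated.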

dustball62 commented 3 months ago

Dan1jel, I was eventually able to get it to work. I basically had to rip it all out, especially the daemon, and start fresh, making sure to install only paho-mqtt 1.6.

Interestingly, you get those alerts about how far behind you are on packages, yet installing them breaks it... :) So I've got more than six Pis in service and I haven't upgraded them, waiting for this to be resolved.

dustball62 commented 3 months ago

The remote reboot feature: I've never gotten that to work.

Dan1jel commented 3 months ago


How do you "start from fresh": a clean OS install or just the script? I tried to start all over with the script install but still get the same result.

When running "python3 ..." as a single command it works, but when I try via the daemon, I get the error. I don't know how else to do it!

Edit: I might know what I did wrong. When I uninstalled 2.0.0, I think I didn't use "sudo", so somehow it was still installed and used by the daemon. When I then ran:

sudo pip3 uninstall paho-mqtt --break-system-packages
sudo systemctl daemon-reload
sudo pip3 install paho-mqtt==1.6.1 --break-system-packages
sudo systemctl start isp-rpi-reporter.service

it works as intended now.
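The traceback earlier in the thread shows the 2.0.0 copy living in /usr/local/lib/python3.11/dist-packages, which is where a system-wide (sudo) pip install lands on Debian-based systems, and a pip uninstall run without sudo would typically lack permission to remove it. To see which copy a given interpreter actually picks up, here is a small stdlib-only Python sketch (describe_module is a hypothetical helper). It finds the package without importing it, and it is meant to be run as the same user and with the same python3 that the service's ExecStart uses:

```python
import importlib.util
import sys
from importlib import metadata


def describe_module(module_name: str, dist_name: str) -> dict:
    """Where `module_name` would be imported from, plus the installed `dist_name` version."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        locations = []  # not importable by this interpreter
    elif spec.submodule_search_locations is not None:
        locations = list(spec.submodule_search_locations)  # package directory(ies)
    else:
        locations = [spec.origin]  # plain module file
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"python": sys.executable, "locations": locations, "version": version}


if __name__ == "__main__":
    # Query top-level "paho": find_spec on a dotted name would import the parent first.
    for key, value in describe_module("paho", "paho-mqtt").items():
        print(f"{key}: {value}")
```

If the unit runs as the daemon user (as the groups check in the first post suggests), something like sudo -u daemon python3 check_paho.py (hypothetical file name) shows what the service would see.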

bsimmo commented 3 months ago

Is your daemon running in the venv? Install paho using apt, not pip; it then becomes a system package and not part of the environment. There is a good chance you are picking up the system one, or vice versa.

I've not had a play for a long time as mine just tick over, I think; that said, I've not checked!

If you can install all the modules you need using apt, then you don't actually need to use a venv.

I'll pop up some things later if you want to check stuff, but can't do that on my phone easily.

Why apt? Because Debian keeps things simple: it doesn't move packages to new upstream versions until a new OS release, so paho is stuck at 1.6.1 (https://packages.debian.org/bookworm/all/python3-paho-mqtt). It will work with the OS too.

My advice with Bookworm is to not use pip, but install everything you can with apt. Then use as normal.
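The suggestion above hinges on knowing whether the service's interpreter runs inside a venv and whether a package came from apt or pip. Here is a small Python sketch, assuming Debian/Raspberry Pi OS path conventions: apt packages under /usr/lib/python3/dist-packages, system-wide pip under /usr/local/lib/python3.X/dist-packages (as in the traceback above), and per-user pip under ~/.local. The helper names are hypothetical:

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv (PEP 405 sets sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


def install_source(path: str) -> str:
    """Guess who installed a package from its location, using Debian path conventions."""
    if path.startswith("/usr/lib/python3/dist-packages/"):
        return "apt"          # Debian packages such as python3-paho-mqtt
    if path.startswith("/usr/local/lib/python3"):
        return "pip-system"   # e.g. sudo pip3 install ...
    if "/.local/lib/python3" in path:
        return "pip-user"     # pip3 install run as a normal user (user site-packages)
    return "other"            # e.g. a venv's own site-packages


if __name__ == "__main__":
    print("venv:", in_virtualenv(), "prefix:", sys.prefix)
```

You can feed install_source the Location: field that pip3 show paho-mqtt prints (with /paho appended) to tell which copy is installed where.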

Dan1jel commented 3 months ago


Yes, my bad. Like I said before, I think I had paho-mqtt 2.0.0 installed via pip first, but after I uninstalled it as above, I assume python3-paho-mqtt (the apt package) took over.

I checked now: pip3 list shows paho-mqtt installed, but it's the apt version.

But I agree, I'd rather have the apt version than the pip version.

Thanks for pointing this out and made me check again :)

dustball62 commented 3 months ago

I just removed the scripts and packages. Looking back at my records of what I did, it looks like I also moved and did some work in the sudoers.d/sudoers files, which helped.