jardiamj / BYOWS_RPi

weeWX Driver for Build Your Own Weather Station for Raspberry Pi
GNU General Public License v3.0

Weewx crashes randomly (several times a day) #16

Closed groetg closed 2 months ago

groetg commented 2 months ago

I am running Weewx 5.1 on a Raspberry Pi 4 with the BYOWS driver. It crashes several times a day at random times (the weewx process stops). Below is the log from one of the crashes. I have created a crontab task that starts Weewx every hour in case it has crashed. That works, but it leaves gaps in the data. I could run the task every 5 minutes, but I would rather fix the source of the problem, and I can't figure out what is going wrong. Can anyone help me? In the Google Weewx users group, Tom assumes it is a driver problem.

Log:

Sep 22 11:11:32 byows-jim weewxd[71261]: INFO weewx.engine: Main loop exiting. Shutting engine down.
Sep 22 11:11:32 byows-jim weewxd[71261]: INFO weewx.engine: Shutting down StdReport thread
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: Caught unrecoverable exception:
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: list index out of range
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: Traceback (most recent call last):
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: File "/usr/share/weewx/weewxd.py", line 127, in main
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: engine.run()
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: File "/usr/share/weewx/weewx/engine.py", line 204, in run
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: for packet in self.console.genLoopPackets():
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: File "/etc/weewx/bin/user/byows_rpi.py", line 83, in genLoopPackets
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: data = self.station.get_data()
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: ^^^^^^^^^^^^^^^^^^^^^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: File "/etc/weewx/bin/user/byows_rpi.py", line 147, in get_data
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: data["soilTemp1"] = self.get_soil_temp()
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: ^^^^^^^^^^^^^^^^^^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: File "/etc/weewx/bin/user/byows_rpi.py", line 129, in get_soil_temp
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: return self.temp_probe.read_temp()
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: File "/etc/weewx/bin/user/byows_rpi.py", line 189, in read_temp
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: success = self.crc_check(lines)
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: ^^^^^^^^^^^^^^^^^^^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: File "/etc/weewx/bin/user/byows_rpi.py", line 180, in crc_check
Sep 22 11:11:32 byows-jim weewxd[71261]: Traceback (most recent call last):
Sep 22 11:11:32 byows-jim weewxd[71261]: File "/usr/share/weewx/weewxd.py", line 226, in
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: return lines[0].strip()[-3:] == "YES"
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: ~^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: main()
Sep 22 11:11:32 byows-jim weewxd[71261]: File "/usr/share/weewx/weewxd.py", line 127, in main
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: IndexError: list index out of range
Sep 22 11:11:32 byows-jim weewxd[71261]: engine.run()
Sep 22 11:11:32 byows-jim weewxd[71261]: File "/usr/share/weewx/weewx/engine.py", line 204, in run
Sep 22 11:11:32 byows-jim weewxd[71261]: for packet in self.console.genLoopPackets():
Sep 22 11:11:32 byows-jim weewxd[71261]: File "/etc/weewx/bin/user/byows_rpi.py", line 83, in genLoopPackets
Sep 22 11:11:32 byows-jim weewxd[71261]: CRITICAL main: **** Exiting.
Sep 22 11:11:32 byows-jim weewxd[71261]: data = self.station.get_data()
Sep 22 11:11:32 byows-jim weewxd[71261]: ^^^^^^^^^^^^^^^^^^^^^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: File "/etc/weewx/bin/user/byows_rpi.py", line 147, in get_data
Sep 22 11:11:32 byows-jim weewxd[71261]: data["soilTemp1"] = self.get_soil_temp()
Sep 22 11:11:32 byows-jim weewxd[71261]: ^^^^^^^^^^^^^^^^^^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: File "/etc/weewx/bin/user/byows_rpi.py", line 129, in get_soil_temp
Sep 22 11:11:32 byows-jim weewxd[71261]: return self.temp_probe.read_temp()
Sep 22 11:11:32 byows-jim weewxd[71261]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: File "/etc/weewx/bin/user/byows_rpi.py", line 189, in read_temp
Sep 22 11:11:32 byows-jim weewxd[71261]: success = self.crc_check(lines)
Sep 22 11:11:32 byows-jim weewxd[71261]: ^^^^^^^^^^^^^^^^^^^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: File "/etc/weewx/bin/user/byows_rpi.py", line 180, in crc_check
Sep 22 11:11:32 byows-jim weewxd[71261]: return lines[0].strip()[-3:] == "YES"
Sep 22 11:11:32 byows-jim weewxd[71261]: ~^^^
Sep 22 11:11:32 byows-jim weewxd[71261]: IndexError: list index out of range
Sep 22 11:11:32 byows-jim systemd[1]: weewx.service: Main process exited, code=exited, status=1/FAILURE
Sep 22 11:11:32 byows-jim systemd[1]: weewx.service: Failed with result 'exit-code'.

jardiamj commented 2 months ago

The driver fails when trying to read the temperature from the DS18B20 sensor. We get this value by reading from a file named w1_slave under a directory in /sys/bus/w1/devices/ with a name like 28-xxxxxxxxxxxx. The content of the file should look like this:

cc 01 4b 46 7f ff 04 10 67 : crc=67 YES
cc 01 4b 46 7f ff 04 10 67 t=28750

The driver checks, in line 180, that the data is valid by looking for the YES at the end of the first line. If that check succeeds, it then records the temperature from the second line (t=28750). It looks like the driver fails when looking for that YES (crc_check). Based on the index out of range error, I suspect that the file is empty at that point and thus lines is empty.
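The check described above can be sketched as a small standalone parser. This is only a minimal sketch, not the driver's code: the function name `parse_w1_slave` is hypothetical, and it assumes the two-line format shown above, with the value after `t=` given in thousandths of a degree Celsius.

```python
def parse_w1_slave(lines):
    """Return the temperature in degrees Celsius, or None if the read is unusable.

    Hypothetical helper for illustration; not part of byows_rpi.py.
    """
    # An empty or short read is the case where lines[0] would raise IndexError.
    if not lines or len(lines) < 2:
        return None
    # The first line must end in YES for the CRC to be considered valid.
    if not lines[0].strip().endswith("YES"):
        return None
    # The second line carries the reading after "t=".
    _, sep, raw = lines[1].partition("t=")
    if not sep:
        return None
    try:
        return int(raw.strip()) / 1000.0
    except ValueError:
        return None


sample = [
    "cc 01 4b 46 7f ff 04 10 67 : crc=67 YES\n",
    "cc 01 4b 46 7f ff 04 10 67 t=28750\n",
]
print(parse_w1_slave(sample))  # 28.75
print(parse_w1_slave([]))      # None
```

Returning None for an unusable read lets the caller decide whether to retry or skip the value instead of crashing.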

That is what I would look for first, and if that is the case I would add a check to ensure that lines is not empty before doing the crc_check. This should be done in lines 191 to 195, where we make up to 3 attempts to read the temperature.
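A minimal sketch of that guard, assuming the retry loop reads the raw lines and then runs the CRC check. The names `crc_ok` and `read_lines_with_retries` are hypothetical, and the reader is passed in as a callable so the sketch runs without the sensor; the actual code in lines 191 to 195 of the driver may be shaped differently.

```python
import time


def crc_ok(lines):
    """Like crc_check, but treats an empty or missing read as a failed check."""
    return bool(lines) and lines[0].strip()[-3:] == "YES"


def read_lines_with_retries(read_raw, max_attempts=3, delay=0.2):
    """Call read_raw() until the CRC line is valid or the attempts run out.

    read_raw is any callable returning a list of lines (possibly empty or None).
    Returns the valid lines, or None if every attempt failed.
    """
    for attempt in range(max_attempts):
        lines = read_raw()
        if crc_ok(lines):
            return lines
        if attempt < max_attempts - 1:
            time.sleep(delay)  # pause before the next read
    return None


# Simulated sensor: the first read comes back empty, the second is valid.
reads = iter([
    [],
    ["cc 01 4b 46 7f ff 04 10 67 : crc=67 YES\n",
     "cc 01 4b 46 7f ff 04 10 67 t=28750\n"],
])
result = read_lines_with_retries(lambda: next(reads), delay=0)
print(result is not None)  # True
```

With the guard in place, an empty read counts as one failed attempt rather than raising IndexError.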

Unfortunately, I don't have the hardware to test it but will try to help as much as possible.

You can also try debugging using the [ds18b20_therm.py](https://github.com/jardiamj/BYOWS_RPi/blob/master/files/ds18b20_therm.py) file in this repo. Add code in the `if __name__ == "__main__":` section to read the temperature in a loop at a given interval and see if it fails in the same way. Add debugging output to verify our suspicion and, if it proves correct, implement the suggested fix.
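One possible shape for such a debugging loop, as a sketch only: the function `poll_w1_slave`, its parameters, and the message wording are hypothetical and not part of ds18b20_therm.py. It reads the file repeatedly and records every read that comes back empty or fails the CRC, which is the condition suspected above.

```python
import glob
import time


def poll_w1_slave(path, count=10, interval=1.0):
    """Read path `count` times and return a list of (index, problem) tuples."""
    problems = []
    for i in range(count):
        try:
            with open(path) as f:
                lines = f.readlines()
        except OSError as exc:
            problems.append((i, f"read error: {exc}"))
        else:
            if not lines:
                problems.append((i, "empty file"))
            elif not lines[0].strip().endswith("YES"):
                problems.append((i, "crc not YES"))
        if i < count - 1:
            time.sleep(interval)
    return problems


if __name__ == "__main__":
    # Sensor directories are named like 28-xxxxxxxxxxxx, as described above.
    matches = glob.glob("/sys/bus/w1/devices/28-*/w1_slave")
    if matches:
        for index, problem in poll_w1_slave(matches[0], count=60, interval=5):
            print(f"read {index}: {problem}")
    else:
        print("no DS18B20 sensor found")
```

If the output shows "empty file" entries around the times Weewx used to crash, that would support the empty-read explanation.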

groetg commented 2 months ago

I have a test running now in which I changed the following in the file /etc/weewx/bin/user/byows_rpi.py:

`while not success and attempts < 3:` into `while not success and attempts < 10:`

So it now retries up to 10 times instead of 3. It has been running for 24 hours without a crash, so it looks promising.

Changed code:

```python
def read_temp(self):
    temp_c = -255
    attempts = 0

    lines = self.read_temp_raw()

    if lines != None:
        success = self.crc_check(lines)

        while not success and attempts < 10:
            time.sleep(0.2)
            lines = self.read_temp_raw()
            success = self.crc_check(lines)
            attempts += 1
```

groetg commented 2 months ago

The problem is gone, I will close this issue, thank you.