kizniche / Mycodo

An environmental monitoring and regulation system
http://kylegabriel.com/projects/
GNU General Public License v3.0
2.89k stars 488 forks source link

Reboot over SSH hangs #388

Closed zsole2 closed 6 years ago

zsole2 commented 6 years ago

Mycodo Issue Report:

Problem Description

Issuing sudo reboot hangs my PiZero. I did not experienced this before v5.5.x Issuing Restart from the GUI appears to be working. What can be checked for troubleshooting?

Daemon log

2018-01-19 19:48:06,193 - mycodo.daemon - INFO - Received command to terminate daemon
2018-01-19 19:48:06,320 - mycodo.input_7 - INFO - Deactivated in 51.9 ms

===> POWER OFF HERE

2018-01-19 19:55:59,396 - mycodo.daemon - INFO - Mycodo daemon v5.5.9 starting
2018-01-19 19:56:05,421 - mycodo.daemon - INFO - Anonymous statistics enabled
2018-01-19 19:56:06,018 - mycodo.daemon - INFO - Starting rpyc server
2018-01-19 19:56:11,922 - mycodo.output - INFO - Output 1 setup on pin 22 and turned OFF (OFF=HIGH)
2018-01-19 19:56:11,943 - mycodo.output - INFO - Output 2 setup on pin 23 and turned OFF (OFF=HIGH)
2018-01-19 19:56:11,954 - mycodo.output - INFO - Output 3 setup on pin 24 and turned OFF (OFF=HIGH)
2018-01-19 19:56:11,967 - mycodo.output - INFO - Output 4 setup on pin 10 and turned OFF (OFF=HIGH)
2018-01-19 19:56:11,981 - mycodo.output - INFO - Output 5 setup on pin 9 and turned OFF (OFF=HIGH)
2018-01-19 19:56:11,996 - mycodo.output - INFO - Output 6 setup on pin 25 and turned OFF (OFF=HIGH)
2018-01-19 19:56:12,009 - mycodo.output - INFO - Output 7 setup on pin 11 and turned OFF (OFF=HIGH)
2018-01-19 19:56:12,022 - mycodo.output - INFO - Output 8 setup on pin 8 and turned OFF (OFF=HIGH)
2018-01-19 19:56:12,048 - mycodo.output - INFO - Output controller activated in 2129.9 ms
2018-01-19 19:56:14,718 - mycodo.timer_1 - INFO - Activated in 1534.5 ms
2018-01-19 19:56:18,023 - mycodo.timer_2 - INFO - Activated in 2165.5 ms
2018-01-19 19:56:21,227 - mycodo.timer_3 - INFO - Activated in 2002.9 ms
2018-01-19 19:56:21,245 - mycodo.daemon - INFO - All activated Timer controllers started
2018-01-19 19:56:25,235 - mycodo.input_1 - INFO - Activated in 2094.4 ms
2018-01-19 19:56:28,095 - mycodo.input_2 - INFO - Activated in 1690.6 ms
2018-01-19 19:56:30,325 - mycodo.input_3 - INFO - Activated in 1314.4 ms
2018-01-19 19:56:32,546 - mycodo.input_4 - INFO - Activated in 1249.7 ms
2018-01-19 19:56:34,843 - mycodo.input_6 - INFO - Activated in 1285.1 ms
2018-01-19 19:56:38,610 - mycodo.input_7 - INFO - Activated in 2774.2 ms
2018-01-19 19:56:40,948 - mycodo.input_8 - INFO - Activated in 1677.2 ms
2018-01-19 19:56:44,496 - mycodo.input_9 - INFO - Activated in 1910.3 ms
2018-01-19 19:56:44,509 - mycodo.daemon - INFO - All activated Input controllers started
2018-01-19 19:56:44,521 - mycodo.daemon - INFO - All activated Math controllers started
2018-01-19 19:56:51,457 - mycodo.pid_1 - INFO - Activated in 5645.7 ms
2018-01-19 19:56:56,779 - mycodo.pid_2 - INFO - Activated in 4059.9 ms
2018-01-19 19:57:01,002 - mycodo.pid_3 - INFO - Activated in 3699.7 ms
2018-01-19 19:57:01,010 - mycodo.daemon - INFO - All activated PID controllers started
2018-01-19 19:57:18,240 - mycodo.lcd_1 - INFO - Activated in 16751.3 ms
2018-01-19 19:57:18,255 - mycodo.daemon - INFO - All activated LCD controllers started
2018-01-19 19:57:19,868 - mycodo.conditional - INFO - Conditional settings refreshed
2018-01-19 19:57:19,892 - mycodo.daemon - INFO - Mycodo daemon v5.5.9 started in 80.482 seconds
2018-01-19 19:57:19,884 - mycodo.conditional - INFO - Conditional controller activated in 1112.4 ms
2018-01-19 19:57:19,963 - mycodo.daemon - INFO - 42.00 MB RAM in use
2018-01-19 19:57:52,213 - mycodo.pid_1 - ERROR - Failed to read measurement from the influxdb database: Could not connect.
2018-01-19 19:58:45,161 - mycodo.pid_3 - ERROR - Failed to read measurement from the influxdb database: Could not connect.

===> RESTART FROM GUI

2018-01-19 19:58:49,952 - mycodo.daemon - INFO - Received command to terminate daemon
2018-01-19 19:58:50,132 - mycodo.input_4 - INFO - Deactivated in 16.2 ms
2018-01-19 19:58:50,153 - mycodo.input_8 - INFO - Deactivated in 39.4 ms
2018-01-19 19:58:50,180 - mycodo.input_7 - INFO - Deactivated in 63.1 ms
2018-01-19 19:58:50,270 - mycodo.input_2 - INFO - Deactivated in 156.1 ms
2018-01-19 19:58:50,256 - mycodo.input_3 - INFO - Deactivated in 143.5 ms
2018-01-19 19:58:50,232 - mycodo.input_9 - INFO - Deactivated in 118.5 ms
2018-01-19 19:58:50,306 - mycodo.input_1 - INFO - Deactivated in 192.6 ms
2018-01-19 19:58:50,316 - mycodo.input_6 - INFO - Deactivated in 203.0 ms
2018-01-19 19:59:29,583 - mycodo.daemon - INFO - Mycodo daemon v5.5.9 starting
2018-01-19 19:59:31,229 - mycodo.daemon - INFO - Anonymous statistics enabled
2018-01-19 19:59:31,611 - mycodo.daemon - INFO - Starting rpyc server

===> HIT REFRESH IN THE BROWSER, RESTARTED AGAIN

2018-01-19 20:03:44,144 - mycodo.daemon - INFO - Mycodo daemon v5.5.9 starting
2018-01-19 20:03:50,089 - mycodo.daemon - INFO - Anonymous statistics enabled
2018-01-19 20:03:50,686 - mycodo.daemon - INFO - Starting rpyc server
2018-01-19 20:03:55,814 - mycodo.output - INFO - Output 1 setup on pin 22 and turned OFF (OFF=HIGH)
2018-01-19 20:03:55,834 - mycodo.output - INFO - Output 2 setup on pin 23 and turned OFF (OFF=HIGH)
2018-01-19 20:03:55,846 - mycodo.output - INFO - Output 3 setup on pin 24 and turned OFF (OFF=HIGH)
2018-01-19 20:03:55,862 - mycodo.output - INFO - Output 4 setup on pin 10 and turned OFF (OFF=HIGH)
2018-01-19 20:03:55,878 - mycodo.output - INFO - Output 5 setup on pin 9 and turned OFF (OFF=HIGH)
2018-01-19 20:03:55,895 - mycodo.output - INFO - Output 6 setup on pin 25 and turned OFF (OFF=HIGH)
2018-01-19 20:03:55,911 - mycodo.output - INFO - Output 7 setup on pin 11 and turned OFF (OFF=HIGH)
2018-01-19 20:03:55,926 - mycodo.output - INFO - Output 8 setup on pin 8 and turned OFF (OFF=HIGH)
2018-01-19 20:03:55,956 - mycodo.output - INFO - Output controller activated in 1380.1 ms
2018-01-19 20:03:58,138 - mycodo.timer_1 - INFO - Activated in 1196.8 ms
2018-01-19 20:04:01,473 - mycodo.timer_2 - INFO - Activated in 2166.1 ms
2018-01-19 20:04:04,584 - mycodo.timer_3 - INFO - Activated in 1863.7 ms
2018-01-19 20:04:04,597 - mycodo.daemon - INFO - All activated Timer controllers started
2018-01-19 20:04:08,988 - mycodo.input_1 - INFO - Activated in 2721.3 ms
kizniche commented 6 years ago

Does sudo shutdown now -r work? Not sure if that's different from the reboot command.

zsole2 commented 6 years ago

Thanks for the suggestion, I'll test it, but I do not expect it to be different...

zsole2 commented 6 years ago

This thing is sorted out. It appears to be caused by faulty hardware. One relay (that was not even in use) turned ON during shutdown, probably shorting out my handmade connect-all-PCB. Still interesting why it worked from within the GUI. Are the relays are cycled ON-OFF, and is it different during shutdown?

kizniche commented 6 years ago

The shutdown from the UI merely issues the command sudo shutdown now -h (or -r for reboot). Mycodo does turn off all relays when the daemon shuts down, which is what it's instructed to do during a system shutdown/reboot (if it's given enough time to do so). Often, if the system shuts down too quickly, Mycodo is just terminated without having gone through the full daemon shutdown procedure.

zsole2 commented 6 years ago

I have a little more info here, and appears to be not a hardware issue. The original issue seemed as hanging on reboot. Now I should define it better, after experiencing it a few more times. The main symptom is that SSH connection was refused. After once leaving the Pi at this state for a longer time, it rebooted later and SSH access was OK again. So i went after the underlying issue, and it appears that stopping the mycodo daemon takes forever. OK more like 35 mins. It also appears not only reboot, but lately on upgrade, too.

kizniche commented 6 years ago

Can you disable all inputs/functions/etc. and see if Mycodo shuts down in a reasonable amount of time? If it does, can you selectively turn your features back on and narrow down what the offending thread(s) are?

I strive to make all processes interruptible for the sake of a quick shutdown, but sometimes there are processes that aren't designed well and backups can occur that delay a shutdown. I don't often experience them on the more powerful Pis, but I can imagine the Zeros could exacerbate it.

zsole2 commented 6 years ago

OK, I'll try to do this. It will take time, though... This is a scenario where the suggestion to have a 'main switch' would come handy.

kizniche commented 6 years ago

This is a scenario where the suggestion to have a 'main switch' would come handy.

What does this mean?

zsole2 commented 6 years ago

It was suggested in #346, a "deactivate all button"

zsole2 commented 6 years ago

OK, I did some more testing. I turned off all inputs and functions, and rebooted several times, adding a couple of those back at a time. No problem at all, as you could expect based on Murphy's laws. This morning, I did the upgrade to 5.5.16, here is the end of the log:

[2018-01-29 07:15:17] #### Restarting the Mycodo daemon
[2018-01-29 07:15:25] Mycodo previously shut down properly
[2018-01-29 07:16:15] 
[2018-01-29 07:16:15] #### Restarting nginx
[2018-01-29 07:16:25] #### Reloading mycodoflask
[2018-01-29 07:16:27] 
[2018-01-29 07:16:27] #### Connecting to http://localhost (creates Mycodo database if it doesn't exist)
[2018-01-29 07:17:34] #### Successfully connected to http://localhost
[2018-01-29 07:17:34] Done.
[2018-01-29 07:17:34] 
[2018-01-29 07:17:34] Upgrade completed successfully without errors.

I edited crontab for pigpiod, and rebooted, at about 7:25 CET.

I got strange behavior, could not connect through SSH, but I did not got refused... just nothing. Disclaimer: I was on the bus, connecting through OpenVPN, who knows how good the net was there... Then, to my surprise, got three e-mails at 10 minute intervals (7:27, 7:37, 7:47), as expected from the Conditional (#387).

Here is the daemon lg from this time frame:

2018-01-29 07:16:12,217 - mycodo.daemon - INFO - Mycodo daemon v5.5.16 starting
2018-01-29 07:16:14,971 - mycodo.daemon - INFO - Anonymous statistics enabled
2018-01-29 07:16:15,380 - mycodo.daemon - INFO - Starting rpyc server
2018-01-29 07:16:20,912 - mycodo.output - INFO - Output 1 setup on pin 22 and turned OFF (OFF=HIGH)
2018-01-29 07:16:20,927 - mycodo.output - INFO - Output 3 setup on pin 24 and turned OFF (OFF=HIGH)
2018-01-29 07:16:20,931 - mycodo.output - INFO - Output 4 setup on pin 10 and turned OFF (OFF=HIGH)
2018-01-29 07:16:20,946 - mycodo.output - INFO - Output 5 setup on pin 9 and turned OFF (OFF=HIGH)
2018-01-29 07:16:20,950 - mycodo.output - INFO - Output 6 setup on pin 25 and turned OFF (OFF=HIGH)
2018-01-29 07:16:20,954 - mycodo.output - INFO - Output 7 setup on pin 11 and turned OFF (OFF=HIGH)
2018-01-29 07:16:20,968 - mycodo.output - INFO - Output 8 setup on pin 8 and turned OFF (OFF=HIGH)
2018-01-29 07:16:20,985 - mycodo.output - INFO - Output controller activated in 838.4 ms
2018-01-29 07:16:22,706 - mycodo.timer_1 - INFO - Activated in 863.1 ms
2018-01-29 07:16:24,420 - mycodo.timer_2 - INFO - Activated in 1205.7 ms
2018-01-29 07:16:26,099 - mycodo.timer_3 - INFO - Activated in 1188.8 ms
2018-01-29 07:16:26,107 - mycodo.daemon - INFO - All activated Timer controllers started
2018-01-29 07:16:30,126 - mycodo.input_1 - INFO - Activated in 2513.5 ms
2018-01-29 07:16:32,237 - mycodo.input_2 - INFO - Activated in 1063.8 ms
2018-01-29 07:16:35,498 - mycodo.input_3 - INFO - Activated in 1448.2 ms
2018-01-29 07:16:37,802 - mycodo.input_4 - INFO - Activated in 1446.1 ms
2018-01-29 07:16:40,363 - mycodo.input_6 - INFO - Activated in 1746.3 ms
2018-01-29 07:16:40,373 - mycodo.inputs.dht22 - ERROR - DHT22Sensor raised an exception when taking a reading: 'NoneType' object has no attribute 'send'
Traceback (most recent call last):
  File "/var/mycodo-root/mycodo/inputs/dht22.py", line 195, in read
    self._temperature) = self.get_measurement()
  File "/var/mycodo-root/mycodo/inputs/dht22.py", line 161, in get_measurement
    self.measure_sensor()
  File "/var/mycodo-root/mycodo/inputs/dht22.py", line 209, in measure_sensor
    self.close()
  File "/var/mycodo-root/mycodo/inputs/dht22.py", line 375, in close
    self.pi.set_watchdog(self.gpio, 0)
  File "/usr/local/lib/python3.5/dist-packages/pigpio.py", line 1764, in set_watchdog
    self.sl, _PI_CMD_WDOG, user_gpio, int(wdog_timeout)))
  File "/usr/local/lib/python3.5/dist-packages/pigpio.py", line 983, in _pigpio_command
    sl.s.send(struct.pack('IIII', cmd, p1, p2, 0))
AttributeError: 'NoneType' object has no attribute 'send'
2018-01-29 07:16:42,666 - mycodo.input_7 - INFO - Activated in 1572.9 ms
2018-01-29 07:16:45,736 - mycodo.input_8 - INFO - Activated in 2393.0 ms
2018-01-29 07:16:48,343 - mycodo.input_9 - INFO - Activated in 1537.1 ms
2018-01-29 07:16:48,361 - mycodo.daemon - INFO - All activated Input controllers started
2018-01-29 07:16:48,363 - mycodo.daemon - INFO - All activated Math controllers started
2018-01-29 07:16:52,071 - mycodo.pid_1 - INFO - Activated in 3060.3 ms
2018-01-29 07:16:56,690 - mycodo.pid_2 - INFO - Activated in 3079.9 ms
2018-01-29 07:17:00,633 - mycodo.pid_3 - INFO - Activated in 3441.0 ms
2018-01-29 07:17:00,643 - mycodo.daemon - INFO - All activated PID controllers started
2018-01-29 07:17:14,271 - mycodo.lcd_1 - INFO - Activated in 13249.0 ms
2018-01-29 07:17:14,283 - mycodo.daemon - INFO - All activated LCD controllers started
2018-01-29 07:17:15,621 - mycodo.conditional - INFO - Conditional settings refreshed
2018-01-29 07:17:15,633 - mycodo.daemon - INFO - Mycodo daemon v5.5.16 started in 63.412 seconds
2018-01-29 07:17:15,624 - mycodo.conditional - INFO - Conditional controller activated in 837.9 ms
2018-01-29 07:17:15,695 - mycodo.daemon - INFO - 41.84 MB RAM in use
2018-01-29 07:18:41,855 - mycodo.pid_3 - WARNING - No data returned from influxdb
2018-01-29 07:20:21,867 - mycodo.pid_3 - WARNING - No data returned from influxdb
2018-01-29 07:21:56,885 - mycodo.pid_2 - WARNING - No data returned from influxdb
2018-01-29 07:22:01,902 - mycodo.pid_3 - WARNING - No data returned from influxdb
2018-01-29 07:23:41,829 - mycodo.pid_3 - WARNING - No data returned from influxdb
2018-01-29 07:25:08,530 - mycodo.daemon - INFO - Received command to terminate daemon
2018-01-29 07:25:08,649 - mycodo.input_9 - INFO - Deactivated in 1.5 ms
2018-01-29 07:25:08,667 - mycodo.input_3 - INFO - Deactivated in 20.4 ms
2018-01-29 07:25:08,683 - mycodo.input_1 - INFO - Deactivated in 36.7 ms
2018-01-29 07:25:08,696 - mycodo.input_8 - INFO - Deactivated in 48.6 ms
2018-01-29 07:25:08,697 - mycodo.input_4 - INFO - Deactivated in 50.5 ms
2018-01-29 07:25:08,700 - mycodo.input_7 - INFO - Deactivated in 52.5 ms
2018-01-29 07:25:08,721 - mycodo.input_2 - INFO - Deactivated in 73.9 ms
2018-01-29 07:25:09,946 - mycodo.pid_2 - INFO - Deactivated in 1299.0 ms
2018-01-29 07:25:10,557 - mycodo.timer_1 - INFO - Deactivated in 1909.3 ms
2018-01-29 07:25:10,921 - mycodo.timer_3 - INFO - Deactivated in 2273.4 ms
2018-01-29 07:25:11,341 - mycodo.timer_2 - INFO - Deactivated in 2693.1 ms
2018-01-29 07:25:11,389 - mycodo.pid_1 - INFO - Deactivated in 2742.6 ms
2018-01-29 07:25:11,703 - mycodo.pid_3 - INFO - Deactivated in 3056.7 ms
2018-01-29 07:25:11,906 - mycodo.lcd_1 - INFO - Deactivated in 3259.9 ms
2018-01-29 07:25:11,909 - mycodo.daemon - INFO - All LCD controllers stopped
2018-01-29 07:25:11,912 - mycodo.daemon - INFO - All PID controllers stopped
2018-01-29 07:25:11,925 - mycodo.daemon - INFO - All Math controllers stopped

**??????????????**

2018-01-29 07:58:31,371 - mycodo.daemon - INFO - Mycodo daemon v5.5.16 starting
2018-01-29 07:58:37,025 - mycodo.daemon - INFO - Anonymous statistics enabled
2018-01-29 07:58:37,682 - mycodo.daemon - INFO - Starting rpyc server
2018-01-29 07:58:42,924 - mycodo.output - INFO - Output 1 setup on pin 22 and turned OFF (OFF=HIGH)
2018-01-29 07:58:42,939 - mycodo.output - INFO - Output 3 setup on pin 24 and turned OFF (OFF=HIGH)
2018-01-29 07:58:42,952 - mycodo.output - INFO - Output 4 setup on pin 10 and turned OFF (OFF=HIGH)
2018-01-29 07:58:42,965 - mycodo.output - INFO - Output 5 setup on pin 9 and turned OFF (OFF=HIGH)
2018-01-29 07:58:42,978 - mycodo.output - INFO - Output 6 setup on pin 25 and turned OFF (OFF=HIGH)
2018-01-29 07:58:42,991 - mycodo.output - INFO - Output 7 setup on pin 11 and turned OFF (OFF=HIGH)
2018-01-29 07:58:43,006 - mycodo.output - INFO - Output 8 setup on pin 8 and turned OFF (OFF=HIGH)
2018-01-29 07:58:43,029 - mycodo.output - INFO - Output controller activated in 1484.8 ms
2018-01-29 07:58:47,158 - mycodo.timer_1 - INFO - Activated in 2870.4 ms
2018-01-29 07:58:51,625 - mycodo.timer_2 - INFO - Activated in 3160.1 ms
2018-01-29 07:58:55,069 - mycodo.timer_3 - INFO - Activated in 2043.7 ms
2018-01-29 07:58:55,087 - mycodo.daemon - INFO - All activated Timer controllers started
2018-01-29 07:59:00,162 - mycodo.input_1 - INFO - Activated in 3446.8 ms
2018-01-29 07:59:02,555 - mycodo.input_2 - INFO - Activated in 1356.6 ms
2018-01-29 07:59:05,153 - mycodo.input_3 - INFO - Activated in 1660.4 ms
2018-01-29 07:59:07,321 - mycodo.input_4 - INFO - Activated in 1484.0 ms
2018-01-29 07:59:09,644 - mycodo.input_6 - INFO - Activated in 1627.6 ms
2018-01-29 07:59:11,876 - mycodo.input_7 - INFO - Activated in 1538.7 ms
2018-01-29 07:59:13,978 - mycodo.input_8 - INFO - Activated in 1472.2 ms
2018-01-29 07:59:16,466 - mycodo.input_9 - INFO - Activated in 1464.9 ms
2018-01-29 07:59:16,483 - mycodo.daemon - INFO - All activated Input controllers started
2018-01-29 07:59:16,491 - mycodo.daemon - INFO - All activated Math controllers started
2018-01-29 07:59:20,114 - mycodo.pid_1 - INFO - Activated in 2859.7 ms
2018-01-29 07:59:24,281 - mycodo.pid_2 - INFO - Activated in 3680.8 ms
2018-01-29 07:59:27,851 - mycodo.pid_3 - INFO - Activated in 3046.4 ms
2018-01-29 07:59:28,448 - mycodo.daemon - INFO - All activated PID controllers started
2018-01-29 07:59:40,940 - mycodo.lcd_1 - INFO - Activated in 12119.3 ms
2018-01-29 07:59:40,954 - mycodo.daemon - INFO - All activated LCD controllers started
2018-01-29 07:59:42,461 - mycodo.conditional - INFO - Conditional settings refreshed
2018-01-29 07:59:42,472 - mycodo.conditional - INFO - Conditional controller activated in 1011.2 ms
2018-01-29 07:59:42,480 - mycodo.daemon - INFO - Mycodo daemon v5.5.16 started in 71.097 seconds
2018-01-29 07:59:42,512 - mycodo.daemon - INFO - 41.35 MB RAM in use
kizniche commented 6 years ago

So apart from the DHT22 error (caused by pigpiod not running, most likely), everything looks good?

I'm still trying to figure out why pigpiod sometimes doesn't start. I may have to make a script to watch if it's alive and restart it if not, like the mycodo daemon.

zsole2 commented 6 years ago

Except that it took more than 30 mins to reboot... the OS shuts down sshd well before mycodo gets shut down, resulting in the inaccessible system. I have no problem pigpiod restarting, since my edit include removing the '-s 1' parameter.

kizniche commented 6 years ago

I have been trying to find out why, but I discovered if gpiod isn't running or stops, the DHT22 module will do something to make it unable to shut down, causing the daemon to hang during shutdown. I'm going to focus on stabilizing the daemon shutdown process.

zsole2 commented 6 years ago

That might explain the situation appearing recently, since I used HTU21D sensors for more than a year now, and just added the DHT22 into the test system...

kizniche commented 6 years ago

I think I found the issue is related to pigpiod not starting up again after an upgrade. So, I've moved pigpiod from cron to a systemd service that can reliably be started from the upgrade script. I'll be able to do a live test during the next upgrade.

kizniche commented 6 years ago

I found that I can use this to check if pigpiod can be connected to:

import pigpiod
pi = pigpio.pi()
if not pi.connected:  # Check if pigpiod is running
    logger.error("Could not connect to pigpiod."
                 "Ensure it is running and try again.")

So, I added these checks to the DHT11/22, HTU21D, and Signal PWM/RPM Inputs.

zsole2 commented 6 years ago

I have not experienced this problem for a while now, so let's close this. Thanks for your efforts, again.