thegridelectric / gw-scada-spaceheat-python

GridWorks SCADA for space heating
MIT License

Apple Pi went dark #185

Open anschweitzer opened 1 year ago

anschweitzer commented 1 year ago

The "apple" Raspberry Pi in Freedom, Maine went dark on 2023-02-20 at 12:50 GMT. None of its lights were blinking.

Our current favorite theory is that a voltage spike caused this.

Here are logs from that time.

Here's a summary of what we see in the scada logs:

2023-02-20 12:27:03.828    happy
...
2023-02-20 12:28:00.036    happy
...
2023-02-20 12:29:00.099    happy
...
2023-02-20 12:50:06.427    happy

BINARY NONSENSE IN LOG FILE

2023-02-20 12:28:40.912 run_async_actors_main() starting
    self.smbus.write_byte_data(self.address, MCP23008_REG_IODIR, register_value)
  File "/home/pi/.local/lib/python3.10/site-packages/smbus2/smbus2.py", line 455, in write_byte_data
    ioctl(self.fd, I2C_SMBUS, msg)
OSError: [Errno 121] Remote I/O error

... reboots continue until the last one at:

2023-02-20 12:28:53.246 run_async_actors_main() starting
2023-02-20 12:28:53.353 ERROR in run_async_actors_main.
  File "/home/pi/gw-scada-spaceheat-python/gw_spaceheat/drivers/base/mcp23008.py", line 78, in set_output_register
    self.smbus.write_byte_data(self.address, MCP23008_REG_IODIR, register_value)
  File "/home/pi/.local/lib/python3.10/site-packages/smbus2/smbus2.py", line 455, in write_byte_data
    ioctl(self.fd, I2C_SMBUS, msg)
OSError: [Errno 121] Remote I/O error

NOW NOTHING FOR 2 DAYS

2023-02-22 14:32:20.581 run_async_actors_main() starting
2023-02-22 14:32:20.701 ERROR in run_async_actors_main. Shutting down: [[Errno 121] Remote I/O error] / [<class 'OSError'>]
    self.smbus.write_byte_data(self.address, MCP23008_REG_IODIR, register_value)
  File "/home/pi/.local/lib/python3.10/site-packages/smbus2/smbus2.py", line 455, in write_byte_data
    ioctl(self.fd, I2C_SMBUS, msg)

And then that repeats until this morning. 
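
The tracebacks all end in `smbus2`'s `write_byte_data` failing with errno 121 (`EREMOTEIO`, "Remote I/O error"). On the Pi's I2C driver this commonly means the addressed device (here the MCP23008 on the hat) did not acknowledge the transfer, which is consistent with a dead hat. Below is a minimal sketch, not the repo's code, of a wrapper that retries only that error a few times before re-raising, so a one-off bus glitch does not kill the process while a device that never answers still fails loudly. `retry_i2c`, `attempts` and `delay_s` are hypothetical names.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Linux reports errno 121 as EREMOTEIO ("Remote I/O error"); fall back to the
# literal value on platforms whose errno module does not define it.
EREMOTEIO = getattr(errno, "EREMOTEIO", 121)


def retry_i2c(op: Callable[[], T], attempts: int = 3, delay_s: float = 0.05) -> T:
    """Call op(), retrying only on EREMOTEIO; any other error is raised at once.

    Hypothetical helper: after `attempts` failed tries the last OSError is
    re-raised, so a device that never acknowledges still surfaces as an error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except OSError as exc:
            if exc.errno != EREMOTEIO or attempt == attempts:
                raise
            time.sleep(delay_s * attempt)  # wait a little longer before each retry
    raise ValueError("attempts must be >= 1")
```

Wrapping the call from the traceback would look like `retry_i2c(lambda: self.smbus.write_byte_data(self.address, MCP23008_REG_IODIR, register_value))`. In this incident it would not have changed the outcome: with the hat non-functional, every attempt fails and the error is re-raised, which keeps the hardware failure visible instead of hiding it.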

The syslog tells a similar, but not identical, confusing story:

Feb 20 12:17:00 scada is running 
silence for 11 minutes
Feb 20 12:28:24 pi is starting up
scada enters i2c crash loop
Feb 20 12:28:53 last message in syslog for 2 days. 
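
Both logs are easiest to compare as a list of silences. A minimal sketch, assuming the scada log's `YYYY-MM-DD HH:MM:SS.mmm` timestamp prefix shown above (syslog's `Feb 20 12:17:00` lines carry no year and would need a different parser), that reports every gap between consecutive timestamped lines longer than a threshold. `find_gaps`, `parse_ts` and `min_gap` are hypothetical names; lines without a leading timestamp, such as traceback lines, are skipped.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2023-02-20 12:28:53.246")  # width of the timestamp prefix


def parse_ts(line: str) -> Optional[datetime]:
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines: Iterable[str], min_gap: timedelta) -> List[Tuple[datetime, datetime]]:
    """List (before, after) timestamp pairs separated by more than min_gap."""
    gaps: List[Tuple[datetime, datetime]] = []
    prev: Optional[datetime] = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # traceback and other untimestamped lines
        if prev is not None and ts - prev > min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

Fed the scada excerpts above with a five-minute threshold, it should flag the jump from 2023-02-20 12:28:53 to 2023-02-22 14:32:20.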

anschweitzer commented 1 year ago

Some comments from @jessicamillar:

It is likely a hardware problem, with a high probability of a voltage spike. The first corrective action was to put in a surge protector. We've done that, and I've ordered a better one that will arrive at Phillip's house tomorrow.

When we put Pis in the field before, they were behind a DC converter that was built into our board: we would wire the device into 240 V, and the board measured the voltage and did the switching internally, so it had both a high-voltage and a low-voltage side. That converter was high quality and the most expensive part of our board. It probably did a better job of keeping the low voltages for the electronics healthy than the cheap off-the-shelf bricks we are getting. So we are also going to do more research and take more care in selecting higher-quality converters. When we said we've put hundreds of Pis in the field before, it was true, but not with cheap converters.
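
One cheap way to watch the low-voltage side from the Pi itself is the firmware's throttling report, `vcgencmd get_throttled`, which prints a bitmask such as `throttled=0x50000`. A minimal sketch that decodes it, using the bit meanings from the Raspberry Pi documentation. It only flags under-voltage (a sagging supply), so it would not directly catch an overvoltage spike like the one suspected here. `decode_throttled` and `THROTTLE_BITS` are hypothetical names, and the sketch parses a string rather than calling `vcgencmd`, so it also runs off the Pi.

```python
from typing import Dict

# Bit meanings for `vcgencmd get_throttled`, per the Raspberry Pi documentation.
# Bits 0-3 describe the current state; bits 16-19 are sticky since boot.
THROTTLE_BITS = {
    0: "under_voltage_now",
    1: "arm_freq_capped_now",
    2: "throttled_now",
    3: "soft_temp_limit_now",
    16: "under_voltage_occurred",
    17: "arm_freq_capped_occurred",
    18: "throttled_occurred",
    19: "soft_temp_limit_occurred",
}


def decode_throttled(output: str) -> Dict[str, bool]:
    """Turn 'throttled=0x50000' into a flag-name -> bool mapping."""
    _, _, value = output.strip().partition("=")
    mask = int(value, 16)  # int() accepts the '0x' prefix when base is 16
    return {name: bool(mask & (1 << bit)) for bit, name in THROTTLE_BITS.items()}
```

On the Pi, the input string would come from `subprocess.run(["vcgencmd", "get_throttled"], capture_output=True, text=True).stdout`. Because the `*_occurred` flags persist until the next boot, logging them periodically can reveal a brief supply sag that has already passed.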

Also, three pieces of hardware became non-functional: both EZ-Flows and the little hat that sits on top of the Pi for I2C conversion. To me this looks like the voltage went out of range (10 V?), which probably hurt everything and fried some of it.