thegridelectric / gw-scada-spaceheat-python

GridWorks SCADA for space heating
MIT License

PI's DNS got wedged after physical power failure / restore. #160

Open anschweitzer opened 1 year ago

anschweitzer commented 1 year ago

My Pi’s DNS got wedged after a power loss / restore in Cambridge. Restarting the Pi fixed it. I didn’t try restarting the router. Ideally we can figure out something less drastic, but if we can’t restore communication within a long enough time, we should probably reboot the Pi.

anschweitzer commented 1 year ago

I think this is the same issue as #155, though that one documents some other issues as well.

anschweitzer commented 1 year ago

An approach:

  1. We can detect that DNS is not working from the specific network error, and/or by pinging some well-known server. As a backstop, we can detect it from having no MQTT broker connection for a long time.
  2. Research which service needed to be restarted, and try restarting just that service.
  3. If all else fails after a long time, reboot the pi.
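
A minimal sketch of the three steps above, not the SCADA's actual code. It treats `socket.gaierror` (the error in the log below) as the DNS-failure signal and escalates by how long DNS has been down. The thresholds, function names, and both commands are assumptions; in particular, which service to restart is still unknown (step 2), so `RESTART_SERVICE_CMD` is only a placeholder.

```python
import socket
import subprocess
import time

# Hypothetical thresholds; "a long time" is not defined in this issue.
RESTART_SERVICE_AFTER_S = 10 * 60
REBOOT_AFTER_S = 60 * 60
# Placeholder: the service that actually needs restarting is not yet known.
RESTART_SERVICE_CMD = ["sudo", "systemctl", "restart", "dhcpcd"]
REBOOT_CMD = ["sudo", "reboot"]


def dns_works(host: str, port: int = 1883) -> bool:
    """Return True if `host` resolves, False if resolution raises gaierror."""
    try:
        socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        return False


def choose_action(seconds_down: float) -> str:
    """Map time without working DNS to an escalation step."""
    if seconds_down >= REBOOT_AFTER_S:
        return "reboot"
    if seconds_down >= RESTART_SERVICE_AFTER_S:
        return "restart_service"
    return "wait"


def watch(host: str, poll_s: float = 30.0, probe=dns_works, run=subprocess.run) -> None:
    """Poll DNS; restart the service once, then reboot if DNS stays down."""
    down_since = None
    service_restarted = False
    while True:
        if probe(host):
            down_since, service_restarted = None, False
        else:
            now = time.monotonic()
            if down_since is None:
                down_since = now
            action = choose_action(now - down_since)
            if action == "reboot":
                run(REBOOT_CMD, check=False)
                return
            if action == "restart_service" and not service_restarted:
                run(RESTART_SERVICE_CMD, check=False)
                service_restarted = True
        time.sleep(poll_s)
```

Keeping `choose_action` separate from the loop means the escalation policy can be unit tested without touching the network or rebooting anything.
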

anschweitzer commented 1 year ago

Probably the same issue happened in Freedom on apple: log.zip

One of the errors:

    File /home/pi/gw-scada-spaceheat-python/gw_spaceheat/proactor/mqtt.py, line 89, in _client_thread
      self._client.connect(self._client_config.host, port=self._client_config.port)
    File /home/pi/.local/lib/python3.10/site-packages/paho/mqtt/client.py, line 914, in connect
      return self.reconnect()
    File /home/pi/.local/lib/python3.10/site-packages/paho/mqtt/client.py, line 1044, in reconnect
      sock = self._create_socket_connection()
    File /home/pi/.local/lib/python3.10/site-packages/paho/mqtt/client.py, line 3685, in _create_socket_connection
      return socket.create_connection(addr, timeout=self._connect_timeout, source_address=source)
    File /usr/local/lib/python3.10/socket.py, line 824, in create_connection
      for res in getaddrinfo(host, port, 0, SOCK_STREAM):
    File /usr/local/lib/python3.10/socket.py, line 955, in getaddrinfo
      for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution

anschweitzer commented 1 year ago

We might be able to reproduce this simply by disconnecting Ethernet for an hour.

Various flavors of this did not succeed in getting DNS to work.

anschweitzer commented 10 months ago

Another workaround is to record the IP address after each successful DNS lookup and fall back to that address if DNS appears to fail.