divenal opened this issue 2 months ago
Right ok, so that's news. Definitely need to sort that one out. Let me see if I can recreate similar here.
I've been pondering more generally... should the i/o "threads" try to reconnect if there's a disconnect, or should that just get reported back to the application code? One argument is that since the application does an explicit connect(), it should be informed about socket problems, and it's the application's problem to re-establish the connection. But you could also say that the connect() call is just telling the client that you're ready to go into a connected state, and would like the i/o threads to maintain that for you.
There's always the issue that if there's an underlying network problem (e.g. the router has been turned off), you need some mechanism to throttle how often you try to connect. The application really ought to be aware if it's been disconnected from the inverter for any significant period of time. (Which might support the argument that the application should be made responsible for reconnecting as required.)
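If the i/o side were to own reconnection, it would presumably need a supervisor loop with a capped exponential backoff so a dead router doesn't turn into a tight retry loop. A rough asyncio sketch of the shape I have in mind - connect(), wait_disconnected() and close() are invented names for illustration, not the actual givenergy_modbus API:

import asyncio
import logging

_log = logging.getLogger(__name__)

async def keep_connected(client, host, port):
    # Hypothetical supervisor: reconnect with capped exponential backoff.
    delay = 1.0
    while True:
        try:
            await client.connect(host, port)
            delay = 1.0                       # link is up - reset the backoff
            await client.wait_disconnected()  # returns when the link drops
        except OSError as exc:
            _log.warning("connection lost/failed: %s", exc)
        try:
            await client.close()
        except Exception:
            pass                              # best-effort cleanup
        _log.info("reconnecting in %.0fs", delay)
        await asyncio.sleep(delay)
        delay = min(delay * 2, 300)           # cap the retry interval at 5 minutes

Even then the application would still want some way of being told it has been disconnected for a while (a callback, or a "connected" event it can inspect), which is really the crux of the question above.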
Doh! Not being the brightest, I managed to do exactly the same thing again. But because I'm now logging to a file, I have a little more info...
2024-04-23 21:38:57: 0 0.0 33.0 1131 1131.1 1140.3 859 1269 1269.6 11 1 461
2024-04-23 21:43:58: 0 0.0 31.4 1131 1131.0 1139.8 858 1268 1268.4 11 1 462
2024-04-23 21:48:58: 0 0.0 29.8 1126 1127.3 1139.2 990 1266 1266.6 11 1 463
2024-04-23 21:53:58: 0 0.0 28.3 1122 1123.3 1138.4 968 1265 1265.4 11 1 464
2024-04-23 21:58:58: 0 0.0 26.9 1124 1123.8 1137.7 981 1263 1263.6 11 1 465
2024-04-23 22:03:59: 0 0.0 25.6 1124 1124.0 1137.0 981 1263 1263.1 11 1 466
2024-04-23 22:09:00: 0 0.0 24.3 1124 1124.0 1136.3 981 1263 1263.0 11 1 467
...
2024-04-24 00:02:35: 0 0.0 7.9 1124 1124.0 1128.0 981 1263 1263.0 11 1 489
2024-04-24 00:07:56: 0 0.0 7.5 1124 1124.0 1127.8 981 1263 1263.0 11 1 490
The numbers aren't too important - 3 columns of solar, 3 columns of AC generation, then export and battery. I was doing a timed discharge at around 1200W until 22:30, so AC generation was around 1130 and export was around 900W. I probably turned the ethernet off around 10pm, and the lines after 22:03:59 have stopped changing - the updates had presumably stopped arriving. But it continued to log the stale numbers until 00:07:56. I'm guessing that's the point at which the send task jammed up, and the application task then stopped because it blocks waiting for its requests to actually get sent out.
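As a stopgap on the application side, I could at least bound how long a refresh is allowed to take by wrapping it in asyncio.wait_for(), so a wedged send surfaces as a TimeoutError rather than a silent hang. Something like this - the timeout values are just placeholders:

import asyncio

POLL_INTERVAL = 300    # ~5 minutes between refreshes, as in the log above
REFRESH_TIMEOUT = 60   # give up on a single refresh after this long (arbitrary)

async def poll(client):
    while True:
        try:
            # refresh_plant() is the call shown in the traceback below; wait_for()
            # cancels it and raises TimeoutError if the send/response never completes
            await asyncio.wait_for(
                client.refresh_plant(False, max_batteries=0),
                timeout=REFRESH_TIMEOUT,
            )
        except asyncio.TimeoutError:
            print("refresh timed out - connection is probably dead")
            # at this point the application would need to close and reconnect
        await asyncio.sleep(POLL_INTERVAL)

Cancelling a refresh mid-send probably leaves the client in an odd state though, so the except branch would really need to tear the connection down rather than just carry on.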
Then, after ^C on the script this morning, I got similar backtraces:
The application task was waiting for the refresh request to get sent out:
File "/tmp/./monitor3.py", line 58, in monitor
await client.refresh_plant(False, max_batteries = 0)
File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 145, in refresh_plant
^C File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 309, in send_request_and_await_response
await self.tx_queue.put((raw_frame, frame_sent))
File "/usr/local/lib/python3.11/asyncio/queues.py", line 120, in put
await putter
asyncio.exceptions.CancelledError
I'm guessing the network producer was stuck:
future: <Task finished name='network_producer' coro=<Client._task_network_producer() done, defined at /extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py:231> exception=TimeoutError(110, 'Connection timed out')>
Traceback (most recent call last):
File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 236, in _task_network_producer
await self.writer.drain()
And the consumer timed out as before:
Task exception was never retrieved
future: <Task finished name='network_consumer' coro=<Client._task_network_consumer() done, defined at /extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py:177> exception=TimeoutError(110, 'Connection timed out')>
Traceback (most recent call last):
File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 236, in _task_network_producer
await self.writer.drain()
File "/usr/local/lib/python3.11/asyncio/streams.py", line 380, in drain
raise exc
File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 189, in _task_network_consumer
frame = await self.reader.read(300)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/streams.py", line 711, in read
await self._wait_for_data('read')
File "/usr/local/lib/python3.11/asyncio/streams.py", line 543, in _wait_for_data
await self._waiter
File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 999, in _read_ready__data_received
data = self._sock.recv(self.max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: [Errno 110] Connection timed out
Task exception was never retrieved
I'm actually wondering if the producer task is really needed. Requests get added to the write queue, but the calling task is then blocked until the producer actually writes them out, so arguably the application task may as well send them itself. The producer serves two purposes: it enforces a delay between writes (which wouldn't be hard to implement a different way), and it sends out the heartbeat reply (which the reader task requests without blocking on it actually being sent). The heartbeat is slightly harder to move, since we want to avoid blocking the reader task, but it could just be written to the socket without an explicit drain() - then it just sits in the i/o buffers.
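To make that concrete, this is roughly the shape I mean - the application task writes its own frames, serialised and rate-limited by a lock, and the heartbeat reply just gets write()n without a drain(). The class and the min_gap value are made up for illustration; this isn't the current client code:

import asyncio
import time

class DirectWriter:
    # Sketch of sending directly instead of via a producer task.
    # 'writer' is the asyncio StreamWriter for the dongle connection.

    def __init__(self, writer, min_gap=0.5):
        self.writer = writer
        self.min_gap = min_gap        # stand-in for whatever delay the producer enforces
        self._lock = asyncio.Lock()
        self._last_write = 0.0

    async def send(self, frame):
        # application task sends its own request, serialised by the lock
        async with self._lock:
            wait = self.min_gap - (time.monotonic() - self._last_write)
            if wait > 0:
                await asyncio.sleep(wait)
            self.writer.write(frame)
            await self.writer.drain()  # back-pressure stays with the caller
            self._last_write = time.monotonic()

    def send_heartbeat(self, frame):
        # called from the reader task: hand the bytes to the transport without
        # drain(), so the reader never blocks on a slow socket
        self.writer.write(frame)

(The heartbeat path skips the rate limit here, which may or may not matter.)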
A related issue... having just discovered the simple http server on the dongle, I see I can reset the dongle from there. Similar sort of issue: the active givenergy_modbus connection didn't immediately appear to notice the reset. I got a couple of read timeouts, so I interrupted and got:
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/tmp/./monitor4.py", line 187, in monitor
await asyncio.sleep(delay)
File "/usr/local/lib/python3.11/asyncio/tasks.py", line 649, in sleep
return await future
^^^^^^^^^^^^
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/tmp/./monitor4.py", line 204, in <module>
asyncio.run(monitor())
File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/runners.py", line 123, in run
raise KeyboardInterrupt()
KeyboardInterrupt
Task exception was never retrieved
future: <Task finished name='network_consumer' coro=<Client._task_network_consumer() done, defined at /extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py:177> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 244, in _task_network_producer
await self.close()
File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 113, in close
await self.writer.wait_closed()
File "/usr/local/lib/python3.11/asyncio/streams.py", line 364, in wait_closed
await self._protocol._get_close_waiter(self)
File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 999, in _read_ready__data_received
data = self._sock.recv(self.max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
Task exception was never retrieved
future: <Task finished name='network_producer' coro=<Client._task_network_producer() done, defined at /extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py:231> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 244, in _task_network_producer
await self.close()
File "/extra/drd/venv/testge2/lib/python3.11/site-packages/givenergy_modbus/client/client.py", line 113, in close
await self.writer.wait_closed()
File "/usr/local/lib/python3.11/asyncio/streams.py", line 364, in wait_closed
await self._protocol._get_close_waiter(self)
File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 999, in _read_ready__data_received
data = self._sock.recv(self.max_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
(But other than these issues, I've been quite impressed by the stability of the connection - the script just runs and runs most of the time.)
We may be doing the async i/o slightly wrong, because when there's a socket error, it seems to just hang rather than resume. Background: my rpi has both wifi and ethernet plugged into a powerline adapter. The latter has higher priority when it's on, but we usually have it turned off.
When I started my long-running script, the ethernet happened to be on, so it used that interface. But later it got turned off. The script appears to have frozen - the ethernet was probably turned off around 10pm, but at 8am the next morning it was still jammed. When I hit ^C to interrupt it, it revealed the pending exception. The full traceback is below (I think I had to press ^C a few times to actually get it to exit), but the salient bit seems to be:
So it seems that when exceptions occur, the coroutines just sit there? I guess the network writer task was blocked in drain(), trying to send the data, and my "foreground" task, doing a refresh_plant(), didn't time out. (It can time out waiting for a response, but perhaps it waits forever for the send? I'll need to check.)
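One generic way to stop everything just sitting there would be to await the foreground work and the i/o tasks together, so the first exception from either side gets re-raised instead of being left as "Task exception was never retrieved". Roughly (purely illustrative, not how the client is structured today):

import asyncio

async def run_with_io_tasks(work, io_tasks):
    # 'work' is the foreground coroutine (e.g. the polling loop); 'io_tasks'
    # are the already-created consumer/producer Tasks
    main = asyncio.ensure_future(work)
    done, pending = await asyncio.wait(
        [main, *io_tasks], return_when=asyncio.FIRST_EXCEPTION
    )
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raises the failure instead of leaving it unretrieved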
I'll do some investigation.