Closed DarkwellZ closed 3 years ago
Are you loading the integration from custom_components
?
2020-09-25 20:35:03 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for dsmr which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
Are you loading the integration from
custom_components
?
2020-09-25 20:35:03 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for dsmr which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
dsmr documentation dsmr source (message by IssueLinks)
No I am running core 0.115.3 Just to report the issue I ran it in custom_components to get the log.
Here is what I did to get the log and report it: I downloaded the dsmr from : https://github.com/home-assistant/core put the files in: config\custom_components\dsmr\
the only modification I made was to sensor.py on line 70
logging.getLogger("dsmr_parser").setLevel(logging.DEBUG)
You said the issue existed in 0.114, you do have the 0.115.3 version of dsmr in custom integrations?
We added the OSError
in 0.115.3 to the try
except
clause in the connect/reconnect function (you can check that in sensor.py
).
If this doesn't work for you we have to figure out what exception it's throwing here.
I followed up on that issue here, and waited to test with 0.115.3 core. As the problem still occurred, I decided to open my issue.
I think the answer to your question is yes
But let me give you some more info so you can see if I interpreted the questions right:
This morning I reloaded my DSMR integration because it froze over night (running from core 0.115.3)
Yesterday I reported the issue and logged it (running from custom, but only to log, files were based on 0.115.3)
A few weeks ago I got my P1 reader and have been running it on core versions of: 0.114 (not sure what release I was on) 0.115 (all versions)
Can you try to change following in sensor.py
lines 184-200? If it doesn't fail at least we know the logic works, we just have to find out what exception you are getting. Otherwise we have to dive deeper.
From:
except (serial.serialutil.SerialException, OSError):
# Log any error while establishing connection and drop to retry
# connection wait
_LOGGER.exception("Error connecting to DSMR")
transport = None
protocol = None
except CancelledError:
if stop_listener:
stop_listener()
if transport:
transport.close()
if protocol:
await protocol.wait_closed()
return
To:
except CancelledError:
if stop_listener:
stop_listener()
if transport:
transport.close()
if protocol:
await protocol.wait_closed()
return
except:
# Log any error while establishing connection and drop to retry
# connection wait
_LOGGER.exception("Error connecting to DSMR")
transport = None
protocol = None
OK, making the change in notepad++ caused loads of indentation errors. Its running now. Wait for it to freeze and get the log then?
Here is the start-up log:
2020-09-26 10:08:39 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for dsmr which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
2020-09-26 10:08:39 WARNING (MainThread) [homeassistant.components.http] The 'base_url' option is deprecated, please remove it from your configuration 2020-09-26 10:08:40 WARNING (MainThread) [slixmpp.stringprep] Using slower stringprep, consider compiling the faster cython/libidn one. 2020-09-26 10:08:43 DEBUG (MainThread) [dsmr_parser.clients.protocol] connected 2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: /ISK5\2M550T-1012
1-3:0.2.8(50) 0-0:1.0.0(200926100844S) 0-0:96.1.1(4530303 2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: 434303036383738333337373137) 1-0:1.8.1(005081.035kWh) 1-0:1.8.2(006355.660kWh) 1-0:2.8.1(000000.000kWh) 1-0:2.8.2(000000.000kWh) 0-0:96.14.0(0001) 1-0:1.7.0(00.288kW) 1-0:2.7.0(00.000kW) 0-0:96.7.21(00005) 0-0:96.7.9(00004) 1-0:99.97.0 2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: (2)(0-0:96.7.19)(180927121817S)(0000000377s)(191204124131W)(0000014776s) 1-0:32.32.0(00003) 1-0:52.32.0(00003) 1-0:72.32.0(00004) 1-0:32.36.0(00001) 1-0:52.36.0(00001)
2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: 1-0:72.36.0(00001) 0-0:96.13.0() 1-0:32.7.0(232.9V) 1-0:52.7.0(232.4V) 1-0:72.7.0(232.9V) 1-0:31.7.0(000A) 1-0:51.7.0(001A) 1-0:71.7.0(000A) 1-0:21.7.0(00.000kW) 1-0:41.7.0(00.286kW) 1-0:61.7.0(00.000kW) 1-0:22.7.0(00.000kW) 1-0:42.7.0(00.000kW) 1-0:62.7.0(00.000kW) 0-2:24.1.0(003) 0-2:96.1.0(4730303332353635353538383330313138) 0-2:24.2.1(200926100504S)(02175.140*m3) !E8C8
2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] got telegram: /ISK5\2M550T-1012
1-3:0.2.8(50) 0-0:1.0.0(200926100844S) 0-0:96.1.1(4530303434303036383738333337373137) 1-0:1.8.1(005081.035kWh) 1-0:1.8.2(006355.660kWh) 1-0:2.8.1(000000.000kWh) 1-0:2.8.2(000000.000kWh) 0-0:96.14.0(0001) 1-0:1.7.0(00.288kW) 1-0:2.7.0(00.000kW) 0-0:96.7.21(00005) 0-0:96.7.9(00004) 1-0:99.97.0(2)(0-0:96.7.19)(180927121817S)(0000000377s)(191204124131W)(0000014776s) 1-0:32.32.0(00003) 1-0:52.32.0(00003) 1-0:72.32.0(00004) 1-0:32.36.0(00001) 1-0:52.36.0(00001) 1-0:72.36.0(00001) 0-0:96.13.0() 1-0:32.7.0(232.9V) 1-0:52.7.0(232.4V) 1-0:72.7.0(232.9V) 1-0:31.7.0(000A) 1-0:51.7.0(001A) 1-0:71.7.0(000A) 1-0:21.7.0(00.000kW) 1-0:41.7.0(00.286kW) 1-0:61.7.0(00.000kW) 1-0:22.7.0(00.000kW) 1-0:42.7.0(00.000kW) 1-0:62.7.0(00.000kW) 0-2:24.1.0(003) 0-2:96.1.0(4730303332353635353538383330313138) 0-2:24.2.1(200926100504S)(02175.140*m3) !E8C8
Basically yes, let it run and see if it fails.
You can add the following (instead of except:
):
except Exception as ex:
_LOGGER.error(f"Exception of type {type(ex).__name__} occurred")
This will log the type of exception that occurs when it happens.
So it stopped at 13:14 nothing special to see in the logs around that time frame.
2020-09-26 13:14:43 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: /ISK5\2M550T-1012
1-3:0.2.8(50) 0-0:1.0.0(200926131444S) 0-0:96.1.1(4530303
There was a message at 12:55
Invalid telegram. The CRC checksum '18245' does not match the expected '11416' 12:55:36 – /usr/local/lib/python3.8/site-packages/dsmr_parser/clients/protocol.py (WARNING)
Logger: dsmr_parser.clients.protocol Source: /usr/local/lib/python3.8/site-packages/dsmr_parser/clients/protocol.py:109 First occurred: 12:55:36 (1 occurrences) Last logged: 12:55:36
Invalid telegram. The CRC checksum '18245' does not match the expected '11416'
I had added the code from above before so I had this:
except Exception as ex:
_LOGGER.exception("Error connecting to DSMR")
_LOGGER.error(f"Exception of type {type(ex).__name__} occurred")
# Log any error while establishing connection and drop to retry
# connection wait
transport = None
protocol = None
the f in the code seems odd, which one is the right one?
_LOGGER.error(f"Exception of type {type(ex).__name__} occurred")
_LOGGER.error("Exception of type {type(ex).__name__} occurred")
The f is correct, it's an f-string to insert the type. But as you had it in, and nothing was logged, it fails in another way that apparantly goes undetected by the connect-reconnect...
What is your meter type and device you use to connect to your meter?
I have an ISKRA AM550:
AM550-TD02.01 SMR5.0
As for the reader: I think its a WEMOS D1 mini ESP8266 for WiFi connection. Connected to a self made (not by me) print. It is powered though the p1 port from my ISKRA.
The whole thing is running esp-link v2.2.3 to send the telegram over telnet. (I've asked the one that made the print to look at this question, he can probably give more detail on what it is exactly)
I suspect your hardware has stability issues. Could be because of power supply instabilies or maybe the WiFi connection.
Still the question why the integration does not pick up the what seems to be a connection error. Will have to dive in deeper.
I think it's my wifi connection (more devices have a drop of connection every now and then).
The root cause for that is found in this line:
2020-09-26 13:31:05 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for ziggonext which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.
In other words: I blame my Ziggo router.
I've added a external power source to my P1 reader to see if that helps.
Indeed would be great if the integration could pick up on the signal loss and reconnect.
@RobBie1221 he is using my P1 module. This is a Wemos D1 mini with an add-on which is nothing more than a logic inverter and power stabiliser. On the Wemos is esp-link installed which sends the telegrams through telnet.
These issues doesn't occur when using dsmr-reader (v4.1 and higher). There is the option to connect via web socket and works flawless. Even when the connection drops, it reconnects within a few seconds when the connection is re-established.
In Home Assistant, when the connections is dropped, it won't reconnect. The Wemos/esp is sending telegrams, you can see them in the uConsole of esp-link and if you telnet to the Wemos/esp. When restarting Home Assistant, it starts work again.
Now I'm not able to test it myself as I'm using an ethernet module and don't want gaps in my dsmr-reader logging.
The modules are described on my website: https://www.zuidwijk.com/p1-modules/
If you want, I can send you a wifi p1 module so you can test it yourself if you want.
@zuidwijk Thanks for the information. I'm not sure if it helps to send one to me, but I'll think about it.
What seems to happen is that the connection fails and the data flow to Home Assistant stops, but the connection does not explicitly fail. I know from sockets this can happen, you can even pull an ethernet cable and it may take minutes before a socket sees that happened.
dsmr-reader seems to have a watchdog for receiving telegrams. If no telegram is received for a certain amount of time, it simply throws an error and forces reconnection. That is not being done neither in Home Assistant nor in the library used by Home Assistant (ndokter/dsmr_parser). Ideally there is a detection or shorter timeout on broken connection but I'm not sure if that's possible.
Could a logic work that checks the timestamp of the last telegram? Sort of as an internal time-out?
I do a similar thing to let me know it has frozen again.
I have an automation that detects the sensor being stale and notify me on my cellphone.
{{(as_timestamp(now()) - as_timestamp(states.sensor.power_consumption.last_updated)) | float | round(4) > 0.1}}
I've tried as a workaround have this automation reload the integration but cant get that to work. Is there a way for me to request a reconnect though home assistant? Either from a python script or ask home assistant to reload the integration?
@zuidwijk Thanks for the information. I'm not sure if it helps to send one to me, but I'll think about it.
What seems to happen is that the connection fails and the data flow to Home Assistant stops, but the connection does not explicitly fail. I know from sockets this can happen, you can even pull an ethernet cable and it may take minutes before a socket sees that happened.
dsmr-reader seems to have a watchdog for receiving telegrams. If no telegram is received for a certain amount of time, it simply throws an error and forces reconnection. That is not being done neither in Home Assistant nor in the library used by Home Assistant (ndokter/dsmr_parser). Ideally there is a detection or shorter timeout on broken connection but I'm not sure if that's possible.
Just let me know if you want/need one ;-) I'd like to contribute my way (this way) to the platform.
Hey I just wanted to drop by and report my "solution", which is actually a workaround, that seems to be working:
So I've made the following addition to my configuration.yaml:
rest_command:
reload_dsmr:
url: !secret dsmr_reload_url
method: POST
headers:
authorization: !secret rest_token
For this I've set up the following parts in my secrets.yaml This requires 1: navigate to \homeassistant.local\config.storage\core.config_entries and grab the entry_id for the DSMR integration 2: a long term token for the RESTfull command (set it up in your profile using the ui)
insert the values from 1 and 2 in the below:
dsmr_reload_url: 'https://myfullhomeassistanturl.duckdns.org:8123/api/config/config_entries/entry/insert_config_id_of_dsmr_integration/reload'
rest_token: Bearer insert_a_long_term_token_here
Last I made a automation that is run every minute, it checks if the sensor has not been updated and if so, restarts the integration. As I have a DSMR version 5, it normally sends updates every second. So no data for 30 seconds means its dead.
- id: ''
alias: Reconnect the DSMR
description: ''
trigger:
- platform: time_pattern
minutes: /1
condition:
- condition: template
value_template: '{{ relative_time(states.sensor.power_consumption.last_updated
| timestamp_local) > ''30'' }}'
action:
- service: rest_command.reload_dsmr
data: {}
It's kind of ugly, but indeed that's the only way to reload.
I'm quite convinced the DSMR reader disconnects without a close
(happens when e.g. network connection fails). In this case, the connection on HA is essentially blocking for a very long time. On a socket a timeout can be set, which would be the nicest solution. In the integration however it doesn't directly use sockets but create_connection
from asyncio. I have to check whether lowlevel options can be set.
Btw, I guess the external power supply didn't work?
It's not pretty, but I'm pretty happy. Before it was sending me a notification and I was manually reloading it 20 times a day. Trust me, that gets boring really quickly.
No the external power made no difference. I still think its my WiFi connection / router that triggers the problem in the first place.
I'm back to the core integration so if a solution ever becomes available...
Thanks for you effort so far, and thank you @zuidwijk for your assistance as well.
I have a similar issue. It was introduced somewhere when I upgraded to Home Assistant 0.115.x and DSMR Reader version 4.4.3 (coming from 3.x). When I restart the MQTT process in DSMR Reader it works again for a while, but after time the sensors are not updated anymore in Home Assistant. Also when I turn on "Keep reconnecting" setting it still occurs.
I am trying now with setting Quality of Service = QoS1.
Maybe this also relates to this Issue
@groenmarsmannetje I think you are mixing up 2 integrations, this is related to dsmr
integration, I suspect your issue is related to dsmr reader
integration
@groenmarsmannetje I do think, like @RobBie1221 said, you’re facing something else then described in this main issue.
Now I’m running 0.114.4 and dsmr-reader 4.4.3 and having none of those issues you’re describing. Now I’m running docker on Linux with every container self build and not using HassIO-ish setup. Perhaps it’s a mqtt thing? You’re saying restarting mqtt service within dsmr-reader solves it. Perhaps you should ask support from dsmr-reader?
Sorry for the confusion. I suggest to update the problem description in first message, because in there it mentions ‘DSMR Reader’ as integration.
Sorry for the confusion. I suggest to update the problem description in first message, because in there it mentions ‘DSMR Reader’ as integration.
Fixed that.
It's kind of ugly, but indeed that's the only way to reload.
I'm quite convinced the DSMR reader disconnects without a
close
(happens when e.g. network connection fails). In this case, the connection on HA is essentially blocking for a very long time. On a socket a timeout can be set, which would be the nicest solution. In the integration however it doesn't directly use sockets butcreate_connection
from asyncio. I have to check whether lowlevel options can be set.
I've found I'm running into this issue as well. I blame wifi, my AP reports a signal strength of -85DB while being 2m away from it. (device itself is reporting <50DB from the AP).
@RobBie1221 Did you get the chance to look into the low-level stuff? anything I can help with?
I tried something with a Pi with Ser2net but that setup seems very robust. Pulling a cable doesn't really bother the integration and plugging it in again will get it working again. I suspect the Ser2net keeps te socket alive and floods the send buffer on his side. When plugging in again, it seems to just send again without even reconnecting.
Basically, I want to find out what can be detected by the low level socket. Nicest solution would be if om that level it is visible that something happens. A timer like solution to detect entities are not updating anymore should be a last resort, and I'm not even sure that is accepted from Home Assistant side or if that's something that needs to be done by the dsmr library.
I do have a python dsmr emulator now which I can kill easier and harder then the Ser2net. I was planning to give that a go and see what that brings me.
@mjkl-gh Are you using the same hardware as @DarkwellZ
@RobBie1221 If you want I can send you a WiFi P1 reader (the same as @DarkwellZ has: logic inverter with Wemos and esp-link).
@zuidwijk Thanks (again) for the offer :-) I preferred investigating other options because this unit would require me to uncouple my own P1 cable, which I rather not do.
It's also not necessary. I'm able to reproduce the issue. I have Home Assistant running in a Ubuntu VM and a small daemon on the Windows host. What does work is if I kill the daemon, that is detected by the integration. What triggers the hangup is if I cut the network to the VM (which kind of would be similar to a WiFi hickup). In this case, the disconnect is not detected (I've waited about 30 minutes after which it still does not work) and updates are stopped until a manual reset is triggered.
@RobBie1221 I understand, got the same issue. I got a wired (Ethernet) p1 reader which prevents me from long term testing. I am designing a active splitter to connect two reader on one meter, yet that's a long term project 😅
My Ethernet P1 reader:
@mjkl-gh Are you using the same hardware as @DarkwellZ
I'm using https://gist.github.com/oxan/4a1a36e12ebed13d31d7ed136b104959 on m5stickc
I'm guessing the code might be dropping clients too agressively and not reconnecting them when they get back. Is that making sense?
The problem is that Home Assistant runs a "connect and reconnect" loop. It only reconnects when it detects a disconnect. In the case there is a network failure, it fails to detect this as such and therefore "hangs". The existing connection is broken and it does not attempt to make a new connection. I'm trying to find out why it doesn't detect that there is a network issue and how I can influence the behavior.
I got as much from everything above. However, i'm looking into this gist at the same time, since i think something might not be right.
A: I think i read somewhere the whole point of the AsyncTCP library on esp was that it cleans up after itself. This gist however, includes a cleanup() in which it is actively cleaning disconnect clients. And disconnected gets set pretty easily. B: I've just opened a netcat session while logging through esphome. As I suspected i saw "192.168.x.x disconnected" right around the time it stopped receiving.
My guess is that the ESP is actively dropping the connection without telling the client. Increasing the occurence of issue we're seeiing here.
to further elaborate: the netcat session never notices a disconnect. It just stops receiving anything and hangs there. This sounds like the issue you are describing.
If a client disconnects without a close()
that doesn't help. That will sort of trigger the same thing we see here, so if your device is doing that, that will definitely increase the chances of this occurring.
Nevertheless, it should be possible to make this robust, but the challenge is that the socket handling is wrapped by asyncio. I tried to manually make a socket with timeout on recv
, but this doesn't do anything. It seems that once a network connection hangs, the recv
is not even called anymore.
First of all, as i said i'm going to also do my best to to fix the close() part. But if it is desireable, I want to help in the effort to fix this part.
If I read it correctly the connect and reconnect awaits the creation of the reader.
try:
transport, protocol = await hass.loop.create_task(reader_factory())
Looking at the code that comes after it I'm assuming connect_and_reconnect() only gets resumed when the reader exits (error disconnect etc) Because everything else in there only handles shutdowns and reconnection.
Correct me if I'm wrong but i'm assuming the create_tcp_dsmr_reader() call never exits when closed() isn't called by the server.
what I don't get is the "transport, protocol =" part. loop.create_task() only returns a task object.
And when i look at this create_tcp_dsmr_reader() only returns conn. which i'm assuming should be the transport object. create_tcp_dsmr_reader() does have a protocol object but doesn't return it.
Am I missing some python stuff or shouldn't that be possible?
The conn
is a transport, protocol pair (https://docs.python.org/3/library/asyncio-eventloop.html#opening-network-connections)
The issue is that it's only reconnecting when this function is called:
https://github.com/ndokter/dsmr_parser/blob/0427ace079b2c691904cca2ae4425f7777623ae1/dsmr_parser/clients/protocol.py#L95-L101
The connection_lost
is an override of the function defined in class asyncio.Protocol
. The asyncio however does not detect an "ungraceful" disconnect as such and does not call this function. The main question is why...if we can find that out, perhaps we can do something about it.
Maybe because we are only receiving packets and not sending any? this suggests asyncio requires packets being sent to notice ungraceful disconnects.
The following is stated there:
The only reliable way to detect disconnection is waiting for a message from peer (it can be ping message or regular data packet) with a timeout.
That's similar to what I said. I tried to wait for a message with a timeout (now it waits without timeout). This however also doesn't work, because the socket recv
is not called anymore when network error occurs. As said before, we need to understand why it's not called anymore.
Is there something I can help with in this process?, until Saturday I'm working on something else, but after that I would be able to try something.
First plan for me would be to install dmsr_parser through pip and see if it's possible to make it work.
Would you maybe mind sharing what you already tried with recv
? I would be interested in that.
Well, I do not have a clue what is the root cause let alone a solution, so any help is appreciated.
I can take you through what I've found. As mentioned above, connection_lost
is not called, which is why it does not reconnect.
Initially I thought it may hang in socket recv
. I've traced that down to selector_events.py
from asyncio:
def _read_ready__data_received(self):
if self._conn_lost:
return
try:
data = self._sock.recv(self.max_size)
except (BlockingIOError, InterruptedError):
return
except (SystemExit, KeyboardInterrupt):
raise
except BaseException as exc:
self._fatal_error(exc, "Fatal read error on socket transport")
return
if not data:
self._read_ready__on_eof()
return
try:
self._protocol.data_received(data)
except (SystemExit, KeyboardInterrupt):
raise
except BaseException as exc:
self._fatal_error(exc, "Fatal error: protocol.data_received() call failed.")
I put a breakpoint here on self._sock.recv(self.max_size)
. This function is called every few seconds. When I create a network failure, this function is not called anymore (in fact, it does not enter in this function anymore).
What I tried was to replace here: https://github.com/ndokter/dsmr_parser/blob/0427ace079b2c691904cca2ae4425f7777623ae1/dsmr_parser/clients/protocol.py#L59
To the following:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,int(port))
s.settimeout(10)
conn = loop.create_connection(protocol, sock=s)
This does not change anything, which is expected given above because this would only help if the root cause was recv
that blocks.
Ok, I'm testing it now too. I've installed home assistant on VMware (via the OVA) and only installed the dsmr-reader. Once I reseat the power/p1 from my Ethernet module (can handle 5 connections, 1 for my private and 1 for testing) it stops logging. While my dsmr-reader immediately starts working again.
Ok, I've tested two scenarios:
1) Power loss to the module (same as restarting the Wemos/WiFi module): Home Assistant stops logging and won't reconnect to the module.
2) Unplug and replug the ethernet cable: Home Assistant starts logging again.
Edit: In both scenarios the DSMR-reader from Dennis Siemensma keeps logging.
@RobBie1221 I don't think I have much time on Saturday but I'll try on Sunday to see if I can translate the python code to something I can understand (got a lot of php/c++/perl knowledge so I think I'll manage).
The weird part is, why it does reconnect after ethernet loss and not after power loss (restart of the telnet server).
Maybe if I can generate some logging I can see what's happening.
The weird part is, why it does reconnect after ethernet loss and not after power loss (restart of the telnet server).
If i look at the c++ code in my p1 logger, a reboot reinitializes the array of connected clients. Home assistant never reconnects ,but hangs, so no client is connected.
When reconnecting ethernet you re-establish connection. If the reader still remembers your client the packets will reach home assistant again, resuming the hanging connection. In my case a client gets agressively deleted on any error. Triggering this problem in both cases (restart and ethernet loss)
I have no experience whatsoever with lowlevel python at the moment, butdef _read_ready__data_received(self):
sounds like data is already received. As in, something external triggers this function on getting a signal that there is some data in some buffer somewhere. Maybe this function is not called anymore since no data is entering that buffer?
I think this adds _read_ready()
which is _read_ready_cb() which is _read_ready__data_received(). as callback to when READ_EVENT takes place on our socket.
Maybe since no READ_EVENTS take place recv() isn't called anymore.
I was hoping we could implement something in the way of
Promise1 = receive_telegram() Promise2 = timeout(interval)
Promise.race( Promise1, Promise2)
IF promise2 resetconnection()
However, as asyncio handles our socket continously that would be hard to implement
If we get rid of asyncio solving this is peanuts, because we can block a thread on recv
and make sure socket timeout is set. The recv
will always either give data or give TimeoutError
, which would be the indication to reconnect. That would however mean serious refactoring of the library and home assistant.
Your theory is right, if both client and server do not cleanup the socket, when disconnecting the cable, the socket send will buffer (and perhaps overflow) but anyway if you plug it in again data starts flowing again.
I had a theory that asyncio uses socket select
function before call recv
to prevent blocking a thread when calling recv
. A select would not see a hard disconnect, it would just return 0 bytes. I'm however not finding yet that it actually calls this function.
I'm not yet giving up on asyncio...
Hi
I've a similar problem and maybe this will help you. The dsmr component worked fine with 0.113 until I updated to 0.116 and now 0.117. The entities are no longer available. I've activity on port 2001 (tcpdump) and after some sec. the connection is closed. socket 2 (fd8) is at EOF exiting with status 0
Now I'm using sensor.py from (May27) which seams to work fine. https://raw.githubusercontent.com/home-assistant/core/cfaa851b5b8ace3d14262c2e8c57a237d5f9d3ec/homeassistant/components/dsmr/sensor.py
My reader is wired.
@humanoidlux I'm not sure this is the same problem. What essentially changed between 0.113 and 0.116 is that it doesn't use platform setup anymore. When importing to a ConfigEntry, it's validating the connection. That means it connects, waits a few seconds and if validation was successful, it disconnects and adds a ConfigEntry. After that, it should reconnect through the normal flow (from sensor.py). When validation was unsuccessful, it will never reconnect, because import failed.
When you past back the old sensor.py, it will use platform setup again without connection validation. I suspect this connection validation does not work for you, question is why. What you could do to verify is to place back the original sensor.py, startup Home Assistant and check Configuration->Integrations and check if there is a card with "DSMR". If not, then import fails and this is another issue then described here. In that case, please make a new issue, then we'll continue debugging there.
The problem
After a random time the DSMR sensors stops updating.
Environment
Home Assistant Core release with the issue:
0.115.3
Last working Home Assistant Core release (if known):
Issue existed in 0.114, didn't have dsmr reader in earlier versions (had to problem since installing it)
Operating environment (OS/Container/Supervised/Core): Hass.io on VMWare
Integration causing this issue:
DSMR
Link to integration documentation on our website:
https://www.home-assistant.io/integrations/dsmr
Problem-relevant
configuration.yaml
Traceback/Error logs
home-assistant - dsmr debug.log
Additional information
Restarting home assistant resolved the issue, as does a integration reload. Problem returns after a certain time, can be 10 seconds, can be 3 hours.
If sensor is fronzen, telegram can be read from the web-ui as well as telnet client.
Losing connection to the DSMR reader seems to be related:
Added my (redacted) logfile. It shows the final message recieved from the DSMR parser. Then I left the next message (Waze sensor) in as it just seems to stop without logging a reason or error.