DSMR sensor no longer updating

DarkwellZ commented 4 years ago

The problem

After a random time the DSMR sensors stops updating.

Environment

Home Assistant Core release with the issue:
0.115.3
Last working Home Assistant Core release (if known):
Issue existed in 0.114, didn't have dsmr reader in earlier versions (had to problem since installing it)
Operating environment (OS/Container/Supervised/Core): Hass.io on VMWare
Integration causing this issue:
DSMR
Link to integration documentation on our website:
https://www.home-assistant.io/integrations/dsmr

Problem-relevant `configuration.yaml`

  - platform: dsmr
    host: 192.168.178.77
    port: 23
    dsmr_version: 5
    reconnect_interval: 30

recorder:
  exclude:
      entities:
          - sensor.energy_consumption_tarif_1
          - sensor.energy_consumption_tarif_2
          - sensor.energy_production_tarif_1
          - sensor.energy_production_tarif_2
          - sensor.gas_consumption
          - sensor.power_consumption

Traceback/Error logs

home-assistant - dsmr debug.log DSMR frozen

Additional information

Restarting home assistant resolved the issue, as does a integration reload. Problem returns after a certain time, can be 10 seconds, can be 3 hours.

If sensor is fronzen, telegram can be read from the web-ui as well as telnet client.

Losing connection to the DSMR reader seems to be related:

Physically disconnecting and reconnecthing the DSMR reader causes the same issue.
So does a WiFi reset on the DSMR reader.
Perhaps the random lose of connection can be explained by unstable WiFi?

Added my (redacted) logfile. It shows the final message recieved from the DSMR parser. Then I left the next message (Waze sensor) in as it just seems to stop without logging a reason or error.

RobBie1221 commented 4 years ago

Are you loading the integration from custom_components?

2020-09-25 20:35:03 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for dsmr which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.

Jsn2win commented 4 years ago

Are you loading the integration from custom_components?

2020-09-25 20:35:03 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for dsmr which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.

probot-home-assistant[bot] commented 4 years ago

dsmr documentation dsmr source _{^{(message by IssueLinks)}}

DarkwellZ commented 4 years ago

No I am running core 0.115.3 Just to report the issue I ran it in custom_components to get the log.

Here is what I did to get the log and report it: I downloaded the dsmr from : https://github.com/home-assistant/core put the files in: config\custom_components\dsmr\

the only modification I made was to sensor.py on line 70

logging.getLogger("dsmr_parser").setLevel(logging.DEBUG)

RobBie1221 commented 4 years ago

You said the issue existed in 0.114, you do have the 0.115.3 version of dsmr in custom integrations?

We added the OSError in 0.115.3 to the try except clause in the connect/reconnect function (you can check that in sensor.py).

If this doesn't work for you we have to figure out what exception it's throwing here.

DarkwellZ commented 4 years ago

I followed up on that issue here, and waited to test with 0.115.3 core. As the problem still occurred, I decided to open my issue.

I think the answer to your question is yes

But let me give you some more info so you can see if I interpreted the questions right:

This morning I reloaded my DSMR integration because it froze over night (running from core 0.115.3)

Yesterday I reported the issue and logged it (running from custom, but only to log, files were based on 0.115.3)

A few weeks ago I got my P1 reader and have been running it on core versions of: 0.114 (not sure what release I was on) 0.115 (all versions)

RobBie1221 commented 4 years ago

Can you try to change following in sensor.py lines 184-200? If it doesn't fail at least we know the logic works, we just have to find out what exception you are getting. Otherwise we have to dive deeper.

From:

except (serial.serialutil.SerialException, OSError):
    # Log any error while establishing connection and drop to retry
    # connection wait
    _LOGGER.exception("Error connecting to DSMR")
    transport = None
    protocol = None
except CancelledError:
    if stop_listener:
        stop_listener()

    if transport:
        transport.close()

    if protocol:
        await protocol.wait_closed()

    return

To:

except CancelledError:
    if stop_listener:
        stop_listener()

    if transport:
        transport.close()

    if protocol:
        await protocol.wait_closed()

    return
except:
    # Log any error while establishing connection and drop to retry
    # connection wait
    _LOGGER.exception("Error connecting to DSMR")
    transport = None
    protocol = None

DarkwellZ commented 4 years ago

OK, making the change in notepad++ caused loads of indentation errors. Its running now. Wait for it to freeze and get the log then?

Here is the start-up log:

2020-09-26 10:08:39 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for dsmr which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.

2020-09-26 10:08:39 WARNING (MainThread) [homeassistant.components.http] The 'base_url' option is deprecated, please remove it from your configuration 2020-09-26 10:08:40 WARNING (MainThread) [slixmpp.stringprep] Using slower stringprep, consider compiling the faster cython/libidn one. 2020-09-26 10:08:43 DEBUG (MainThread) [dsmr_parser.clients.protocol] connected 2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: /ISK5\2M550T-1012

1-3:0.2.8(50) 0-0:1.0.0(200926100844S) 0-0:96.1.1(4530303 2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: 434303036383738333337373137) 1-0:1.8.1(005081.035kWh) 1-0:1.8.2(006355.660kWh) 1-0:2.8.1(000000.000kWh) 1-0:2.8.2(000000.000kWh) 0-0:96.14.0(0001) 1-0:1.7.0(00.288kW) 1-0:2.7.0(00.000kW) 0-0:96.7.21(00005) 0-0:96.7.9(00004) 1-0:99.97.0 2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: (2)(0-0:96.7.19)(180927121817S)(0000000377s)(191204124131W)(0000014776s) 1-0:32.32.0(00003) 1-0:52.32.0(00003) 1-0:72.32.0(00004) 1-0:32.36.0(00001) 1-0:52.36.0(00001)

2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: 1-0:72.36.0(00001) 0-0:96.13.0() 1-0:32.7.0(232.9V) 1-0:52.7.0(232.4V) 1-0:72.7.0(232.9V) 1-0:31.7.0(000A) 1-0:51.7.0(001A) 1-0:71.7.0(000A) 1-0:21.7.0(00.000kW) 1-0:41.7.0(00.286kW) 1-0:61.7.0(00.000kW) 1-0:22.7.0(00.000kW) 1-0:42.7.0(00.000kW) 1-0:62.7.0(00.000kW) 0-2:24.1.0(003) 0-2:96.1.0(4730303332353635353538383330313138) 0-2:24.2.1(200926100504S)(02175.140*m3) !E8C8

2020-09-26 10:08:44 DEBUG (MainThread) [dsmr_parser.clients.protocol] got telegram: /ISK5\2M550T-1012

1-3:0.2.8(50) 0-0:1.0.0(200926100844S) 0-0:96.1.1(4530303434303036383738333337373137) 1-0:1.8.1(005081.035kWh) 1-0:1.8.2(006355.660kWh) 1-0:2.8.1(000000.000kWh) 1-0:2.8.2(000000.000kWh) 0-0:96.14.0(0001) 1-0:1.7.0(00.288kW) 1-0:2.7.0(00.000kW) 0-0:96.7.21(00005) 0-0:96.7.9(00004) 1-0:99.97.0(2)(0-0:96.7.19)(180927121817S)(0000000377s)(191204124131W)(0000014776s) 1-0:32.32.0(00003) 1-0:52.32.0(00003) 1-0:72.32.0(00004) 1-0:32.36.0(00001) 1-0:52.36.0(00001) 1-0:72.36.0(00001) 0-0:96.13.0() 1-0:32.7.0(232.9V) 1-0:52.7.0(232.4V) 1-0:72.7.0(232.9V) 1-0:31.7.0(000A) 1-0:51.7.0(001A) 1-0:71.7.0(000A) 1-0:21.7.0(00.000kW) 1-0:41.7.0(00.286kW) 1-0:61.7.0(00.000kW) 1-0:22.7.0(00.000kW) 1-0:42.7.0(00.000kW) 1-0:62.7.0(00.000kW) 0-2:24.1.0(003) 0-2:96.1.0(4730303332353635353538383330313138) 0-2:24.2.1(200926100504S)(02175.140*m3) !E8C8

RobBie1221 commented 4 years ago

Basically yes, let it run and see if it fails.

You can add the following (instead of except:):

except Exception as ex:
    _LOGGER.error(f"Exception of type {type(ex).__name__} occurred")

This will log the type of exception that occurs when it happens.

DarkwellZ commented 4 years ago

So it stopped at 13:14 nothing special to see in the logs around that time frame.

2020-09-26 13:14:43 DEBUG (MainThread) [dsmr_parser.clients.protocol] received data: /ISK5\2M550T-1012

1-3:0.2.8(50) 0-0:1.0.0(200926131444S) 0-0:96.1.1(4530303

There was a message at 12:55

Invalid telegram. The CRC checksum '18245' does not match the expected '11416' 12:55:36 – /usr/local/lib/python3.8/site-packages/dsmr_parser/clients/protocol.py (WARNING)

Logger: dsmr_parser.clients.protocol Source: /usr/local/lib/python3.8/site-packages/dsmr_parser/clients/protocol.py:109 First occurred: 12:55:36 (1 occurrences) Last logged: 12:55:36

Invalid telegram. The CRC checksum '18245' does not match the expected '11416'

DarkwellZ commented 4 years ago

I had added the code from above before so I had this:

            except Exception as ex:
                _LOGGER.exception("Error connecting to DSMR")
                _LOGGER.error(f"Exception of type {type(ex).__name__} occurred")
                # Log any error while establishing connection and drop to retry
                # connection wait
                transport = None
                protocol = None

the f in the code seems odd, which one is the right one?

_LOGGER.error(f"Exception of type {type(ex).__name__} occurred")

_LOGGER.error("Exception of type {type(ex).__name__} occurred")

RobBie1221 commented 4 years ago

The f is correct, it's an f-string to insert the type. But as you had it in, and nothing was logged, it fails in another way that apparantly goes undetected by the connect-reconnect...

What is your meter type and device you use to connect to your meter?

DarkwellZ commented 4 years ago

I have an ISKRA AM550:

AM550-TD02.01 SMR5.0

As for the reader: I think its a WEMOS D1 mini ESP8266 for WiFi connection. Connected to a self made (not by me) print. It is powered though the p1 port from my ISKRA.

The whole thing is running esp-link v2.2.3 to send the telegram over telnet. (I've asked the one that made the print to look at this question, he can probably give more detail on what it is exactly)

RobBie1221 commented 4 years ago

I suspect your hardware has stability issues. Could be because of power supply instabilies or maybe the WiFi connection.

Still the question why the integration does not pick up the what seems to be a connection error. Will have to dive in deeper.

DarkwellZ commented 4 years ago

I think it's my wifi connection (more devices have a drop of connection every now and then).

The root cause for that is found in this line:

2020-09-26 13:31:05 WARNING (MainThread) [homeassistant.loader] You are using a custom integration for ziggonext which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant.

In other words: I blame my Ziggo router.

I've added a external power source to my P1 reader to see if that helps.

Indeed would be great if the integration could pick up on the signal loss and reconnect.

zuidwijk commented 4 years ago

@RobBie1221 he is using my P1 module. This is a Wemos D1 mini with an add-on which is nothing more than a logic inverter and power stabiliser. On the Wemos is esp-link installed which sends the telegrams through telnet.

These issues doesn't occur when using dsmr-reader (v4.1 and higher). There is the option to connect via web socket and works flawless. Even when the connection drops, it reconnects within a few seconds when the connection is re-established.

In Home Assistant, when the connections is dropped, it won't reconnect. The Wemos/esp is sending telegrams, you can see them in the uConsole of esp-link and if you telnet to the Wemos/esp. When restarting Home Assistant, it starts work again.

Now I'm not able to test it myself as I'm using an ethernet module and don't want gaps in my dsmr-reader logging.

The modules are described on my website: https://www.zuidwijk.com/p1-modules/

If you want, I can send you a wifi p1 module so you can test it yourself if you want.

RobBie1221 commented 4 years ago

@zuidwijk Thanks for the information. I'm not sure if it helps to send one to me, but I'll think about it.

What seems to happen is that the connection fails and the data flow to Home Assistant stops, but the connection does not explicitly fail. I know from sockets this can happen, you can even pull an ethernet cable and it may take minutes before a socket sees that happened.

dsmr-reader seems to have a watchdog for receiving telegrams. If no telegram is received for a certain amount of time, it simply throws an error and forces reconnection. That is not being done neither in Home Assistant nor in the library used by Home Assistant (ndokter/dsmr_parser). Ideally there is a detection or shorter timeout on broken connection but I'm not sure if that's possible.

DarkwellZ commented 4 years ago

Could a logic work that checks the timestamp of the last telegram? Sort of as an internal time-out?

I do a similar thing to let me know it has frozen again.

I have an automation that detects the sensor being stale and notify me on my cellphone. {{(as_timestamp(now()) - as_timestamp(states.sensor.power_consumption.last_updated)) | float | round(4) > 0.1}}

I've tried as a workaround have this automation reload the integration but cant get that to work. Is there a way for me to request a reconnect though home assistant? Either from a python script or ask home assistant to reload the integration?

zuidwijk commented 4 years ago

@zuidwijk Thanks for the information. I'm not sure if it helps to send one to me, but I'll think about it.

What seems to happen is that the connection fails and the data flow to Home Assistant stops, but the connection does not explicitly fail. I know from sockets this can happen, you can even pull an ethernet cable and it may take minutes before a socket sees that happened.

dsmr-reader seems to have a watchdog for receiving telegrams. If no telegram is received for a certain amount of time, it simply throws an error and forces reconnection. That is not being done neither in Home Assistant nor in the library used by Home Assistant (ndokter/dsmr_parser). Ideally there is a detection or shorter timeout on broken connection but I'm not sure if that's possible.

Just let me know if you want/need one ;-) I'd like to contribute my way (this way) to the platform.

DarkwellZ commented 4 years ago

Hey I just wanted to drop by and report my "solution", which is actually a workaround, that seems to be working:

So I've made the following addition to my configuration.yaml:

rest_command:
  reload_dsmr:
    url: !secret dsmr_reload_url
    method: POST
    headers:
      authorization: !secret rest_token

For this I've set up the following parts in my secrets.yaml This requires 1: navigate to \homeassistant.local\config.storage\core.config_entries and grab the entry_id for the DSMR integration 2: a long term token for the RESTfull command (set it up in your profile using the ui)

insert the values from 1 and 2 in the below:

dsmr_reload_url:  'https://myfullhomeassistanturl.duckdns.org:8123/api/config/config_entries/entry/insert_config_id_of_dsmr_integration/reload'

rest_token: Bearer insert_a_long_term_token_here

Last I made a automation that is run every minute, it checks if the sensor has not been updated and if so, restarts the integration. As I have a DSMR version 5, it normally sends updates every second. So no data for 30 seconds means its dead.

- id: ''
  alias: Reconnect the DSMR
  description: ''
  trigger:
  - platform: time_pattern
    minutes: /1
  condition:
  - condition: template
    value_template: '{{ relative_time(states.sensor.power_consumption.last_updated
      | timestamp_local) > ''30'' }}'
  action:
  - service: rest_command.reload_dsmr
    data: {}

RobBie1221 commented 4 years ago

It's kind of ugly, but indeed that's the only way to reload.

I'm quite convinced the DSMR reader disconnects without a close (happens when e.g. network connection fails). In this case, the connection on HA is essentially blocking for a very long time. On a socket a timeout can be set, which would be the nicest solution. In the integration however it doesn't directly use sockets but create_connection from asyncio. I have to check whether lowlevel options can be set.

RobBie1221 commented 4 years ago

Btw, I guess the external power supply didn't work?

DarkwellZ commented 4 years ago

It's not pretty, but I'm pretty happy. Before it was sending me a notification and I was manually reloading it 20 times a day. Trust me, that gets boring really quickly.

No the external power made no difference. I still think its my WiFi connection / router that triggers the problem in the first place.

I'm back to the core integration so if a solution ever becomes available...

Thanks for you effort so far, and thank you @zuidwijk for your assistance as well.

groenmarsmannetje commented 4 years ago

I have a similar issue. It was introduced somewhere when I upgraded to Home Assistant 0.115.x and DSMR Reader version 4.4.3 (coming from 3.x). When I restart the MQTT process in DSMR Reader it works again for a while, but after time the sensors are not updated anymore in Home Assistant. Also when I turn on "Keep reconnecting" setting it still occurs.

I am trying now with setting Quality of Service = QoS1.

Maybe this also relates to this Issue

RobBie1221 commented 4 years ago

@groenmarsmannetje I think you are mixing up 2 integrations, this is related to dsmr integration, I suspect your issue is related to dsmr reader integration

zuidwijk commented 4 years ago

@groenmarsmannetje I do think, like @RobBie1221 said, you’re facing something else then described in this main issue.

Now I’m running 0.114.4 and dsmr-reader 4.4.3 and having none of those issues you’re describing. Now I’m running docker on Linux with every container self build and not using HassIO-ish setup. Perhaps it’s a mqtt thing? You’re saying restarting mqtt service within dsmr-reader solves it. Perhaps you should ask support from dsmr-reader?

groenmarsmannetje commented 4 years ago

Sorry for the confusion. I suggest to update the problem description in first message, because in there it mentions ‘DSMR Reader’ as integration.

DarkwellZ commented 4 years ago

Sorry for the confusion. I suggest to update the problem description in first message, because in there it mentions ‘DSMR Reader’ as integration.

Fixed that.

mjkl-gh commented 4 years ago

It's kind of ugly, but indeed that's the only way to reload.

I'm quite convinced the DSMR reader disconnects without a close (happens when e.g. network connection fails). In this case, the connection on HA is essentially blocking for a very long time. On a socket a timeout can be set, which would be the nicest solution. In the integration however it doesn't directly use sockets but create_connection from asyncio. I have to check whether lowlevel options can be set.

I've found I'm running into this issue as well. I blame wifi, my AP reports a signal strength of -85DB while being 2m away from it. (device itself is reporting <50DB from the AP).

@RobBie1221 Did you get the chance to look into the low-level stuff? anything I can help with?

RobBie1221 commented 4 years ago

I tried something with a Pi with Ser2net but that setup seems very robust. Pulling a cable doesn't really bother the integration and plugging it in again will get it working again. I suspect the Ser2net keeps te socket alive and floods the send buffer on his side. When plugging in again, it seems to just send again without even reconnecting.

Basically, I want to find out what can be detected by the low level socket. Nicest solution would be if om that level it is visible that something happens. A timer like solution to detect entities are not updating anymore should be a last resort, and I'm not even sure that is accepted from Home Assistant side or if that's something that needs to be done by the dsmr library.

I do have a python dsmr emulator now which I can kill easier and harder then the Ser2net. I was planning to give that a go and see what that brings me.

@mjkl-gh Are you using the same hardware as @DarkwellZ

zuidwijk commented 4 years ago

@RobBie1221 If you want I can send you a WiFi P1 reader (the same as @DarkwellZ has: logic inverter with Wemos and esp-link).

RobBie1221 commented 4 years ago

@zuidwijk Thanks (again) for the offer :-) I preferred investigating other options because this unit would require me to uncouple my own P1 cable, which I rather not do.

It's also not necessary. I'm able to reproduce the issue. I have Home Assistant running in a Ubuntu VM and a small daemon on the Windows host. What does work is if I kill the daemon, that is detected by the integration. What triggers the hangup is if I cut the network to the VM (which kind of would be similar to a WiFi hickup). In this case, the disconnect is not detected (I've waited about 30 minutes after which it still does not work) and updates are stopped until a manual reset is triggered.

zuidwijk commented 4 years ago

@RobBie1221 I understand, got the same issue. I got a wired (Ethernet) p1 reader which prevents me from long term testing. I am designing a active splitter to connect two reader on one meter, yet that's a long term project 😅

My Ethernet P1 reader:

mjkl-gh commented 4 years ago

@mjkl-gh Are you using the same hardware as @DarkwellZ

I'm using https://gist.github.com/oxan/4a1a36e12ebed13d31d7ed136b104959 on m5stickc

I'm guessing the code might be dropping clients too agressively and not reconnecting them when they get back. Is that making sense?

RobBie1221 commented 4 years ago

The problem is that Home Assistant runs a "connect and reconnect" loop. It only reconnects when it detects a disconnect. In the case there is a network failure, it fails to detect this as such and therefore "hangs". The existing connection is broken and it does not attempt to make a new connection. I'm trying to find out why it doesn't detect that there is a network issue and how I can influence the behavior.

mjkl-gh commented 4 years ago

I got as much from everything above. However, i'm looking into this gist at the same time, since i think something might not be right.

A: I think i read somewhere the whole point of the AsyncTCP library on esp was that it cleans up after itself. This gist however, includes a cleanup() in which it is actively cleaning disconnect clients. And disconnected gets set pretty easily. B: I've just opened a netcat session while logging through esphome. As I suspected i saw "192.168.x.x disconnected" right around the time it stopped receiving.

My guess is that the ESP is actively dropping the connection without telling the client. Increasing the occurence of issue we're seeiing here.

to further elaborate: the netcat session never notices a disconnect. It just stops receiving anything and hangs there. This sounds like the issue you are describing.

RobBie1221 commented 4 years ago

If a client disconnects without a close() that doesn't help. That will sort of trigger the same thing we see here, so if your device is doing that, that will definitely increase the chances of this occurring.

Nevertheless, it should be possible to make this robust, but the challenge is that the socket handling is wrapped by asyncio. I tried to manually make a socket with timeout on recv, but this doesn't do anything. It seems that once a network connection hangs, the recv is not even called anymore.

mjkl-gh commented 4 years ago

First of all, as i said i'm going to also do my best to to fix the close() part. But if it is desireable, I want to help in the effort to fix this part.

If I read it correctly the connect and reconnect awaits the creation of the reader.

            try:
                transport, protocol = await hass.loop.create_task(reader_factory())

Looking at the code that comes after it I'm assuming connect_and_reconnect() only gets resumed when the reader exits (error disconnect etc) Because everything else in there only handles shutdowns and reconnection.

Correct me if I'm wrong but i'm assuming the create_tcp_dsmr_reader() call never exits when closed() isn't called by the server.

what I don't get is the "transport, protocol =" part. loop.create_task() only returns a task object.

And when i look at this create_tcp_dsmr_reader() only returns conn. which i'm assuming should be the transport object. create_tcp_dsmr_reader() does have a protocol object but doesn't return it.

Am I missing some python stuff or shouldn't that be possible?

RobBie1221 commented 4 years ago

The conn is a transport, protocol pair (https://docs.python.org/3/library/asyncio-eventloop.html#opening-network-connections)

The issue is that it's only reconnecting when this function is called: https://github.com/ndokter/dsmr_parser/blob/0427ace079b2c691904cca2ae4425f7777623ae1/dsmr_parser/clients/protocol.py#L95-L101 The connection_lost is an override of the function defined in class asyncio.Protocol. The asyncio however does not detect an "ungraceful" disconnect as such and does not call this function. The main question is why...if we can find that out, perhaps we can do something about it.

mjkl-gh commented 4 years ago

Maybe because we are only receiving packets and not sending any? this suggests asyncio requires packets being sent to notice ungraceful disconnects.

RobBie1221 commented 4 years ago

The following is stated there:

The only reliable way to detect disconnection is waiting for a message from peer (it can be ping message or regular data packet) with a timeout.

That's similar to what I said. I tried to wait for a message with a timeout (now it waits without timeout). This however also doesn't work, because the socket recv is not called anymore when network error occurs. As said before, we need to understand why it's not called anymore.

mjkl-gh commented 4 years ago

Is there something I can help with in this process?, until Saturday I'm working on something else, but after that I would be able to try something.

First plan for me would be to install dmsr_parser through pip and see if it's possible to make it work.

Would you maybe mind sharing what you already tried with recv? I would be interested in that.

RobBie1221 commented 4 years ago

Well, I do not have a clue what is the root cause let alone a solution, so any help is appreciated.

I can take you through what I've found. As mentioned above, connection_lost is not called, which is why it does not reconnect. Initially I thought it may hang in socket recv. I've traced that down to selector_events.py from asyncio:

def _read_ready__data_received(self):
        if self._conn_lost:
            return
        try:
            data = self._sock.recv(self.max_size)
        except (BlockingIOError, InterruptedError):
            return
        except (SystemExit, KeyboardInterrupt):
            raise
        except BaseException as exc:
            self._fatal_error(exc, "Fatal read error on socket transport")
            return

        if not data:
            self._read_ready__on_eof()
            return

        try:
            self._protocol.data_received(data)
        except (SystemExit, KeyboardInterrupt):
            raise
        except BaseException as exc:
            self._fatal_error(exc, "Fatal error: protocol.data_received() call failed.")

I put a breakpoint here on self._sock.recv(self.max_size). This function is called every few seconds. When I create a network failure, this function is not called anymore (in fact, it does not enter in this function anymore).

What I tried was to replace here: https://github.com/ndokter/dsmr_parser/blob/0427ace079b2c691904cca2ae4425f7777623ae1/dsmr_parser/clients/protocol.py#L59

To the following:

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host,int(port))
s.settimeout(10)

conn = loop.create_connection(protocol, sock=s)

This does not change anything, which is expected given above because this would only help if the root cause was recv that blocks.

zuidwijk commented 4 years ago

Ok, I'm testing it now too. I've installed home assistant on VMware (via the OVA) and only installed the dsmr-reader. Once I reseat the power/p1 from my Ethernet module (can handle 5 connections, 1 for my private and 1 for testing) it stops logging. While my dsmr-reader immediately starts working again.

zuidwijk commented 4 years ago

Ok, I've tested two scenarios:

1) Power loss to the module (same as restarting the Wemos/WiFi module): Home Assistant stops logging and won't reconnect to the module.

2) Unplug and replug the ethernet cable: Home Assistant starts logging again.

Edit: In both scenarios the DSMR-reader from Dennis Siemensma keeps logging.

zuidwijk commented 4 years ago

@RobBie1221 I don't think I have much time on Saturday but I'll try on Sunday to see if I can translate the python code to something I can understand (got a lot of php/c++/perl knowledge so I think I'll manage).

The weird part is, why it does reconnect after ethernet loss and not after power loss (restart of the telnet server).

Maybe if I can generate some logging I can see what's happening.

mjkl-gh commented 4 years ago

The weird part is, why it does reconnect after ethernet loss and not after power loss (restart of the telnet server).

If i look at the c++ code in my p1 logger, a reboot reinitializes the array of connected clients. Home assistant never reconnects ,but hangs, so no client is connected.

When reconnecting ethernet you re-establish connection. If the reader still remembers your client the packets will reach home assistant again, resuming the hanging connection. In my case a client gets agressively deleted on any error. Triggering this problem in both cases (restart and ethernet loss)

I have no experience whatsoever with lowlevel python at the moment, butdef _read_ready__data_received(self): sounds like data is already received. As in, something external triggers this function on getting a signal that there is some data in some buffer somewhere. Maybe this function is not called anymore since no data is entering that buffer?

I think this adds _read_ready() which is _read_ready_cb() which is _read_ready__data_received(). as callback to when READ_EVENT takes place on our socket.

Maybe since no READ_EVENTS take place recv() isn't called anymore.

I was hoping we could implement something in the way of

Promise1 = receive_telegram() Promise2 = timeout(interval)

Promise.race( Promise1, Promise2)

IF promise2 resetconnection()

However, as asyncio handles our socket continously that would be hard to implement

RobBie1221 commented 4 years ago

If we get rid of asyncio solving this is peanuts, because we can block a thread on recv and make sure socket timeout is set. The recv will always either give data or give TimeoutError, which would be the indication to reconnect. That would however mean serious refactoring of the library and home assistant.

Your theory is right, if both client and server do not cleanup the socket, when disconnecting the cable, the socket send will buffer (and perhaps overflow) but anyway if you plug it in again data starts flowing again.

I had a theory that asyncio uses socket select function before call recv to prevent blocking a thread when calling recv. A select would not see a hard disconnect, it would just return 0 bytes. I'm however not finding yet that it actually calls this function.

I'm not yet giving up on asyncio...

humanoidlux commented 4 years ago

Hi

I've a similar problem and maybe this will help you. The dsmr component worked fine with 0.113 until I updated to 0.116 and now 0.117. The entities are no longer available. I've activity on port 2001 (tcpdump) and after some sec. the connection is closed. socket 2 (fd8) is at EOF exiting with status 0

socat

Now I'm using sensor.py from (May27) which seams to work fine. https://raw.githubusercontent.com/home-assistant/core/cfaa851b5b8ace3d14262c2e8c57a237d5f9d3ec/homeassistant/components/dsmr/sensor.py

My reader is wired.

RobBie1221 commented 4 years ago

@humanoidlux I'm not sure this is the same problem. What essentially changed between 0.113 and 0.116 is that it doesn't use platform setup anymore. When importing to a ConfigEntry, it's validating the connection. That means it connects, waits a few seconds and if validation was successful, it disconnects and adds a ConfigEntry. After that, it should reconnect through the normal flow (from sensor.py). When validation was unsuccessful, it will never reconnect, because import failed.

When you past back the old sensor.py, it will use platform setup again without connection validation. I suspect this connection validation does not work for you, question is why. What you could do to verify is to place back the original sensor.py, startup Home Assistant and check Configuration->Integrations and check if there is a card with "DSMR". If not, then import fails and this is another issue then described here. In that case, please make a new issue, then we'll continue debugging there.

home-assistant / core