turbokongen / hass-AMS

Custom component reading AMS through MBus adapter into HomeAssistant

Frame sync problem in code version 1.9 #71

Closed rogere66 closed 2 years ago

rogere66 commented 2 years ago

Version 1.9 fixed the random data read error, but unfortunately introduced a new problem. If a corrupt message is received, the message is skipped, but the code is not able to sync on successive messages. The debug log shows:

2021-12-28 10:10:09 DEBUG (Thread-3) [custom_components.ams] Not a valid packet. Start over again
2021-12-28 10:11:49 DEBUG (Thread-3) [custom_components.ams] Not a valid packet. Start over again
2021-12-28 10:13:29 DEBUG (Thread-3) [custom_components.ams] Not a valid packet. Start over again
2021-12-28 10:15:09 DEBUG (Thread-3) [custom_components.ams] Not a valid packet. Start over again

The error can easily be reproduced by disconnecting and connecting the HAN cable a few times.

The problem is that the code will try to sync on the next FRAME_FLAG, but this will typically be the frame END flag of the corrupt frame, not the START flag of the next frame (at least on an Aidon meter). The 2nd byte will then be the FRAME_FLAG of the next frame, but the code will interpret it as 1st byte of the frame format field, containing part of the length field. The code will then try to receive a very long message, which will also appear corrupt.

It will thus be necessary to also validate the frame format field contained in the next 2 bytes before starting to receive the message. The message length is the least significant 11 bits (not 12) of the frame format field, according to this document (section 3.3.1): https://ntnuopen.ntnu.no/ntnu-xmlui/bitstream/handle/11250/2625734/no.ntnu%3ainspera%3a2468545.pdf?sequence=6&isAllowed=y

A reasonable frame-start validation could be to check that the 2nd byte is NOT a FRAME_FLAG and that the decoded message length is within reasonable limits, say 10-1000 bytes.
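A minimal sketch of such a check, assuming the two bytes after the start flag are passed in as integers. The function names and the constants MIN_FRAME_LEN and MAX_FRAME_LEN are hypothetical and not taken from the hass-AMS code; only the 11-bit mask and the 10-1000 range come from the description above.

```python
FRAME_FLAG = 0x7E      # HDLC frame delimiter (decimal 126)
MIN_FRAME_LEN = 10     # assumed lower bound, as suggested above
MAX_FRAME_LEN = 1000   # assumed upper bound, as suggested above


def decode_frame_length(byte1: int, byte2: int) -> int:
    """Return the least significant 11 bits of the 16-bit frame format field."""
    return ((byte1 << 8) | byte2) & 0x07FF


def is_plausible_frame_start(byte1: int, byte2: int) -> bool:
    """Check the two bytes that follow a candidate start flag."""
    # A flag directly after a flag means the first one was really the END
    # flag of the previous (corrupt) frame.
    if byte1 == FRAME_FLAG:
        return False
    return MIN_FRAME_LEN <= decode_frame_length(byte1, byte2) <= MAX_FRAME_LEN
```

With this check, the mis-synced case is rejected at the first test. If the check were skipped, a pair such as 0x7E 0xA0 would decode to a length of 1696 bytes, which is the "very long message" symptom.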

It would also be good to have more info in the DEBUG message if a frame is skipped, typically printing the actual message.
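As a rough illustration of what a more informative message could look like, here is a sketch that includes the reason and a hex dump of the skipped bytes. The helper names are hypothetical and do not exist in the component.

```python
import logging

_LOGGER = logging.getLogger(__name__)


def describe_skipped_frame(packet: bytes, reason: str) -> str:
    """Build a debug message containing the skipped bytes as spaced hex."""
    return f"Skipping frame ({reason}), {len(packet)} bytes: {packet.hex(' ')}"


def log_skipped_frame(packet: bytes, reason: str) -> None:
    """Emit the message at DEBUG level, like the existing log lines."""
    _LOGGER.debug(describe_skipped_frame(packet, reason))
```

The separator argument to bytes.hex() requires Python 3.8 or newer.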

turbokongen commented 2 years ago

Please try https://github.com/turbokongen/hass-AMS/tree/sync

frankiboy1 commented 2 years ago

@turbokongen, I'm not sure this fix covers all cases. Your fix corrects the case where the frame start is missed, data is purged until the frame end, and the next byte is the frame start. However, 126 (the FRAME_FLAG value) can apparently occur in the middle of a packet, so you could miss the frame start and then receive a false frame start in the middle of a packet. The decoded size will then be a random value, and the problem persists until you have built a packet of the decoded size and discarded it, as it most likely does not have a proper frame end.

I wonder if a more robust frame sync detection mechanism can be built using a read timeout on the serial interface. As far as I know, all meters send a packet every 2 seconds. If you try to read a byte and it takes more than 1 s, you know that you have missed the end of the packet and can reset the input buffer and counter. This should also work for the cases where you detect a false frame start.

I can try to put together a proposal for this, so you can take a look at an alternative fix.

turbokongen commented 2 years ago

Kamstrup meters unfortunately only send every 10 seconds, never every 2 seconds. I will go through the packets I have been sent and see if there is something we can use to identify the start of a packet. I think I will have to draw myself a flowchart to work out a solution for this.

frankiboy1 commented 2 years ago

I've created a PR#73 with a proposed fix for this issue.

The serial port is already configured with a timeout of 0.1 s. If a read times out, I assume that we have reached the gap between packets and drop the current packet. This should work for any meter, independent of the delay between packets, as long as that delay is longer than the TIMEOUT value (plus some slack).
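The idea can be sketched roughly as follows. This is not the code from the PR. The class TimeoutFramer is hypothetical, and it assumes that a timed-out one-byte read yields an empty bytes object (as pyserial's read() does) and that the 11-bit length counts every byte between the two flags.

```python
from typing import Optional

FRAME_FLAG = 0x7E  # HDLC frame delimiter (decimal 126)


class TimeoutFramer:
    """Assemble frames byte by byte; a read timeout resets the state."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> Optional[bytes]:
        """Take the result of one read(1); return a complete frame or None."""
        if not data:
            # Timeout: we are in the gap between packets, drop partial data.
            self._buffer.clear()
            return None
        self._buffer.extend(data)
        if self._buffer[0] != FRAME_FLAG:
            self._buffer.clear()  # not at a frame start, keep hunting
            return None
        if len(self._buffer) < 3:
            return None  # frame format field not complete yet
        length = ((self._buffer[1] << 8) | self._buffer[2]) & 0x07FF
        if len(self._buffer) < length + 2:  # +2 for start and end flags
            return None
        packet = bytes(self._buffer)
        self._buffer.clear()
        # Accept only if the frame closes with a flag.
        return packet if packet[-1] == FRAME_FLAG else None
```

In a read loop this would be driven with something like framer.feed(ser.read(1)), where ser is a serial port opened with the 0.1 s timeout. Because the reset happens on any idle gap, a false frame start in the middle of a packet costs at most that one packet.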

turbokongen commented 2 years ago

Should be fixed in v1.9.1