mysensors / MySensors

MySensors library and examples
https://www.mysensors.org
1.31k stars 892 forks source link

Not receiving echo anymore #1349

Closed rikki78 closed 4 years ago

rikki78 commented 4 years ago

Recently I have updated my ESP32 MQTT gateway to the latest version. Now I don't receive an echo anymore. In the previous version I used this was what was shown on the log:

(15:23:53.015) 961727733 GWT:IMQ:TOPIC=mygateway1-in/30/4/2/1/3, MSG RECEIVED
(15:23:53.015) 961727740 TSF:MSG:SEND,0-0-40-30,s=4,c=2,t=3,pt=0,l=2,sg=0,ft=0,st=OK:60
(15:23:53.015) 961727789 TSF:MSG:READ,30-40-0,s=4,c=2,t=3,pt=0,l=2,sg=0:60
(15:23:53.015) 961727795 TSF:MSG:ECHO
(15:23:53.015) 961727799 GWT:TPS:TOPIC=mygateway1-out/30/4/2/1/3,MSG SENT

With the latest version this is shown in the log:

(08:59:34.511) 215543368 GWT:IMQ:TOPIC=mygateway1-in/30/4/2/1/3, MSG RECEIVED
(08:59:34.511) 215543375 TSF:MSG:SEND,0-0-40-30,s=4,c=2,t=3,pt=0,l=1,sg=0,ft=0,st=OK:0

I'm sure the network is okay since it used to work just fine, and I receive all other messages also. Also, this started after updating the software.

I've tried it with the GatewayESP32MQTTClient example as well.

I'm using the latest ESP32 platformIO release (1.10.0) and the mysensors development branch.

Any ideas? Did I miss a configuration setting perhaps?

tekka007 commented 4 years ago

@rikki78 Please also post the log from the node incl. startup messages

mfalkvidd commented 4 years ago

You might need to define MY_DEBUG_VERBOSE_GATEWAY and/or MY_DEBUG_VERBOSE_TRANSPORT

rikki78 commented 4 years ago

That requires a bit more work. Can you replicate it in your setup?

Op za 28 sep. 2019 09:34 schreef Olivier notifications@github.com:

@rikki78 https://github.com/rikki78 Please also post the log from the node incl. startup messages

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mysensors/MySensors/issues/1349?email_source=notifications&email_token=ACVJ6Y4EN6U47PSUXHAMMHTQL4CJ5A5CNFSM4I3ND2EKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD72TCIQ#issuecomment-536162594, or mute the thread https://github.com/notifications/unsubscribe-auth/ACVJ6YZOXRPBGL74EL5F5UTQL4CJ5ANCNFSM4I3ND2EA .

tekka007 commented 4 years ago

@rikki78 Echo works in my hands. Can you add #define MY_DEBUG_VERBOSE_TRANSPORT_HAL to your gateway sketch and again post the gw log?

rikki78 commented 4 years ago

Here you go.


86966 GWT:IMQ:TOPIC=mygateway1-in-tst/30/4/2/1/3, MSG RECEIVED
86971 THA:SND:MSG=00001E121203043430
86979 THA:SND:MSG LEN=9,RES=1
86982 TSF:MSG:SEND,0-0-40-30,s=4,c=2,t=3,pt=0,l=2,sg=0,ft=0,st=OK:40
tekka007 commented 4 years ago

The EchoRequest bit is set in the outgoing message - so no issue here. Next, would be great if you can retrieve the full debug log (enable #define MY_DEBUG_VERBOSE_TRANSPORT_HAL and #define MY_DEBUG_VERBOSE_TRANSPORT) from node 30 and repeater 40.

rikki78 commented 4 years ago

Just a quick update, I have made a test setup with just a gateway and a node. Here the echo works fine.

node log

212 TSM:INIT
294 TSF:WUR:MS=0
311 THA:INIT
327 TSM:INIT:TSP OK
360 TSM:INIT:STATID=34
376 THA:SAD:ADDR=34
409 TSF:SID:OK,ID=34
425 TSM:FPAR
442 THA:SND:MSG=2222FF020307FF
491 THA:SND:MSG LEN=7,RES=1
507 ?TSF:MSG:SEND,34-34-255-255,s=255,c=3,t=7,pt=0,l=0,sg=0,ft=0,st=OK:
638 THA:DATA:AVAIL
671 THA:RCV:MSG=0000220A2308FF00
704 THA:RCV:MSG LEN=8
737 TSF:MSG:READ,0-0-34,s=255,c=3,t=8,pt=1,l=1,sg=0:0
786 TSF:MSG:FPAR OK,ID=0,D=1
2621 TSM:FPAR:OK
2637 TSM:ID
2654 TSM:ID:OK
2670 TSM:UPL
2686 THA:SND:MSG=2222000A2318FF01
2719 THA:SND:MSG LEN=8,RES=1
2752 TSF:MSG:SEND,34-34-0-0,s=255,c=3,t=24,pt=1,l=1,sg=0,ft=0,st=OK:1
2834 THA:DATA:AVAIL
2867 THA:RCV:MSG=0000220A2319FF01
2899 THA:RCV:MSG LEN=8
2932 TSF:MSG:READ,0-0-34,s=255,c=3,t=25,pt=1,l=1,sg=0:1
2998 TSF:MSG:PONG RECV,HP=1
3031 TSM:UPL:OK
3047 TSM:READY:ID=34,PAR=0,DIS=1
20414 THA:DATA:AVAIL
20447 THA:RCV:MSG=000022121203043430
20480 THA:RCV:MSG LEN=9
20512 TSF:MSG:READ,0-0-34,s=4,c=2,t=3,pt=0,l=2,sg=0:40
20578 TSF:MSG:ECHO REQ
20594 THA:SND:MSG=222200120A03043430
20643 THA:SND:MSG LEN=9,RES=1
20676 TSF:MSG:SEND,34-34-0-0,s=4,c=2,t=3,pt=0,l=2,sg=0,ft=0,st=OK:40

GW log


131672 GWT:IMQ:TOPIC=mygateway1-in-tst/34/4/2/1/3, MSG RECEIVED
131678 THA:SND:MSG=000022121203043430
131685 THA:SND:MSG LEN=9,RES=1
131688 TSF:MSG:SEND,0-0-34-34,s=4,c=2,t=3,pt=0,l=2,sg=0,ft=0,st=OK:40
131917 THA:DATA:AVAIL
131920 THA:RCV:MSG=222200120A03043430
131925 THA:RCV:MSG LEN=9
131929 TSF:MSG:READ,34-34-0,s=4,c=2,t=3,pt=0,l=2,sg=0:40
131936 TSF:MSG:ECHO
131940 GWT:TPS:TOPIC=mygateway1-out-tst/34/4/2/1/3,MSG SENT
140277 GWT:IMQ:TOPIC=mygateway1-in-tst/34/4/2/1/3, MSG RECEIVED
140282 THA:SND:MSG=000022121203043430
140290 THA:SND:MSG LEN=9,RES=1
140293 TSF:MSG:SEND,0-0-34-34,s=4,c=2,t=3,pt=0,l=2,sg=0,ft=0,st=OK:40
140522 THA:DATA:AVAIL
140524 THA:RCV:MSG=222200120A03043430
140529 THA:RCV:MSG LEN=9
140533 TSF:MSG:READ,34-34-0,s=4,c=2,t=3,pt=0,l=2,sg=0:40
140541 TSF:MSG:ECHO
140545 GWT:TPS:TOPIC=mygateway1-out-tst/34/4/2/1/3,MSG SENT

I will work on the log for the repeater and come back to it.

rikki78 commented 4 years ago

So, I have collected some information.

I have now looked at the concerning node (arduino pro mini 8Mhz), the one that didn't send the ack anymore. First I tried to update the software to the development branch but that gave strange results.

(11:23:05.206) |  \/  |_   _/ ___|  ___ _ __  ___  ___  _ __ ___
(11:23:05.206) | |\/| | | | \___ \ / _ \ `_ \/ __|/ _ \| `__/ __|
(11:23:05.206) | |  | | |_| |___| |  __/ | | \__ \  _  | |  \__ \
(11:23:05.206) |_|  |_|\__, |____/ \___|_| |_|___/\___/|_|  |___/
(11:23:05.206)         |___/                      2.3.2-beta
(11:23:05.206) <CR>
(11:23:05.206) 20 MCO:BGN:INIT REPEATER,CP=RNNRA-Q-,FQ=8,REL=8,VER=2.3.2-beta
(11:23:05.206) 30 TSM:INIT
(11:23:05.206) 32 TSF:WUR:MS=0
(11:23:05.206) 34 THA:INIT
(11:23:05.206) 34 RF24:INIT:PIN,CE=9,CS=10
(11:23:05.206) 38 RF24:SBY
(11:23:05.206) 38 RF24:WBR:REG=0,VAL=62
(11:23:05.206) 47 RF24:WBR:REG=3,VAL=3
(11:23:05.206) 49 RF24:WBR:REG=4,VAL=95
(11:23:05.206) 51 RF24:WBR:REG=5,VAL=76
(11:23:05.206) 55 RF24:WBR:REG=6,VAL=37
(11:23:05.206) 57 RF24:WBR:REG=29,VAL=4
(11:23:05.206) 59 RF24:RBR:REG=29,VAL=4
(11:23:05.206) 63 RF24:RBR:REG=6,VAL=37
(11:23:05.206) 65 RF24:RBR:REG=5,VAL=76
(11:23:05.206) 67 RF24:WBR:REG=2,VAL=2
(11:23:05.206) 71 RF24:WBR:REG=1,VAL=0
(11:23:05.206) 73 RF24:WBR:REG=28,VAL=3
(11:23:05.206) 75 RF24:FRX
(11:23:05.206) 77 RF24:FTX
(11:23:05.206) 77 RF24:WBR:REG=7,VAL=112
(11:23:05.206) 81 TSM:INIT:TSP OK
(11:23:05.206) 83 TSM:INIT:STATID=6
(11:23:05.206) 86 THA:SAD:ADDR=6
(11:23:05.206) 88 RF24:WBR:REG=2,VAL=3
(11:23:05.206) 90 RF24:WBR:REG=1,VAL=1
(11:23:05.206) 92 RF24:STL
(11:23:05.206) 94 RF24:WBR:REG=0,VAL=63
(11:23:05.206) 96 RF24:WBR:REG=10,VAL=6
(11:23:05.206) 98 TSF:SID:OK,ID=6
(11:23:05.206) 100 TSM:FPAR
(11:23:05.206) 102 THA:SND:MSG=0606FF020307FF
(11:23:05.206) 106 RF24:SPL
(11:23:05.206) 106 RF24:WBR:REG=0,VAL=62
(11:23:05.206) 110 RF24:OWP:RCPT=255
(11:23:05.206) 112 RF24:WBR:REG=10,VAL=255
(11:23:05.206) 114 RF24:WBR:REG=16,VAL=255
(11:23:05.206) 118 RF24:TXM:TO=255,LEN=7
(11:23:05.206) 120 RF24:FTX
(11:23:05.206) 122 RF24:WBR:REG=4,VAL=80
(11:23:05.206) 126 RF24:WBR:REG=7,VAL=112
(11:23:05.206) 131 ?RF24:TXM:MAX_RT
(11:23:05.206) 133 RF24:FTX
(11:23:05.206) 135 RF24:WBR:REG=4,VAL=95
(11:23:05.206) 137 RF24:STL
(11:23:05.206) 139 RF24:WBR:REG=0,VAL=63
(11:23:05.206) 141 RF24:WBR:REG=10,VAL=6
(11:23:05.206) 143 THA:SND:MSG LEN=7,RES=1
(11:23:05.206) 147 ?TSF:MSG:SEND,6-6-255-255,s=255,c=3,t=7,pt=0,l=0,sg=0,ft=0,st=OK:
<Error: RX FRAMING ERROR>(11:23:05.716) 806 RF24:RXM:LEN=8
(11:23:05.716) 806 RF24:WB<Error: RX FRAMING ERROR>R:REG=7,VAL=64
(11:23:06.039) 808 THA:DATA:AVAIL
(11:23:06.039) 811 THA:RCV:MSG=0000060A2308FF00
(11:23:06.039) 815 THA:RCV:MSG LEN=8
(11:23:06.039) 817 TSF:MSG:READ,0-0-6,s=255,c=3,t=8,pt=1,l=1,sg=0:0
(11:23:06.039) 823 TSF:MSG:FPAR OK,ID=0,D=1
(11:23:07.086) 2156 TSM:FPAR:OK
(11:23:07.086) 2156 TSM:ID
(11:23:07.086) 2158 TSM:ID:OK
(11:23:07.086) 2160 TSM:UPL
(11:23:07.086) 2162 THA:SND:MSG=0606000A2318FF01
(11:23:07.086) 2164 RF24:SPL
(11:23:07.086) 2166 RF24:WBR:REG=0,VAL=62
(11:23:07.086) 2170 RF24:OWP:RCPT=0
(11:23:07.086) 2172 RF24:WBR:REG=10,VAL=0
(11:23:07.199) 2174 RF24:WBR:REG=16,VAL=0
(11:23:07.199) 2179 RF24:TXM:TO=0,LEN=8
(11:23:07.199) 2181 RF24:FTX
(11:23:07.199) 2183 RF24:WBR:REG=7,VAL=112
(11:23:07.199) 2187 RF24:STL
(11:23:07.199) 2189 RF24:WBR:REG=0,VAL=63
(11:23:07.199) 2191 RF24:WBR:REG=10,VAL=6
(11:23:07.199) 2193 THA:SND:MSG LEN=8,RES=1
(11:23:07.199) 2197 TSF:MSG:SEND,6-6-0-0,s=255,c=3,t=24,pt=1,l=1,sg=0,ft=2,st=OK:1
(11:23:07.199) 2215 RF24:RXM:LEN=8
(11:23:07.199) 2215 RF24:WBR:REG=7,VAL=64
(11:23:07.199) 2217 THA:DATA:AVAIL
(11:23:07.199) 2222 THA:RCV:MSG=0000060A2319FF01
(11:23:07.199) 2224 THA:RCV:MSG LEN=8
(11:23:07.199) 2226 TSF:MSG:READ,0-0-6,s=255,c=3,t=25,pt=1,l=1,sg=0:1
(11:23:07.199) 2232 !TSF:MSG:PONG RECV,INACTIVE
(11:23:07.199) 2236 TSM:UPL:OK
(11:23:07.199) 2238 TSM:UPL:DGWC,O=1,N=30
(11:23:07.199) 2240 TSM:READY:ID=6,PAR=0,DIS=30
(11:23:08.557)  
(11:23:08.557)  __  __       ____
(11:23:08.557) |  \/  |_   _<Error: RX FRAMING ERROR>/ ___|  ___ _ __  ___  ___  _ __ ___
(11:23:08.653) | |\/| | | | \___ \ / _ \ `_ \/ __|/ _ \| `__/ __|
(11:23:08.653) | |  | | |_| |___| |  __/ | | \__ \  _  | |  \__ \
(11:23:08.653) |_|  |_|\__, |____/ \___|_| |_|___/\___/|_|  |___/
(11:23:08.653)         |___/                      2.3.2-beta
(11:23:08.653) <CR>

It kept on resetting so I reverted to 2.3.1. The RX framing errors are caused by the long cable I used I think. Since the MY_DEBUG_VERBOSE_TRANSPORT_HAL is not supported, I defined MY_DEBUG_VERBOSE_RF24 instead.

Gateway logs

(11:26:07.486) 65878 GWT:IMQ:TOPIC=mygateway1-in-tst/6/4/2/1/0, MSG RECEIVED
(11:26:07.578) 65883 THA:SND:MSG=000006121200043430
(11:26:07.578) 65891 THA:SND:MSG LEN=9,RES=1
(11:26:07.578) 65894 TSF:MSG:SEND,0-0-6-6,s=4,c=2,t=0,pt=0,l=2,sg=0,ft=0,st=OK:40
(11:26:10.515) 68907 THA:DATA:AVAIL

and in the same time range

(11:26:07.578) 32468 RF24:RBR:REG=23,VAL=16
(11:26:07.578) 32468 RF24:RXM:LEN=9
(11:26:07.578) 32468 RF24:WBR:REG=7,VAL=64
(11:26:07.578) 32468 RF24:RBR:REG=23,VAL=17
(11:26:07.578) 32471 TSF:MSG:READ,0-0-6,s=4,c=2,t=0,pt=0,l=2,sg=0:40
(11:26:07.578) 32477 TSF:MSG:ACK
(11:26:33.001) 57901 RF24:SPL

So it does tries to send the ACK but for me it is unclear if it succeeds.

The repeater is taken out of the equation now, I had the gateway close to the node. On the node there is an older (clone) RF24, on the gateway an Ebyte module.

I have no idea what could be going on. I hope you can see something from the logs. Let me know if more information is needed.

tekka007 commented 4 years ago

looks like there is a memory corruption before the repeater resets. Can you post the repeater sketch for further analysis?

rikki78 commented 4 years ago

Hi Olivier, Thanks for your reply. I checked the sketch myself and found something was overwriting the memory indeed. I have solved that but still it resets with v2.3.2 beta. There are some extra modules in the sketch that could cause this and I will disable these to exclude them from causing this, and then I will post the sketch. I think this is a different issue than the original, as that doesn't explain (at least not for me) why a software update on the gateway caused no more acknowledges to be received from this node. But perhaps you have a better view on that. Rik

Edit: Please also note that after downgrading to 2.3.1 no more resets occur.

rikki78 commented 4 years ago

I've had a look at the extra modules and I don't think these are the issue. I have these defines:

#define MY_REPEATER_FEATURE 
#define MY_RX_MESSAGE_BUFFER_FEATURE
#define MY_RX_MESSAGE_BUFFER_SIZE 10  
#define MY_RF24_IRQ_PIN   2

when I disable the MY_RX_MESSAGE_BUFFER_FEATURE and MY_RX_MESSAGE_BUFFER_SIZE the module doesn't reset anymore.

I've uploaded the code to https://github.com/rikki78/MySensorsDimmer With this line https://github.com/rikki78/MySensorsDimmer/blob/master/src/main.cpp#L499 commented, and the MY_RX_MESSAGE_BUFFER_FEATURE and MY_RX_MESSAGE_BUFFER_SIZE active, the node works also well.

I'm out of ideas. I hope you can find what is going on.

tekka007 commented 4 years ago

The symptoms look a lot like you're running out of memory. When you compile for 2.3.2 - how much memory is left? You can reduce your memory footprint by disabling all debug outputs and/or reducing the size of the RX buffer (10 seems a bit too much, I'd reduce it to ~3)

rikki78 commented 4 years ago

Good point. I haven't thought about that and I will check it tonight.

rikki78 commented 4 years ago

You were right, memory was too low. I fixed it and now 2.3.2 beta works. This is the node log (I don't see the HAL debug info here).

(12:04:11.094) 335214 TSF:MSG:READ,0-40-6,s=4,c=2,t=0,pt=0,l=2,sg=0:40
(12:04:11.094) 335220 TSF:MSG:ACK

This is the log from the repeater

(15:05:34.349) 132186 THA:DATA:AVAIL
(15:05:34.349) 132188 THA:RCV:MSG=000006121200043430
(15:05:34.349) 132192 THA:RCV:MSG LEN=9
(15:05:34.349) 132194 TSF:MSG:READ,0-0-6,s=4,c=2,t=0,pt=0,l=2,sg=0:40
(15:05:34.349) 132199 TSF:MSG:REL MSG
(15:05:34.349) 132201 THA:SND:MSG=280006121200043430
(15:05:34.349) 132206 THA:SND:MSG LEN=9,RES=1
-->(15:05:34.349) 132209 TSF:MSG:SEND,0-40-6-6,s=4,c=2,t=0,pt=0,l=2,sg=0,ft=0,st=OK:40
(15:05:39.460) 137332 THA:DATA:AVAIL
(15:05:39.460) 137334 THA:RCV:MSG=060600124117030B00
(15:05:39.460) 137337 THA:RCV:MSG LEN=9
(15:05:39.504) 137340 TSF:MSG:READ,6-6-0,s=3,c=1,t=23,pt=2,l=2,sg=0:11
(15:05:39.504) 137345 TSF:MSG:REL MSG
(15:05:39.504) 137348 THA:SND:MSG=280600124117030B00
(15:05:39.504) 137353 THA:SND:MSG LEN=9,RES=1
(15:05:39.504) 137356 TSF:MSG:SEND,6-40-0-0,s=3,c=1,t=23,pt=2,l=2,sg=0,ft=0,st=OK:11

And this one from the gateway. I used my 'production' one and there is no HAL debug information. It is hard to have all the logs in the same time so I gathered them in stages.

(15:05:34.298) 168928 GWT:IMQ:TOPIC=mygateway1-in/6/4/2/1/0, MSG RECEIVED
(15:05:34.370) 168938 TSF:MSG:SEND,0-0-40-6,s=4,c=2,t=0,pt=0,l=2,sg=0,ft=0,st=OK:40
(15:05:39.521) 174144 TSF:MSG:READ,6-40-0,s=3,c=1,t=23,pt=2,l=2,sg=0:11
(15:05:39.566) 174149 GWT:TPS:TOPIC=mygateway1-out/6/3/1/0/23,MSG SENT

As I interpret it, the repeater sends the ACK (see -->) and it is received ok by the gateway hardware.

Edit I just realised I used v2.3.1 for the node, that is probably why there is no HAL logging. I hope the HAL info from the repeater helps

tekka007 commented 4 years ago

Perfect! Thank you for your feedback

rikki78 commented 4 years ago

eh... The original issue is still there. So I don't receive the echo on the gateway.

tekka007 commented 4 years ago

Ups, my bad - I misread the last point

tekka007 commented 4 years ago

Ok, I assume your 'production' GW runs on a older 2.3.2-beta version dating from ~mid July? If so, the issue is related to #1319. Updating to the most recent dev-version will likely resolve the problem.

rikki78 commented 4 years ago

Well actually I'm on the latest development version (#74e566f). So is the repeater, but the node is on 2.3.1

rikki78 commented 4 years ago

This seems strange to me. https://github.com/mysensors/MySensors/blob/development/core/MyMessage.cpp#L138 The function checks if an echo is received, but it looks to the echo request bit.

rikki78 commented 4 years ago

I've now swapped the V2_MYS_HEADER_CEP_ECHOREQUEST_POS and V2_MYS_HEADER_CEP_ECHO_POS which solves the issue. I'll submit a PR.

tekka007 commented 4 years ago

yes, good catch - that looks a bit odd 👍