patience4711 / read-APSystems-YC600-QS1-DS3

Software for an esp8266 nodemcu to read data from APS inverters.
130 stars 24 forks source link

Zigbee crashes #97

Closed togoright closed 1 year ago

togoright commented 1 year ago

Hello patience4711,

I want to thank you for the great job you did. I've been using your program since the end of 2021 and I'm very excited.

Before I knew your project, I had neither programmed in C++ nor compiled a program, so it was completely new territory for me. In the meantime, I have made many changes to adapt it to my needs. I probably also found a solution for the hang-ups when querying the inverter data too often. I've been polling every two minutes since the beginning. In the beginning I also had the hangers again and again. Since I create the pollcommand differently, I don't have them anymore. As an experiment, I polled every 10 seconds for two days without any problems. Even with 5 seconds there were no hangups on two days, but there was a one-time error when decoding or calculating the total daily yield. I didn't pursue this any further because I switched back to a polling time of 2 minutes. I have 1 YC600 and use a D1-mini pro with CC2530+CC2591.

Here my change in ZIGBEE_POLLING.ino:

` // char pollCommand[254] = {0}; // char normalOperationBaseCommand[][254] = // { // "2401", //+ ..string.sub(inv_id[x],3,4)..string.sub(inv_id[x],1,2).. // "1414060001000F13", //append (concatenate) _ecu_id_reverse // // "1414000601000F13", //append (concatenate) _ecu_id_reverse // "FBFB06BB000000000000C1FEFE" // end of String // }; //
// //Serial.println("Inv_Prop[which].invID = " + String(Inv_Prop[which].invID) ); // char inv_id[7]; // memset(&inv_id[0], 0, sizeof(inv_id)); //zero out otherwise we get strange tokens // delayMicroseconds(250); //
// strncpy(inv_id, Inv_Prop[which].invID, strlen(Inv_Prop[which].invID)); // inv_id[7]='\0'; //terminate // // char ecu_id_reverse[13];
// ECU_REVERSE().toCharArray(ecu_id_reverse, 13); //
// strncpy(pollCommand, normalOperationBaseCommand[0], strlen(normalOperationBaseCommand[0])); // the 2401 // //Serial.println("pollCommand now = " + String(pollCommand)); // strncat(pollCommand, inv_id + 4, 2 ); // ad the 2nd byte of inv_id // strncat(pollCommand, inv_id + 2, 2 ); // ad the 1st byte of inv_id // //Serial.println("pollCommand after reversed inv id = " + String(pollCommand)); // strncat(pollCommand, normalOperationBaseCommand[1], sizeof(normalOperationBaseCommand[1])); // //Serial.println("pollCommand after 1414060001000F13 = " + String(pollCommand));
// strncat(pollCommand, ecu_id_reverse, sizeof(ecu_id_reverse)); // //Serial.println("pollCommand after reversed in id = " + String(pollCommand)); // strncat(pollCommand, normalOperationBaseCommand[2], sizeof(normalOperationBaseCommand[2])); // //Serial.println("pollCommand after FBFB06BB000000000000C1FEFE = " + String(pollCommand)); //
// // calculate length and put this at beginning // strcpy(pollCommand, strncat(sLen(pollCommand), pollCommand, sizeof(sLen(pollCommand)) + sizeof(pollCommand))); //build command plus sln at the beginning // // put in the CRC at the end of the command // strcpy(pollCommand, strncat(pollCommand, checkSum(pollCommand), sizeof(pollCommand) + sizeof(checkSum(pollCommand)))); // //DebugPrintln("pollCommand:"); //DebugPrintln(String(pollCommand));

char pollCommand[67] = {0};
char ecu_id_reverse[13];  
ECU_REVERSE().toCharArray(ecu_id_reverse, 13);

snprintf(pollCommand, sizeof(pollCommand), "1D2401%.2s%.2s1414060001000F13%sFBFB06BB000000000000C1FEFE", Inv_Prop[which].invID + 4, Inv_Prop[which].invID + 2, ecu_id_reverse);

// put in the CRC at the end of the command
strncat(pollCommand, checkSum(pollCommand), 2);

if(diagNose) ws.textAll("zb send poll cmd inverter " + String(which) + "  cmd:" + String(pollCommand));   
    #ifdef DEBUG
    swap_to_zb(); // set serial to zb  
    #endif

sendZigbee(pollCommand);

`

To relieve the processor, it might be beneficial to create the ecu_id_reverse once and store it in a variable so that it doesn't have to be recreated each time.

Best regards

patience4711 commented 1 year ago

@togoright Hi thanks for this contribution. It would be good news that this would help to avoid the zigbee crashes. Your solution seems a lot simpler but i cant I can't oversee if this is actually the case. Therefor my knowledge is not sufficient, i am not a programmer at all you see. But the benefit as to the memory is obvious.

I see that the data length is hardcoded now, (1D). You assumed that the only variables are the inv-id and the ecu_id, so that the command has always the same length. I noticed that because the length addition strcpy(pollCommand, strncat(sLen(pollCommand), pollCommand, sizeof(sLen(pollCommand)) + sizeof(pollCommand))); caused a crash in the esp32. The strncat component maybe charges the memory (stack) too much. So i considered to hard-code that too. I found another workaround for this but here is something going on that you avoided in your code. I did some tests with your code and it works oke so i included it in a new version. Hopefully the enthousiastic testers will find that it helps.

Regarding the the ecu_id_reverse, I assumed that we already have too much global variables so i wonder if that's a good idea. Thanks again, really appreciate it.

togoright commented 1 year ago

Wow, you are very fast 👍

I'm not a programmer either, I only do it as a hobby.

swbouman commented 1 year ago

I installed the XG version, uptime is 32 min, free mem: 16632 bytes-- with 7 inverters, reset counter is 3.

I keep you posted.

patience4711 commented 1 year ago

@togoright yes, maybe too fast. I commented out my polling function and replaced it with yours. What happens now is that at every healtcheck the ping to the zigbee (and other communication) fails. Meaning that the coordinator will be restarted. When i disable the autopolling this doesn't happen. I can poll via the console and do healtchecks as much as i like, the coordinator remains up. If the problem has something to do with your polling code, the problem should also occure when manually polling, one should think. When i reverse the polling methods back, this behaviour is gone. I tried some other things, changed the reset pin for instance, made the reset permanently high, nothing helped. this is really driving me nuts. The healthcheck is once in 10 minutes. When autopolling i can do healthchecks in between, all oke but when the sceduled healthcheck takes place, it goes wrong.

In short, after some other tests the situation with your code is this:

So what I did is skipping the automatic healthcheck and set a flag for the healtcheck when the length of a poll answer is 0. This worked but that is not a real solution.

The reason i am telling you this is that maybe you encountered also some strange behaviour while implementing your code.

patience4711 commented 1 year ago

After a lot of headbanging i figured that with autopolling/healtcheck they coincided. This is not the case when we do this manually. Could that have something to do with it? But why not with the original polling code? Anyway, the healthcheck was every 10 minutes and now 12 minutes. Problem seems gone.

Even better, if your code helps, we don't need the healthcheck anymore.

joffito commented 1 year ago

with 12 min for healtcheck wouldn't polling and health check collide again after 5 'rounds'? better use something odd. but you're probably on the right track! there ought to be a pause between both operations that give the zigbee module enough time to process. question is how long that pause needs to be.

patience4711 commented 1 year ago

Yeah, maybe better no automatic healtcheck at all but put a flag up when a polling answer == 0, this indicates that the coordinator may be down. thanks for thinking along.

togoright commented 1 year ago

@togoright yes, maybe too fast. ...

I'm surprised it causes you such problems. Before I wrote the post, to be on the safe side, I downloaded an up-to-date copy of the project and made the changes described. I also made the pollres adjustable, added my format for MQTT and set the refresh time of the info page to 2000ms (otherwise it is reloaded every 100ms for me, which often leads to reboots). I tested this version with my inverter for two days without any problems. Unfortunately I can't make this version available to you, but I will make the changes again, compile it and post it here with the source code. At the same time, I will also flash the version for myself. I don't have much time at the moment, so it will probably take a while.

patience4711 commented 1 year ago

That's oke, you have done your job thoroughly. Now the healtcheck has 2 minutes offset relative to the polling. Problem seems solved so you don't have to take any trouble.

BTW today i refreshed all the files of the project. F.i. the infopage retrieves only once ( onload ) the time.

The first reports on the implementation of your code are positive, great! If on a longer term the crashes stay away we can skip the healthcheck.

Thanks again for this usefull contribution.