nygma2004 / growatt2mqtt

Growatt Solar Inverter Modbus Data to MQTT Gateway
MIT License

WDT Timeout Issue #26

Closed cwynd closed 1 year ago

cwynd commented 1 year ago

Congrats on a nice project! I am trying to port the code to an ESP32C3 (I have a few, and I've used them in several other projects), and I've commented out pretty much everything except the core functionality (no ArduinoOTA, no NeoPixel, no MQTT data send), leaving just some Serial.println() debug messages for now.

I'm trying to talk to one (of a pair) of Growatt SPF3000TL inverters. But when I flash the code (stripped down as above) I get "Guru Meditation Error: Core 0 panic'ed (Interrupt wdt timeout on CPU0)." (error repeated verbatim for future searches). I noticed you mentioned a WDT timeout in one of the videos, which got my attention, so I'm hoping you may have some more insights.

My Timer ISR (same as your master code) calls ReadHoldingRegisters() in the ino, which invokes growattInterface.ReadHoldingRegisters(json) in growattInterface, of which the relevant entry point is:

uint8_t growattIF::ReadHoldingRegisters(char* json) {
  uint8_t result;
  ITimer0.stopTimer();
  result = growattInterface.readHoldingRegisters(setcounter * 64, 64); //<< WDT panic is occurring here
  // result = 0;
  ITimer0.restartTimer();

If I comment out the readHoldingRegisters line and uncomment the result = 0; line, it runs fine. Of course, there could be several reasons something is timing out there: a communications problem with the TTL to RS485 board (mine looks identical to the one you used), failed communications with the Growatt, and so on.

I would really like to drill down into the growattInterface.readHoldingRegisters(setcounter * 64, 64) call to figure out in more detail exactly what is timing out, so I can narrow it down, but I'm lost in the overloading of ReadHoldingRegisters (capitalized or not), both in the ino and in the growattInterface class. So a specific question: where do I find the code that growattInterface.readHoldingRegisters(setcounter * 64, 64) invokes? grep etc. is not helping.

I really like the clean ISR approach to invoking the code, but I'm stuck trying to use it at the moment. Any insights really appreciated!

nygma2004 commented 1 year ago

I had the same issue when I was developing the code for the ESP8266. readHoldingRegisters takes a long time, especially over serial communication and when reading 64 registers. This is why I had to disable the watchdog timer specifically in my code: reading the registers took longer than the watchdog timeout. My experience with ESP32 is limited, and I am not sure whether ITimer0.stopTimer() is sufficient to disable the watchdog timer. I would recommend looking for other examples of disabling the WDT before the Modbus operation and re-enabling it afterwards.

cwynd commented 1 year ago

Thanks, it's a good point. The ITimer0.stopTimer() in my code only stops that specific ISR from being triggered a second time if the read takes too long. But rereading your code, I see that you turned off the interrupt WDT itself. I may have to try that on the ESP32C3, but it may be problematic with other timing-critical things going on.

Where's the source for the growattInterface.readHoldingRegisters(setcounter * 64, 64) method? I must be missing something, I can't find it.

On a very rough guesstimate, the Growatt Modbus dialog must take something like 500 ms (64 x 16-bit registers @ 9600, ignoring overhead), so we're looking at a relatively long time.
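The wire time for such a read can be estimated with a little arithmetic. The sketch below is only a back-of-the-envelope check: it assumes standard Modbus RTU framing for function 0x03 (an 8-byte request, and a response of 5 bytes plus 2 bytes per register), and the function names are made up for illustration. It covers the bytes on the wire only, not the inverter's turnaround delay or any library timeout.

```cpp
#include <cassert>
#include <cstdint>

// Total bytes exchanged for one Modbus RTU "read holding registers" (0x03)
// transaction. Request: address, function, start (2), count (2), CRC (2).
// Response: address, function, byte count, 2 bytes per register, CRC (2).
inline std::uint32_t modbusReadExchangeBytes(std::uint32_t registerCount) {
    const std::uint32_t requestBytes = 8;
    const std::uint32_t responseBytes = 3 + 2 * registerCount + 2;
    return requestBytes + responseBytes;
}

// Time on the wire in milliseconds. bitsPerChar is 10 for 8N1
// (start + 8 data + stop) and 11 for 8E1 (with a parity bit).
inline double wireTimeMs(std::uint32_t bytes, std::uint32_t baud,
                         std::uint32_t bitsPerChar) {
    assert(baud > 0);
    return 1000.0 * bytes * bitsPerChar / baud;
}
```

For 64 registers at 9600 baud with 8N1 this gives 141 bytes and roughly 147 ms of pure transfer time. The real duration of the call is longer because the inverter needs time to respond, so a figure of a few hundred milliseconds is a reasonable working number when comparing against a watchdog timeout.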

Thanks again

Edit: I just found readHoldingRegisters(...) in ModbusMaster.

cwynd commented 1 year ago

I've been able to get past the WDT timeout issue by refactoring the code as a couple of Tasks on top of the Espressif FreeRTOS core, and I'm using a queue to pass the json from a Growatt Modbus query task to an MQTT post task.

This seems fine so far, and solves the WDT timeout issue without having to disable the WDT. These tasks run at a lower priority than core processes (WDT etc.), and so can take as long as they need. This does mean that the Modbus code may get interrupted by the core intermittently, so a concern is that this may trip up the serial timing, though I have not seen evidence of that yet, and since it's only 9600 there's hope that it won't. I believe all Espressif SoCs (ESP...) use their version of FreeRTOS under the hood, so this approach might be applicable to the ESP8266 as well.

How did you figure out the SLAVE_ID of the Growatt? I am now getting Invalid SlaveID back from the Modbus code, and with weak documentation* I'm struggling to figure out the valid slave ID, short of taking the time to brute force it.
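The hand-off between a Modbus query task and an MQTT post task can be modelled in portable C++ as a small bounded queue. This is a desktop stand-in rather than the ESP32 code from this thread: on the ESP32 the same role would be played by a FreeRTOS queue (xQueueCreate, xQueueSend, xQueueReceive) shared by two tasks. The class and method names below are hypothetical, and std::mutex stands in for the locking that FreeRTOS does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Fixed-capacity queue for passing JSON strings from a producer (the Modbus
// query side) to a consumer (the MQTT post side). Both calls are
// non-blocking, similar to a FreeRTOS queue used with a zero wait time.
class BoundedMessageQueue {
public:
    explicit BoundedMessageQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Producer side. Returns false if the queue is full, in which case the
    // caller decides whether to drop the reading or retry later.
    bool trySend(std::string message) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (messages_.size() >= capacity_) {
            return false;
        }
        messages_.push_back(std::move(message));
        return true;
    }

    // Consumer side. Returns false if nothing is waiting; otherwise moves
    // the oldest message into out.
    bool tryReceive(std::string& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (messages_.empty()) {
            return false;
        }
        out = std::move(messages_.front());
        messages_.pop_front();
        return true;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::deque<std::string> messages_;
};
```

The point of the structure is that the slow Modbus read and the network publish never call each other directly. Each side runs in its own task and only touches the queue, so a long read cannot stall the publisher and neither has to run inside an ISR.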

*Edit: Clarification: 'weak documentation' on Growatt SPF3000 slave ID getting / setting.

nygma2004 commented 1 year ago

The default slave ID is 1, but you can change it on the screen. I checked the documentation; it does not mention "modbus" or "slave id" anywhere, but this is the closest setting I could find: [image]

cwynd commented 1 year ago

Thanks so much. I added temporary Serial.println() calls in ModbusMaster for the slave ID that I query and the slave ID sent back from the Growatt, and I am getting a changing ID back: on one occasion I sent 1 and actually got slave ID 1 back, but on every other occasion I've gotten 2, 3, etc.

So I'm now thinking that the serial comms is not working the way I think it's working for whatever reason (timing inside the FreeRTOS Task?). I've ordered an RS485 to USB dongle, and I'll try talking to the Growatt using mbpoll from my laptop. This should remove at least some of the uncertainty around new code, comms, etc. leaving only the Growatt talking to known code.

nygma2004 commented 1 year ago

That is very interesting. There is a CRC check in the Modbus communication, so if the serial data is off, the CRC check would fail and the code would give an invalid CRC error. But checking with mbpoll and a USB dongle is a good way to find out what is wrong.
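For reference, the CRC in question is the standard Modbus RTU CRC-16 (initial value 0xFFFF, reflected polynomial 0xA001, sent low byte first). The sketch below is a plain bitwise version for checking captured frames by hand; it is not the code ModbusMaster uses internally, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Modbus RTU CRC-16 over a byte buffer, computed bit by bit.
std::uint16_t modbusCrc16(const std::uint8_t* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::uint16_t crc = 0xFFFF;
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x0001) {
                crc = static_cast<std::uint16_t>((crc >> 1) ^ 0xA001);
            } else {
                crc = static_cast<std::uint16_t>(crc >> 1);
            }
        }
    }
    return crc;
}

// Checks a complete frame whose last two bytes are the CRC, low byte first.
bool modbusFrameCrcOk(const std::uint8_t* frame, std::size_t length) {
    if (length < 4) {
        return false;  // shortest frame: address, function, CRC low, CRC high
    }
    const std::uint16_t computed = modbusCrc16(frame, length - 2);
    const std::uint16_t received = static_cast<std::uint16_t>(
        frame[length - 2] | (frame[length - 1] << 8));
    return computed == received;
}
```

As an example, the request 01 03 00 00 00 01 (slave 1, read one holding register at address 0) has CRC 0x0A84 and goes on the wire as 01 03 00 00 00 01 84 0A. A frame received with the wrong parity or stop-bit settings will usually fail this check.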

cwynd commented 1 year ago

It's a good point about the CRC check. I've now had a chance to try the USB dongle, and interestingly I am getting almost identical behavior. So this says to me that at least the immediate problem is not in the code, it is somewhere else (although there are few other things to suspect!).

I do wonder whether 9600 is the correct speed - I know in the master code you commented "do not change", but I'm running out of things to suspect now.

Rather than load up this thread with posts not directly related to the subject code, I have posted a question over here. I won't close this thread yet, pending any learnings from the post linked above, which may help others with implementing the code.

nygma2004 commented 1 year ago

Thanks, weird. Please let me know if you find the solution. I see no reason why Growatt would do it any other way, but is it possible that the communication is not Modbus? When I see RS485 I always assume Modbus, but it does not necessarily have to be (sorry, I did not read the entire documentation). Also try different serial settings, such as a data length other than 8 bits, or a different stop bit or parity setting. And yes, you can play around with the baud rate as well.

cwynd commented 1 year ago

So I figured out the communications.

It seems this Growatt SPF3000TL inverter has to be cold-rebooted when the RS485 Modbus slave ID is updated, otherwise it looks like it listens for the new address, but responds with the old address in the header, resulting in the Modbus master seeing an invalid slave ID. My inverters are in a production setting already, and so I had avoided rebooting them until now.
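The symptom described here can be illustrated with a small header check of the kind a Modbus master performs on each reply. This is a sketch under assumptions, not ModbusMaster's actual source: the enum and function names are hypothetical, and it only looks at the first two bytes of the response (slave address and function code).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

enum class HeaderCheck {
    Ok,
    TooShort,
    InvalidSlaveId,     // reply carries a different address than requested
    InvalidFunction,    // reply carries a different function code
    ExceptionResponse   // function code has the high bit set (Modbus exception)
};

// Compares the address and function code of a reply with what was requested.
HeaderCheck checkResponseHeader(std::uint8_t requestedSlaveId,
                                std::uint8_t requestedFunction,
                                const std::uint8_t* response,
                                std::size_t length) {
    assert(response != nullptr || length == 0);
    if (length < 2) {
        return HeaderCheck::TooShort;
    }
    if (response[0] != requestedSlaveId) {
        return HeaderCheck::InvalidSlaveId;
    }
    const std::uint8_t function = response[1];
    if ((function & 0x7F) != requestedFunction) {
        return HeaderCheck::InvalidFunction;
    }
    if (function & 0x80) {
        return HeaderCheck::ExceptionResponse;
    }
    return HeaderCheck::Ok;
}
```

With an inverter that has been set to ID 2 but not yet cold-rebooted, a request addressed to 2 would get a reply whose first byte is still the old address, and a check like this returns InvalidSlaveId even though the rest of the frame may be perfectly valid.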

Also, it was a good suggestion to look at the serial settings (stop bit, parity, etc.). I had assumed (my bad) that 8N1 was the default, but it is not the default for mbpoll, and it looks like the CRC errors were caused by a serial protocol mismatch.

I've now got far enough that I have read registers back from my Tasks version of the project code and Serial.println-ed the json. So I am moving on to passing the json to the MQTT posting Task, and tidying everything up now.

Thanks again for helpful suggestions!

nygma2004 commented 1 year ago

Wonderful, I am glad that all this worked out. Never thought about the cold-boot issue, good catch.

cwynd commented 1 year ago

Thanks. I was considering whether to generate a PR for this, but the problem is the ESP32 code ends up being pretty different (HardwareSerial, task-based code structure), and when I tried, I ended up adding many compiler switches, so I don't think it's helpful. But having said that, suggestions welcome.