theelims / ESP32-sveltekit

A simple and extensible framework for ESP32-based IoT projects with a feature-rich, beautiful, and responsive front end built with SvelteKit, Tailwind CSS and DaisyUI. This is a project template to get you started in no time with a fully integrated build chain.
https://theelims.github.io/ESP32-sveltekit/

Crash on MQTTClient error, after new lease #27

Closed: SolarDaniel closed this issue 2 months ago

SolarDaniel commented 3 months ago

Hi, this is an awesome piece of code, thank you. The documentation is good, too! I'm not a professional, so it took me some time to learn how it works, but now it's great.

I detected some minor issues:

And a big issue:

I only have the serial monitor, so I can't debug.

This is not an issue, more of a request: when using NTP, there is no event that tells me the time is valid. I'm using the standard Arduino loop to do some work, and there I check:

struct tm timeinfo;
// getLocalTime() waits up to 5000 ms by default for a valid time
if (!getLocalTime(&timeinfo))
{
    Serial.println("Failed to obtain time");
}

It would be nice to have a callback indicating that the NTP time is valid.
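A minimal sketch of such a callback, assuming the ESP-IDF SNTP API that arduino-esp32 builds on: sntp_set_time_sync_notification_cb() from esp_sntp.h registers a function that is called after each successful time sync. The callback runs in the network stack's context rather than in loop(), so it only sets an atomic flag that loop() can poll. The names onTimeSynced, registerTimeSyncCallback and ntpTimeValid are hypothetical, and the non-ESP branch exists only so the logic can be compiled and exercised off-target.

```cpp
#include <atomic>
#include <cassert>
#include <sys/time.h>  // struct timeval

#ifdef ESP_PLATFORM
#include "esp_sntp.h"  // sntp_set_time_sync_notification_cb()
#endif

// Hypothetical flag: becomes true once SNTP has delivered a valid time.
static std::atomic<bool> g_timeSynced{false};

// Called by the SNTP client after a successful sync. It does not run in
// loop(), so keep it short: just record that the time is now valid.
void onTimeSynced(struct timeval *tv)
{
    (void)tv;  // the new time has already been applied to the system clock
    g_timeSynced.store(true);
}

// Register once, e.g. in setup(), before NTP is started.
void registerTimeSyncCallback()
{
#ifdef ESP_PLATFORM
    sntp_set_time_sync_notification_cb(onTimeSynced);
#endif
}

// Poll this from loop() instead of waiting in getLocalTime() on every pass.
bool ntpTimeValid()
{
    return g_timeSynced.load();
}
```

In loop(), a check such as if (ntpTimeValid()) { ... } can then guard any code that needs the wall-clock time, with getLocalTime() called only once the flag is set.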

Thanks Daniel

theelims commented 3 months ago

Hi, glad you can find some use with this project. I'm not a professional software dev either.

@SolarDaniel Regarding the ESP crashing, I also need some more information. Can you please post the crash report from the serial console? Which ESP flavor are you using? And can you describe how to reproduce the crash?

theelims commented 3 months ago

@SolarDaniel The MQTT issues might be related to https://github.com/theelims/ESP32-sveltekit/issues/25 and https://github.com/theelims/PsychicMqttClient/issues/1. Could you please test whether a fresh copy of the MQTT library fixes your issues (PlatformIO: Clean)?

SolarDaniel commented 3 months ago

Hi, thanks for the quick reply.

The missing c_str() calls are in the following code:

void NotificationEvents::begin()
{
    _eventSource.onOpen([&](PsychicEventSourceClient *client) { // client->send("hello", NULL, millis(), 1000);
#ifdef SERIAL_INFO
        Serial.printf("New client connected to Event Source: #%u connected from %s\n", client->socket(), client->remoteIP().toString().c_str());
#endif
    });
    _eventSource.onClose([&](PsychicEventSourceClient *client) { // client->send("hello", NULL, millis(), 1000);
#ifdef SERIAL_INFO
        Serial.printf("Client closed connection to Event Source: #%u connected from %s\n", client->socket(), client->remoteIP().toString().c_str());
#endif
    });
    _server->on(EVENT_NOTIFICATION_SERVICE_PATH, &_eventSource);

    ESP_LOGV("NotificationEvents", "Registered Event Source endpoint: %s", EVENT_NOTIFICATION_SERVICE_PATH);
}
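Why the c_str() calls matter: printf-style functions such as Serial.printf() take a C variadic argument list, and %s expects a const char*. IPAddress::toString() returns an Arduino String object, and passing that object where %s expects a pointer is undefined behavior, typically garbage output or a crash. The host sketch below uses std::string as a stand-in for String; ipToString and formatClientLine are hypothetical helpers that mirror the Serial.printf() call in onOpen().

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Stand-in for IPAddress::toString(), which returns a String object.
std::string ipToString(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return std::to_string(a) + "." + std::to_string(b) + "." +
           std::to_string(c) + "." + std::to_string(d);
}

// Mirrors the Serial.printf() call: %s needs a const char*, not a string object.
int formatClientLine(char *buf, std::size_t len, int socket,
                     unsigned a, unsigned b, unsigned c, unsigned d)
{
    // ipToString() returns a temporary that lives until the end of this full
    // expression, so the pointer from c_str() stays valid during snprintf().
    return std::snprintf(buf, len, "Client #%d connected from %s\n",
                         socket, ipToString(a, b, c, d).c_str());
}
```

The same lifetime rule is what makes client->remoteIP().toString().c_str() safe inside a single Serial.printf() call; storing that pointer for later use would not be.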

And here are the missing #ifdef SERIAL_INFO guards:

void NTPSettingsService::onStationModeGotIP(WiFiEvent_t event, WiFiEventInfo_t info)
{
#ifdef SERIAL_INFO
    Serial.println(F("Got IP address, starting NTP Synchronization"));
#endif
    configureNTP();
}

void NTPSettingsService::onStationModeDisconnected(WiFiEvent_t event, WiFiEventInfo_t info)
{
#ifdef SERIAL_INFO
    Serial.println(F("WiFi connection dropped, stopping NTP."));
#endif
    configureNTP();
}

void NTPSettingsService::configureNTP()
{
    if (WiFi.isConnected() && _state.enabled)
    {
#ifdef SERIAL_INFO
        Serial.println(F("Starting NTP..."));
#endif
        configTzTime(_state.tzFormat.c_str(), _state.server.c_str());
    }
    else
    {
        setenv("TZ", _state.tzFormat.c_str(), 1);
        tzset();
        sntp_stop();
    }
}
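One way to avoid repeating #ifdef SERIAL_INFO around every print is a small wrapper macro that compiles away when the flag is off. INFO_PRINTF is a hypothetical name, not something ESP32-sveltekit provides; on the ESP32 its body would call Serial.printf(), while std::printf() stands in here so the sketch builds on a desktop. countedArg is a hypothetical helper used only to show what happens to the arguments.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical wrapper, used as a statement: INFO_PRINTF("...", args);
// With SERIAL_INFO defined it forwards to printf (Serial.printf on the ESP32).
// Without it, the whole call, arguments included, disappears at compile time.
#ifdef SERIAL_INFO
#define INFO_PRINTF(...) std::printf(__VA_ARGS__)
#else
#define INFO_PRINTF(...) ((void)0)
#endif

static int g_calls = 0;

// Hypothetical argument with a side effect, to show it is not evaluated
// when SERIAL_INFO is undefined.
int countedArg()
{
    return ++g_calls;
}
```

The design consequence to keep in mind: with SERIAL_INFO undefined, the arguments are never evaluated, so they must not contain side effects the program relies on.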

Concerning the crashes, I can't trigger a new lease on purpose. I noticed the reboots by the USB connect sound (I'm on Windows) and by the uptime in the system status. The IP address had also changed, which is why I assume the issue is there. Once I saw a Guru Meditation error pointing to the MQTT client, but unfortunately I didn't copy the output. It does happen frequently when the router hands out a new lease.

I'm using an ESP32 WROVER kit1, with VS Code and the latest PlatformIO on Windows 10.

theelims commented 3 months ago

@SolarDaniel All IP addresses should be properly formatted now, and the missing SERIAL_INFO guards are included as well.

Concerning the crash, I need to be able to reproduce it in order to fix it. Please try the latest version of the MQTT library to see whether the issue persists, and try your best to capture a copy of the Guru Meditation error.

SolarDaniel commented 3 months ago

Thank you for your support, I appreciate it.

I thought about the lease problem again, and maybe I'm wrong. When the ESP reboots because of a crash and reconnects to the WiFi, it could also get a new IP, so the lease may not be the cause of the crash.

What happens internally when the router changes the lease? Do I have to actively reconnect to the MQTT server and send the discovery message to Home Assistant again? Do the websocket connections stay alive?
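The thread does not settle what the framework itself does when the DHCP lease changes. If an application wants to react on its own, for example by republishing its Home Assistant discovery message, one option is to remember the last station IP and compare it whenever a got-IP event arrives (a handler like NTPSettingsService::onStationModeGotIP() above is such an event). IpChangeTracker is a hypothetical, host-testable sketch of just that comparison, with the address reduced to a uint32_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: remembers the last station IP and reports whether a
// got-IP event carries a different address (e.g. after a new DHCP lease).
class IpChangeTracker
{
public:
    // Returns true for the first address seen and whenever the address changes.
    bool onGotIp(std::uint32_t ip)
    {
        bool changed = !_haveIp || ip != _lastIp;
        _lastIp = ip;
        _haveIp = true;
        return changed;
    }

private:
    std::uint32_t _lastIp = 0;
    bool _haveIp = false;
};
```

Since WiFi event handlers do not run in loop(), a reasonable pattern is to only set a flag there when onGotIp() returns true and do the actual republishing from loop().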

When the crash occurs, the USB CDC device disappears and I can't see any output. I'm not sure, but at the beginning of my project I used an external USB-serial adapter, and maybe I saw the Guru Meditation error there.

In the PlatformIO serial monitor I only see output printed with Serial.print..., none of the WiFi logging. I tried to enable ESP logging by adding -D CORE_DEBUG_LEVEL=5 to the PlatformIO build flags and did a clean rebuild, but still got no log output. So it's quite difficult for me to get more information about the issue. I also watched the system status in the GUI: the heap is stable for hours at 112 kB, and the maximum allocatable block is 56 kB.

I'm considering another board on which I can debug and see all the debug output. Any suggestions? How do I enable the debug output?

My project is basically your demo project, but instead of the light state service it has a service that reads data from a UART and sends it to Home Assistant. I use the standard Arduino loop to read the UART, and from the loop I call update(), which in turn calls the homeAssistRead function. I think that's basically correct.

I will try to strip out all unnecessary code and do only the basics described above: call update() every 5 seconds and return a simple JSON object in the homeAssistRead function.

By the way, I had a look at:

PsychicMqttClient::PsychicMqttClient() : _mqtt_cfg()
{
  memset(&_mqtt_cfg, 0, sizeof(_mqtt_cfg));
}

PsychicMqttClient::~PsychicMqttClient()
{
  disconnect();
  esp_mqtt_client_destroy(_client);
  free(&_mqtt_cfg);
}

_mqtt_cfg is a member of PsychicMqttClient and is initialized (zeroed) in the constructor. It is not allocated with malloc() or new, so it is not a separate block on the dynamic heap, and as a consequence it should not be free'd in the destructor. Fortunately it never gets freed, because the destructor is never called in your framework, but it could be in another project. Anyway, it confused me. Or am I wrong?
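The reasoning above holds under the C and C++ rules: free() may only be given a null pointer or a pointer obtained from malloc() and friends, and &_mqtt_cfg points at a sub-object inside the client, so freeing it is undefined behavior. A sketch of the fix, using hypothetical stand-in types (FakeMqttConfig, FakeMqttHandle, fakeDestroy) instead of the ESP-IDF ones so it builds anywhere: the member is value-initialized in the initializer list, which already zeroes a plain C struct and makes the memset() redundant, and the destructor releases only what the class actually allocated.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-ins for esp_mqtt_client_config_t and the client handle.
struct FakeMqttConfig
{
    const char *uri;
    int port;
};

struct FakeMqttHandle
{
    int dummy;
};

static int g_destroyCalls = 0;

// Stand-in for esp_mqtt_client_destroy(): frees what connect() allocated.
void fakeDestroy(FakeMqttHandle *h)
{
    ++g_destroyCalls;
    std::free(h);
}

class MqttClientSketch
{
public:
    // _cfg() value-initializes the C struct, so every field starts at zero.
    MqttClientSketch() : _cfg() {}

    // Owns a raw handle, so copying would lead to a double destroy.
    MqttClientSketch(const MqttClientSketch &) = delete;
    MqttClientSketch &operator=(const MqttClientSketch &) = delete;

    ~MqttClientSketch()
    {
        // (The real client would call disconnect() first.)
        // _cfg is part of this object's own storage: never pass it to free().
        if (_client != nullptr)
        {
            fakeDestroy(_client);
            _client = nullptr;
        }
    }

    void connect()
    {
        if (_client == nullptr)
            _client = static_cast<FakeMqttHandle *>(std::calloc(1, sizeof(FakeMqttHandle)));
    }

    const FakeMqttConfig &config() const { return _cfg; }

private:
    FakeMqttConfig _cfg;
    FakeMqttHandle *_client = nullptr;
};
```

The null check also covers a client that was constructed but never connected, where there is no handle to destroy.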

theelims commented 3 months ago

If you use a regular ESP32 with an external USB-UART bridge, the serial port should not disappear during a crash.

One idea: you could use Serial2 for your meter readout and keep Serial free for debugging.
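Moving the meter readout to Serial2 could follow a non-blocking pattern: each loop() pass drains only the bytes that are already available and returns, instead of waiting for a complete message, so loop() never blocks for long. The sketch assumes the meter sends newline-terminated text lines, which is a guess. LineAssembler is a hypothetical helper; on the ESP32 it would be fed with Serial2.available() and Serial2.read() in loop(), after Serial2.begin() in setup() with the meter's baud rate and pins. Here it is fed from a string so it can be tested off-target.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: assembles incoming bytes into lines without blocking.
class LineAssembler
{
public:
    explicit LineAssembler(std::size_t maxLen = 128) : _maxLen(maxLen) {}

    // Feed one byte. Returns true when a complete line is available via line().
    bool push(char c)
    {
        if (c == '\r')
            return false;  // ignore the CR of CRLF line endings
        if (c == '\n')
        {
            _ready = _buf;
            _buf.clear();
            return true;
        }
        if (_buf.size() < _maxLen)
            _buf += c;  // drop overflow instead of growing without bound
        return false;
    }

    const std::string &line() const { return _ready; }

private:
    std::size_t _maxLen;
    std::string _buf;
    std::string _ready;
};
```

Capping the line length keeps a noisy or misconfigured UART from slowly eating the heap, which matters on a device that is expected to run for days.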

SolarDaniel commented 2 months ago

I ran a long-term test, and at midnight there was a reboot after 22 hours of running with a stable heap and nothing abnormal. The reset reason shown in System Status was "RTC Watch dog reset digital core and rtc module". There was no output on the monitor. The GUI reconnected normally after reloading the browser, but there was no more output like "New client connected to Event Source ...", so the USB connection seems to have been lost. During the test the web interface was not open. The WiFi network was stable (1 m distance) and Home Assistant was running normally. I used a board with an external UART bridge. MQTT messages were sent out normally after start-up (checked with MQTT Explorer), and after the reboot too.

The basic architecture is like the demo project, but using the Arduino loop(). main.cpp:

void setup()
{
    // start serial and filesystem
    Serial.begin(SERIAL_BAUD_RATE);

    // start ESP32-SvelteKit
    esp32sveltekit.begin();

    // start the meter service
    meterService.begin();
    // load the initial meter settings
    meterMqttSettingsService.begin();
    meterSettingsService.begin();
}

void loop()
{
    // vTaskDelete(NULL);   Delete Arduino loop task, if not needed
    meterService.loop();
    delay(50);
}

The MQTT interface, nothing fancy or dangerous:

// MQTT interface
// this is called from the stateful service after triggering with update() in loop()
void MeterState::homeAssistRead(MeterState &settings, JsonObject &root)
{
    root["state"] = settings.enabled ? ON_STATE : OFF_STATE;
}

And in the loop, meterService.loop(): nothing fancy, it just triggers the MQTT output every 10 seconds.

void MeterService::loop()
{
    unsigned long currentMillis = millis();
    if (!_lastMillis || (unsigned long)(currentMillis - _lastMillis) >= 10000)
    {
        _lastMillis = currentMillis;
        update([&](MeterState &state)
        {
            state.enabled = false;             // modify the state handed in by update()
            return StateUpdateResult::CHANGED; // notify StatefulService by returning CHANGED
        }, "meter");
    }
}
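The interval check in MeterService::loop() is rollover-safe because the subtraction is done in unsigned arithmetic: when millis() wraps from 0xFFFFFFFF back to 0 after about 49.7 days, currentMillis - _lastMillis still yields the true elapsed time modulo 2^32. A host-side check of that arithmetic, with uint32_t standing in for the ESP32's 32-bit unsigned long and a hypothetical intervalElapsed() helper:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the check in MeterService::loop(): true once
// at least `interval` ms have passed since `last`, even across a millis() wrap.
bool intervalElapsed(std::uint32_t now, std::uint32_t last, std::uint32_t interval)
{
    // Unsigned subtraction wraps modulo 2^32, so it gives the elapsed time
    // even when `now` has wrapped past zero and is numerically below `last`.
    // The cast guards against promotion to a wider signed int.
    return static_cast<std::uint32_t>(now - last) >= interval;
}
```

Comparing the difference against the interval, rather than comparing now against last + interval, is what keeps this correct: the latter sum itself overflows near the wrap.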
theelims commented 2 months ago

@SolarDaniel In my experience, the watchdog is mostly triggered when a task consumes all processor time, so that the idle task can't run.

How do you read the smart meter? Do you use a library for that? The code you posted is unlikely to trigger the watchdog. If you could capture the serial output, you would get some clues about where to look.

SolarDaniel commented 2 months ago

Thank you for the reply. In the final code I will read the UART to get some values, but for this test I reduced the code to the bare minimum shown above: only the setup code, the loop, and the homeAssistRead function.

So far the ESP32 has been running for 19 hours. We'll see.

I now have several other boards to test: two ESP32-C3 and two ESP32-C6. Interestingly, the C3 has less RAM, but System Status shows more available heap. Its CPU runs at only 160 MHz on a single core, yet response times are as fast as on the ESP32. The core temperature is only 30°C compared to 51°C.

I just started the test on a C3 and will report the results. I still have to learn how to get (more) debug output. The C3 has a built-in USB JTAG interface.

I need to buy a USB hub; I don't have enough ports for more tests.

SolarDaniel commented 2 months ago

All clear: with the ESP32-C3 and ESP32-C6, the tests ran successfully for 24 hours. I assume it's a hardware issue on the WROVER board, maybe a bad voltage regulator, too-small decoupling capacitors, or a bad ground layout, whatever. That board is definitely going in the recycling bin. I'm happy with the C3 for now.

One more question. In the Arduino core's main.cpp:

void loopTask(void *pvParameters)
{
    setup();
    for(;;) {
#if CONFIG_FREERTOS_UNICORE
        yieldIfNecessary();
#endif
        if(loopTaskWDTEnabled){
            esp_task_wdt_reset();
        }
        loop();
        if (serialEventRun) serialEventRun();
    }
}

extern "C" void app_main()
{
#if ARDUINO_USB_CDC_ON_BOOT && !ARDUINO_USB_MODE
    Serial.begin();
#endif
#if ARDUINO_USB_MSC_ON_BOOT && !ARDUINO_USB_MODE
    MSC_Update.begin();
#endif
#if ARDUINO_USB_DFU_ON_BOOT && !ARDUINO_USB_MODE
    USB.enableDFU();
#endif
#if ARDUINO_USB_ON_BOOT && !ARDUINO_USB_MODE
    USB.begin();
#endif
    loopTaskWDTEnabled = false;
    initArduino();
    xTaskCreateUniversal(loopTask, "loopTask", getArduinoLoopTaskStackSize(), NULL, 1, &loopTaskHandle, ARDUINO_RUNNING_CORE);
}

the watchdog is not reset because it is not enabled (loopTaskWDTEnabled = false). Is the watchdog timer used anywhere in your framework? I didn't find any calls.