stuartpittaway / diyBMSv4ESP32

diyBMS v4 code for the ESP32 and new controller hardware

Upload Config but Failed to save settings #290

Open Michi202020 opened 4 months ago

Michi202020 commented 4 months ago

Hello Stuart,

I have the problem that I cannot restore my backup config. I have installed your last 3 firmware releases (from 27.12.2023), but with none of them can I restore my config after uploading it. I tried my old config, and I also tried a config that I saved just before installing the new firmware. I tested with Chrome, Edge and other browsers, but for every uploaded file they all report "Failed to save settings". The restore only works if I save the config and restore it straight away. What is happening?

Best Regards Michael .G

stuartpittaway commented 4 months ago

Can you upload the configuration file to this issue please and I can test on my system.

Michi202020 commented 4 months ago

Hello Stuart, here is my file. I have a system with 3 controllers, and all of them show the same problem. Today, however, I did an over-the-air update to https://github.com/stuartpittaway/diyBMSv4ESP32/files/14179664/diybms_controller_firmware_experimental_bug276.zip, and with this version I can upload all config files. I also have the WiFi disconnect problem from #276, which is why I installed your fix version today. But this version (6 February 2024) also gives me problems with 3 battery systems at the same location: if I disconnect the WiFi, 2 of the 3 batteries restart, and afterwards the SoC reads 10%. This happens every 2-4 hours. What can we do?

Attachments: __cfg_20240525_094723.json, wlan probleme reboot diybms 27.12.23.txt

Michi202020 commented 4 months ago

Here is the battery system: IMG_20240526_090842

Michi202020 commented 4 months ago

Hello Stuart, today I tested your stateofhealth branch: I compiled it and flashed it. You can also see my WLAN test in the log. For testing I used my phone and switched its WiFi on and off. Switching on and off is not the biggest problem, though: if I walk far away with my phone and come back, I see the crash sooner. See log output_2024-05-26_18-01-32.log

jetronic18s commented 4 months ago

The software from the “stateofhealth” branch then had no problem switching the WLAN on and off? If it's just the range problem, would a small, cheap repeater be a workaround for you?

Stuart has also created a pull request from “stateofhealth”, there may be a new release soon. I would also test again, when installing the new software I would use the “erase-all” option for the ESP32.

Michi202020 commented 4 months ago

Hello, yes, in this version too the CPU restarts when it reconnects to or disconnects from WiFi. Not in every situation and not every time: it can run for 15 hours without a problem, or for only 3 minutes. The WiFi signal is roughly -70 dBm. With version 27.11.23, all 3 controllers restarted at the same time when, for example, I disconnected my WiFi repeater. At home I test with a spare controller, using my phone as the WiFi AP, to check which version works well for me. I don't need WiFi. For me it is enough that the controller provides an access point for configuration and afterwards pushes all information over the BMS to the inverter. I only need a stable version without WiFi connection problems. I connected all 3 batteries that you see in my picture to my phone hotspot and then switched the hotspot off. Since then it has worked well for 1 day, but that is not a good final solution.

stuartpittaway commented 4 months ago

Hi @Michi202020

How are you powering the ESP32 controller boards? I've seen the ESP reboot if the WIFI signal is poor as it pulls a large current from the power supply to try and connect to the access point.

The "stateofhealth" version also saves the state of charge into flash memory so after a reboot it will recover the last known value - that should fix your problem of losing SoC after a reboot.

Does the configuration restore correctly on the "stateofhealth" version?

Michi202020 commented 4 months ago

Hello Stuart,

I have a step-down supply rated at max 3 A and nominal 1.5 A. On my meter, the voltage deviates by roughly 15 mV from 5.1 V while WiFi is connecting. I set all my supplies to 5.1 V, not 5 V. I also tested with my lab supply from 5.1 V down to 4.3 V and watched what the firmware does, but it makes no difference to the controller rebooting. I need roughly 10 connect/disconnect cycles from my hotspot to trigger the error. This error is from the 5.1 V system. See below. output_2024-05-28_17-46-43.log
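One rough way to separate a supply dip from a firmware crash is to check what the serial log prints just before each reboot. The sketch below is only an illustration: it assumes the ESP-IDF brownout detector is enabled and prints its usual "Brownout detector was triggered" message, and the names `reboot_cause_t` and `classify_reboot` are hypothetical helpers, not part of diyBMS.

```c
#include <string.h>

/* Hypothetical classification of the last lines before a reboot. */
typedef enum {
    REBOOT_UNKNOWN,   /* no known marker found in the log tail */
    REBOOT_PANIC,     /* firmware exception (Guru Meditation Error) */
    REBOOT_BROWNOUT   /* supply voltage dipped below the detector threshold */
} reboot_cause_t;

/* Looks for well-known marker strings in the text logged before a reset.
 * Assumption: the brownout detector is enabled and logs its default message. */
reboot_cause_t classify_reboot(const char *log_tail)
{
    if (strstr(log_tail, "Guru Meditation Error") != NULL) {
        return REBOOT_PANIC;
    }
    if (strstr(log_tail, "Brownout detector was triggered") != NULL) {
        return REBOOT_BROWNOUT;
    }
    return REBOOT_UNKNOWN;
}
```

Both logs posted in this issue end in `Guru Meditation Error: Core 0 panic'ed (LoadProhibited)`, so a check like this would classify them as a panic, which fits the observation that changing the supply voltage made no difference.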

Michi202020 commented 4 months ago

The first one!

`I (325707) diybms-web: httpd_stop
E (325708) diybms: Failed to queue CANBUS message (0x107)
I (325712) diybms: CAN STATUS: rx-q:0, tx-q:6, rx-err:0, tx-err:0, arb-lost:0, bus-err:0, state: RUNNING
D (325733) diybms: Task 3, s=0 e=11
I (325808) diybms: stop mdns
I (325820) diybms: WIFI connect quick retry 1
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x401b837a PS : 0x00060d30 A0 : 0x801b845f A1 : 0x3ffe0f10
A2 : 0x3ffb6274 A3 : 0xffffffff A4 : 0x00000000 A5 : 0xffffffff
A6 : 0x00000000 A7 : 0x3fff0c48 A8 : 0x3ffe2ab8 A9 : 0x3ffe0e80
A10 : 0x0000000a A11 : 0x00000001 A12 : 0x3ffef9a4 A13 : 0x3ffef9a4
A14 : 0x3ffeb8f0 A15 : 0x00000000 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x0000000a LBEG : 0x4008c221 LEND : 0x4008c231 LCOUNT : 0xfffffffe

Backtrace: 0x401b8377:0x3ffe0f10 0x401b845c:0x3ffe0f60

ELF file SHA256: 2de7bfb4bfa9abf8

Rebooting...
ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)`

and today!!!!!!

I (100365) diybms: CAN STATUS: rx-q:0, tx-q:6, rx-err:0, tx-err:0, arb-lost:0, bus-err:0, state: RUNNING
E (100618) diybms: Failed to queue CANBUS message (0x107)
I (100618) diybms: CAN STATUS: rx-q:0, tx-q:6, rx-err:0, tx-err:0, arb-lost:0, bus-err:0, state: RUNNING
I (100874) diybms: WIFI_EVENT_STA_DISCONNECTED
I (100874) diybms: ShutdownAllNetworkServices
I (100874) diybms-web: httpd_stop
E (100971) diybms: Failed to queue CANBUS message (0x107)
I (100971) diybms: CAN STATUS: rx-q:0, tx-q:6, rx-err:0, tx-err:0, arb-lost:0, bus-err:0, state: RUNNING
I (100975) diybms: stop mdns
I (100989) diybms: WIFI connect quick retry 1
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x401b837a PS : 0x00060030 A0 : 0x801b845f A1 : 0x3ffe0f10
A2 : 0x3ffb6274 A3 : 0xffffffff A4 : 0x00000000 A5 : 0xffffffff
A6 : 0x00000000 A7 : 0x3ffec1a8 A8 : 0x3ffe2ab8 A9 : 0x3ffe0e80
A10 : 0x00000000 A11 : 0x00000001 A12 : 0x3ffeb978 A13 : 0x3ffeb978
A14 : 0x3ffec178 A15 : 0x3ffec1b2 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x4008c221 LEND : 0x4008c231 LCOUNT : 0xfffffffe

Backtrace: 0x401b8377:0x3ffe0f10 0x401b845c:0x3ffe0f60

ELF file SHA256: 2de7bfb4bfa9abf8

Rebooting...
ets Jun 8 2016 00:22:57

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DOUT, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13232
load:0x40080400,len:3028
entry 0x400805e4
[ 58][D][esp32-hal-cpu.c:244] setCpuFrequencyMhz(): PLL: 480 / 2 = 240 Mhz, APB: 80000000 Hz
I (77) diybms:


stuartpittaway commented 4 months ago

Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Yep, that's the error most people report. There is another issue for it here: https://github.com/stuartpittaway/diyBMSv4ESP32/issues/276

I don't have a solution for that problem at the moment.

Michi202020 commented 4 months ago

OK, I found this on the internet about this problem. Does it help you?

https://www.esp32.com/viewtopic.php?t=19135

Michi202020 commented 4 months ago

Hello Stuart,

I asked ChatGPT about this error. Below is its answer, followed by the position in the ELF file for my case.

The error "Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled" on an ESP32 or similar microcontroller often indicates an invalid memory access, such as reading from or writing to an unassigned memory address.

Here are some steps you can take to diagnose and resolve the issue:

  1. Analyze the Backtrace

The backtrace provides clues about where the error occurred. In your case, the relevant line is:

Backtrace: 0x401b8377:0x3ffe0f10 0x401b845c:0x3ffe0f60

The addresses 0x401b8377 and 0x401b845c correspond to locations in the program code. To translate these addresses into source code lines, you need to use the debug symbols generated during compilation.

  2. Decode the Backtrace

You can use the tool xtensa-esp32-elf-addr2line to translate the addresses into source code lines. This tool is part of the ESP32 toolchain.

Take the address 0x401b8377 and run the following command:

xtensa-esp32-elf-addr2line -pfia -e your_program.elf 0x401b8377

Do the same for the address 0x401b845c.

  3. Debugging in the Code

The translated addresses will give you hints about which lines of code caused the error. Check these lines for potential issues such as:

  - Accessing uninitialized pointers or memory regions.
  - Array accesses beyond their bounds.
  - Using objects after they have been freed (e.g., after a free or delete call).

  4. Stack Trace and Register Information

Some information from the register dump can also be useful:

  - EXCVADDR: 0x00000000 indicates that the exception was caused by accessing address 0x00000000 (NULL pointer).
  - EXCCAUSE: 0x0000001c stands for LoadProhibited, indicating a prohibited load access.

  5. Common Causes and Solutions

Some common causes for such errors are:

  - Null Pointer Dereference: Ensure all pointers are initialized and not NULL before use.
  - Stack Overflow: Increase the stack size of your task if you are using an RTOS (e.g., FreeRTOS).
  - Invalid Memory Access: Ensure all array accesses are within bounds.

  6. Example Code to Check

Here's a small example to illustrate:

#include <stddef.h>

void faulty_function() {
    volatile int *ptr = NULL;
    int value = *ptr; // Reading through a NULL pointer causes a LoadProhibited exception
    (void)value;
}

void app_main() {
    faulty_function();
}

In this example, the error is caused by dereferencing a NULL pointer. Make sure such errors are not present in your code.

Conclusion By decoding the backtrace and closely inspecting the relevant lines of code, you can identify and fix the cause of the Guru Meditation Error. Ensure that all memory accesses are valid and all pointers are properly initialized. Debugging tools and systematic troubleshooting will help you efficiently resolve the error.
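The register fields mentioned in the answer above can be read off mechanically. The following is a minimal sketch, not part of ESP-IDF or diyBMS: `exccause_name` and `looks_like_null_deref` are hypothetical helper names, the table covers only a subset of Xtensa exception causes, and the 0x1000 threshold is an assumed heuristic for "NULL plus a small struct offset".

```c
#include <stdbool.h>
#include <stdint.h>

/* Maps a subset of Xtensa EXCCAUSE codes to the names shown in ESP32
 * panic output. Codes not listed here are reported as "Unknown". */
const char *exccause_name(uint32_t exccause)
{
    switch (exccause) {
    case 0:  return "IllegalInstruction";
    case 6:  return "IntegerDivideByZero";
    case 9:  return "LoadStoreAlignment";
    case 20: return "InstFetchProhibited";
    case 28: return "LoadProhibited";   /* 0x1c, as in the dumps in this issue */
    case 29: return "StoreProhibited";
    default: return "Unknown";
    }
}

/* Heuristic (assumption): a faulting address very close to zero usually
 * means a NULL pointer, or a member access through a NULL struct pointer. */
bool looks_like_null_deref(uint32_t excvaddr)
{
    return excvaddr < 0x1000u;
}
```

For the two dumps posted in this issue, EXCCAUSE 0x1c decodes to LoadProhibited, and both EXCVADDR values (0x0000000a and 0x00000000) fall in the near-NULL range.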

C:\Users\Admin\Downloads\diyBMSv4ESP32-stateofhealth\diyBMSv4ESP32-stateofhealth\ESPController.pio\build\esp32-devkitc>xtensa-esp32-elf-addr2line -pfia -e diybms_controller_firmware_espressif32_esp32-devkitc.elf 0x401b8377
0x401b8377: handler_execute at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_event/esp_event.c:139
 (inlined by) esp_event_loop_run at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_event/esp_event.c:590
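Each entry in a `Backtrace:` line is a `PC:SP` pair, and only the PC half is passed to addr2line. As a rough sketch of how those addresses could be pulled out of a captured log line on a PC, the hypothetical helper `backtrace_pcs` below (not an ESP-IDF function) collects them into an array:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Extracts the PC half of every "PC:SP" pair in an ESP32 backtrace line.
 * Returns the number of addresses written to pcs (at most max_pcs). */
size_t backtrace_pcs(const char *line, uint32_t *pcs, size_t max_pcs)
{
    size_t count = 0;
    const char *p = line;

    while (count < max_pcs && (p = strstr(p, "0x")) != NULL) {
        char *end;
        unsigned long pc = strtoul(p, &end, 16);

        if (end == p || *end != ':') {
            /* Not the start of a PC:SP pair, keep scanning after "0x". */
            p += 2;
            continue;
        }
        pcs[count++] = (uint32_t)pc;

        /* Skip over the stack pointer that follows the colon. */
        (void)strtoul(end + 1, &end, 16);
        p = end;
    }
    return count;
}
```

For the line `Backtrace: 0x401b8377:0x3ffe0f10 0x401b845c:0x3ffe0f60` this yields the two addresses 0x401b8377 and 0x401b845c, which are the values to decode with xtensa-esp32-elf-addr2line.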

Is this interesting for you?

Michi202020 commented 4 months ago

Sorry, I am not a programmer, but I think this is interesting!

Based on the detailed analysis, the LoadProhibited error occurs within the handler_execute function, which is called by esp_event_loop_run. Let's focus on ensuring the safety and validity of the event handler operations within these functions. Here are the enhanced steps to troubleshoot and resolve this issue:

Steps to Resolve the Issue

Check handler_execute Implementation: Ensure the handler_execute function is safe and properly checks for invalid pointers.

Add Robust Null Checks: Make sure all pointers are checked before dereferencing.

Increase Logging: Add detailed logging to capture the state of the execution and identify where the error occurs.

Example: Adding Null Checks and Logging

handler_execute Function

First, let's ensure handler_execute has adequate null checks:

static void handler_execute(esp_event_loop_instance_t* loop, esp_event_handler_node_t* handler, esp_event_post_instance_t post)
{
    if (handler == NULL || loop == NULL) {
        ESP_LOGE(TAG, "handler_execute called with NULL handler or loop");
        return;
    }

    // Assuming handler->handler is the actual function to call
    if (handler->handler != NULL) {
        handler->handler(handler->arg, post.base, post.id, post.data);
    } else {
        ESP_LOGE(TAG, "handler function pointer is NULL");
    }

#ifdef CONFIG_ESP_EVENT_LOOP_PROFILING
    int64_t diff = esp_timer_get_time() - post.timestamp;

    if (xSemaphoreTake(loop->profiling_mutex, portMAX_DELAY) == pdTRUE) {
        ESP_LOGI(TAG, "Profiling handler execution time.");

        esp_event_loop_node_t* loop_node;
        esp_event_handler_node_t* handler_node;
        SLIST_FOREACH(loop_node, &(loop->loop_nodes), next) {
            SLIST_FOREACH(handler_node, &(loop_node->handlers), next) {
                if (handler_node == handler) {
                    handler->invoked++;
                    handler->time += diff;
                }
            }
        }

        xSemaphoreGive(loop->profiling_mutex);
    } else {
        ESP_LOGE(TAG, "Failed to obtain profiling mutex.");
    }
#endif
}

esp_event_loop_run Function

Ensure the calling code in esp_event_loop_run handles pointers correctly and logs relevant information:

while (xQueueReceive(loop->queue, &post, ticks_to_run) == pdTRUE) {
ESP_LOGI(TAG, "Event dequeued: base=%s, id=%"PRIu32, post.base, post.id);

// The event has already been unqueued, so ensure it gets executed.
if (xSemaphoreTakeRecursive(loop->mutex, portMAX_DELAY) == pdTRUE) {
    loop->running_task = xTaskGetCurrentTaskHandle();

    bool exec = false;

    esp_event_handler_node_t *handler, *temp_handler;
    esp_event_loop_node_t *loop_node, *temp_node;
    esp_event_base_node_t *base_node, *temp_base;
    esp_event_id_node_t *id_node, *temp_id_node;

    SLIST_FOREACH_SAFE(loop_node, &(loop->loop_nodes), next, temp_node) {
        // Execute loop level handlers
        SLIST_FOREACH_SAFE(handler, &(loop_node->handlers), next, temp_handler) {
            if (handler != NULL) {
                ESP_LOGI(TAG, "Executing loop level handler at %p", (void*)handler);
                handler_execute(loop, handler, post);
                exec |= true;
            } else {
                ESP_LOGW(TAG, "Null handler encountered at loop level");
            }
        }

        SLIST_FOREACH_SAFE(base_node, &(loop_node->base_nodes), next, temp_base) {
            if (base_node->base == post.base) {
                // Execute base level handlers
                SLIST_FOREACH_SAFE(handler, &(base_node->handlers), next, temp_handler) {
                    if (handler != NULL) {
                        ESP_LOGI(TAG, "Executing base level handler at %p", (void*)handler);
                        handler_execute(loop, handler, post);
                        exec |= true;
                    } else {
                        ESP_LOGW(TAG, "Null handler encountered at base level");
                    }
                }

                SLIST_FOREACH_SAFE(id_node, &(base_node->id_nodes), next, temp_id_node) {
                    if (id_node->id == post.id) {
                        // Execute id level handlers
                        SLIST_FOREACH_SAFE(handler, &(id_node->handlers), next, temp_handler) {
                            if (handler != NULL) {
                                ESP_LOGI(TAG, "Executing id level handler at %p", (void*)handler);
                                handler_execute(loop, handler, post);
                                exec |= true;
                            } else {
                                ESP_LOGW(TAG, "Null handler encountered at id level");
                            }
                        }
                        // Skip to next base node
                        break;
                    }
                }
            }
        }
    }

    esp_event_base_t base = post.base;
    int32_t id = post.id;

    post_instance_delete(&post);

    if (ticks_to_run != portMAX_DELAY) {
        end = xTaskGetTickCount();
        remaining_ticks -= end - marker;
        // If the ticks to run expired, return to the caller
        if (remaining_ticks <= 0) {
            xSemaphoreGiveRecursive(loop->mutex);
            break;
        } else {
            marker = end;
        }
    }

    loop->running_task = NULL;

    xSemaphoreGiveRecursive(loop->mutex);

    if (!exec) {
        ESP_LOGD(TAG, "No handlers registered for event %s:%"PRIu32" posted to loop %p", base, id, loop);
    }
} else {
    ESP_LOGE(TAG, "Failed to take recursive mutex");
}

}

return ESP_OK;

Conclusion

By adding robust null checks and extensive logging, you can identify where the invalid memory access is occurring. This should help in resolving the LoadProhibited exception. If the issue persists, the detailed logs will provide more information on the state of the system when the error occurs, which will be crucial for further debugging.