luc-github / ESP3DLib

ESP3D library for Marlin and ESP32 boards
GNU General Public License v3.0
98 stars 32 forks source link

[BUG]WifiTask running on core 0 may result in halts during prints #10

Closed vivian-ng closed 4 years ago

vivian-ng commented 4 years ago

This was an issue raised when this was part of a custom Marlin fork.

The wifi task running on the same core as the rest of Marlin may cause random halts during prints when wifi connection is poor. The solution back then was to run wifi task on another core.

I believe esp3dlib.cpp can be amended to run wifi task on core 1 instead of 0.

void Esp3DLib::init()
{
    xTaskCreatePinnedToCore(
        WiFiTaskfn, /* Task function. */
        "WiFi Task", /* name of task. */
        10000, /* Stack size of task */
        NULL, /* parameter of the task */
        1, /* priority of the task */
        NULL, /* Task handle to keep track of created task */
        0    /* Core to run the task */
    );
}
luc-github commented 4 years ago

Marlin run on Core 1 like all arduino Wifi run on on core 0 that is why ESP3DLib run on core 0 to not bother with Marlin

vivian-ng commented 4 years ago

@luc-github Thanks for clarifying. I guess the random halts are caused by something else then. Will close this.

luc-github commented 4 years ago

To clarify: https://github.com/espressif/arduino-esp32/blob/c2b3f2d6afc1911db1974a324782730dd642d524/cores/esp32/main.cpp#L28 and https://github.com/espressif/arduino-esp32/blob/1977370e6fc069e93ffd8818798fbfda27ae7d99/tools/sdk/include/config/sdkconfig.h#L124 #define CONFIG_ARDUINO_RUNNING_CORE 1

luc-github commented 4 years ago

@vivian-ng they are not same random pause that @felixstorm fixed ? they happened wifi off

these new ones happen only when wifi on ?

vivian-ng commented 4 years ago

@luc-github Could this be the problem?

  xTaskCreate(stepperTask, "StepperTask", 10000, nullptr, 1, nullptr);

in i2s_init() of i2s.cpp.

Since the I2S stepper task is not pinned to any core, will it end up running on core 0 and become interfered by wifi?

luc-github commented 4 years ago

you may add xPortGetCoreID(); to know the core used by _https://github.com/MarlinFirmware/Marlin/blob/bugfix-2.0.x/Marlin/src/HAL/HAL_ESP32/i2s.cpp#L142_ so you can see when you have random pause which core is used

static inline uint32_t IRAM_ATTR xPortGetCoreID()

but may need to store in variable as I am not sure the output is already available when init task

felixstorm commented 4 years ago

Generally I would also run all non-Marlin tasks on core 0 (WiFi, ESP3DLib etc.) and run everything Marlin undisturbed on core 1, so also pinning the stepper task to CONFIG_ARDUINO_RUNNING_CORE sounds like a good idea.

@vivian-ng Do you still have any random pauses now with the latest Marlin versions?

vivian-ng commented 4 years ago

@felixstorm I am using quite a new version of Marlin bugfix-2.0.x (fetched about two days back). Usually, no issues with printing, but yesterday, I was having a lot of connections problems on 2.4G wifi (either my router is giving me problems on 2.4G, or my area has a lot of 2.4G interference) and my ESP32 devices (printer and even ESP32-CAMs). I finally managed to get a connection on my Ender-3 running on my MRR ESPE board (with I2S), but it was quite a bad connection (loading webUI took longer time than usual, etc.) When printing, I noticed a lot of pauses. I ended up halting the prints.

@luc-github I think the I2S stepper stream is probably on core 0, since Marlin is already running on core 1, and the i2s_init() is called in HAL_init() which is before HAL_init_board() (where ESP3DLib is called). Maybe can change the line in i2s.cpp to:

xTaskCreatePinnedToCore(stepperTask, "StepperTask", 10000, nullptr, 1, nullptr, CONFIG_ARDUINO_RUNNING_CORE);

The thing is, it is difficult for me to recreate the conditions to test if any solution actually solves this. The issue only occurs when I have problems on my 2.4G wifi, and fortunately, that is not often. Whatever the case, I think I will just push this fix as a PR to Marlin next week (after I test it out a bit to make sure it does not create more problems than solve them).

Maybe this fix will even help resolve the issue with babystepping on I2S. (wishful thinking on my part)

vivian-ng commented 4 years ago

@luc-github Should init() be amended to purposely run WiFiTaskfn on the non-Marlin core?

Something like:

void Esp3DLib::init()
{
    xTaskCreatePinnedToCore(
        WiFiTaskfn, /* Task function. */
        "WiFi Task", /* name of task. */
        10000, /* Stack size of task */
        NULL, /* parameter of the task */
        1, /* priority of the task */
        NULL, /* Task handle to keep track of created task */
        1 - CONFIG_ARDUINO_RUNNING_CORE      /* Core to run the task */
    );
}

which will run on core 0 if CONFIG_ARDUINO_RUNNING_CORE is 1 and core 1 if CONFIG_ARDUINO_RUNNING_CORE is 0.

Anyway, I have submitted PR#16874 to Marlin to explicitly run I2S stepper task on CONFIG_ARDUINO_RUNNING_CORE.

luc-github commented 4 years ago

arduino core is build like this, changing core in sdkconfig means new arduino core and this may break backward compatibility so I really doubt it will happen ever on arduino core, it only may be done using IDF and arduino as component, core for ESP3DLib could be an overidable define yes, but I really doubt of need, anyway it is just 3 lines of code to let full flexibility same as core or not

I will add this possibility later

luc-github commented 4 years ago

@vivian-ng I have added the defines for ESP3DLIb task priority and targeted core https://github.com/luc-github/ESP3DLib/commit/80609a74e901c2e21d10f4ab0b7ce9bad1e17f36

felixstorm commented 4 years ago

@luc-github I have compiled the latest version with your PR already included against a slightly modified version of the Arduino-ESP32 libraries that lets me dump FreeRTOS task statistics.
This is the result after some idle time:

*** FreeRTOS Task Statistics ***
Name             State  Prio  HighWaM  TaskNum  Core    RunT Abs  RunT Perc
esp_timer            B    22     1360        1     0    11549691       <1 %
ipc0                 B    24      556        2     0       54313       <1 %
ipc1                 B    24      492        3     1      201868       <1 %
IDLE0                R     0      384        6     0  2679724309       39 %
IDLE1                R     0      592        7     1     5437467       <1 %
Tmr Svc              B     1     1624        8     0          34       <1 %
loopTask             R     1     6328       12     1  3244688930       47 %
StepperTask          B     1     9468       14     1   136967380        2 %
WiFi Task            B     1     7852       15     0   615283464        9 %
network_event        B    19     2368       16     1        4973       <1 %
eventTask            B    20     1644       17     0        2647       <1 %
tiT                  B    18     1676       18     0    16269359       <1 %
wifi                 B    23     1648       19     0    63836854       <1 %
mdns                 B     1     2052       20     0      544793       <1 %

To me, this looks ok now. ipc1 and IDLE1 should be fine on core 1, then there's loopTask and StepperTask which are both Marlin and then network_event is also on core 1, but that should hopefully not do too much harm.

Should we still encounter pauses due to WiFi issues in the future, I see 2 more viable options:

All options might have side effects, so before implementing anything "just in case" we should IMHO wait if there are really still issues and if so, setup a test environment to duplicate the issues and then try out fixes...

vivian-ng commented 4 years ago

@vivian-ng I have added the defines for ESP3DLIb task priority and targeted core

Thank you! It was just a suggestion because I am not a fan of magic numbers in code.

As for network_event, it seems to be very low load now, and I think it is really only called when something happens (like wifi connection is dropped, etc.). Anyway, I will wait and see if I get random pauses again (need to wait for a bad wifi day).

luc-github commented 4 years ago

the event task will call this function when event is catched : https://github.com/luc-github/ESP3DLib/blob/3cf617bc7c7fa950dcf23195857d4f84bcc0257f/src/wificonfig.cpp#L178-L191

So probability it generate pause is very very low I think