Python thread + gc.collect() = CPU halt

Isopodus commented 5 years ago

Hi. I don't know if this project is still supported, but I have run into trouble when using multithreading with gc.collect(). This issue is much the same as #241, but I hope to get your attention. How to reproduce:

import gc
import time
import _thread
from machine import Pin

def my_thread(args):
    try:
        _thread.allowsuspend(True)
        led = Pin(2, 2)
        while True:
            # Handle notifications sent to this thread
            ntf = _thread.getnotification()
            if ntf:
                if ntf == _thread.EXIT:
                    return
                elif ntf == _thread.SUSPEND:
                    while _thread.wait() != _thread.RESUME:
                        pass
            # Doing some stuff here, e.g. blinking an LED; what exactly does not matter in most cases
            led.value(1)
            time.sleep(0.5)
            led.value(0)
            time.sleep(0.5)
    except Exception as e:
        print(e)
        return

_thread.start_new_thread('my_awesome_thread', my_thread, ('some args',))
gc.collect()

Expected behaviour

The thread keeps running while gc.collect() executes, without any errors.

Real behaviour

Core panic: a backtrace is printed and the CPU is halted.

Backtrace

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x400f7de7  PS      : 0x00060130  A0      : 0x800ef2eb  A1      : 0x3ffe5150  
A2      : 0x00000000  A3      : 0x3ffc93a0  A4      : 0x3ffd61c4  A5      : 0x00000000  
A6      : 0x3ffd6190  A7      : 0x000000b1  A8      : 0x800f7d54  A9      : 0x3ffe5130  
A10     : 0x00000000  A11     : 0x3f41175c  A12     : 0x3f413548  A13     : 0x3f413b48  
A14     : 0x3f413b48  A15     : 0x3ffe51e0  SAR     : 0x0000001a  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000008  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff  

Backtrace: 0x400f7de7:0x3ffe5150 0x400ef2e8:0x3ffe51f0 0x400eaea9:0x3ffe5220 0x400f6400:0x3ffe5240 0x400d8815:0x3ffe52e0

CPU halted.

Stranger things

1. While writing this issue I connected my ESP32 to copy the latest backtrace. When I tried to reproduce the problem, the CPU halt did not happen. Then I re-saved the same code file, tried again, and it happened.

2. After I suspended the test thread, the issue was no longer reproducible. Once I tried to resume it, I got a similar backtrace:


Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x400f7de7  PS      : 0x00060330  A0      : 0x800ef2eb  A1      : 0x3ffe5150  
A2      : 0x00000000  A3      : 0x3ffc9560  A4      : 0x3ffd6ae4  A5      : 0x00000000  
A6      : 0x3ffd6ab0  A7      : 0x000000b1  A8      : 0x800f7d54  A9      : 0x3ffe5130  
A10     : 0x00000000  A11     : 0x3f41175c  A12     : 0x3f413548  A13     : 0x3f413b48  
A14     : 0x3f413b48  A15     : 0x3ffe51e0  SAR     : 0x0000001a  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000008  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0xffffffff

Backtrace: 0x400f7de7:0x3ffe5150 0x400ef2e8:0x3ffe51f0 0x400eaea9:0x3ffe5220 0x400f6400:0x3ffe5240 0x400d8815:0x3ffe52e0

CPU halted.


3. I got CORRUPT HEAP when listing the running threads (this happened once or twice):

CORRUPT HEAP: multi_heap.c:428 detected at 0x3ffe3268
abort() was called at PC 0x40090407 on core 0

Backtrace: 0x40090c4b:0x3ffdd720 0x40090da3:0x3ffdd740 0x40090407:0x3ffdd760 0x40090762:0x3ffdd780 0x40082718:0x3ffdd7b0 0x4008275c:0x3ffdd7e0 0x40082d81:0x3ffdd800 0x4000beaf:0x3ffdd820 0x40085a29:0x3ffdd840 0x40154877:0x3ffdd860 0x4015ec05:0x3ffdd8a0 0x400890b5:0x3ffdd920

CPU halted.



**Notes**
IDE used: Thonny

**ELF file attached**

[MicroPython.zip](https://github.com/loboris/MicroPython_ESP32_psRAM_LoBo/files/3666550/MicroPython.zip)

carterw commented 5 years ago

Project no longer supported, apparently. I experimented with the thread code recently and it is working for me, a loop not too different from yours. What I saw initially was a stack overflow. When I boosted the stack size ( _thread.stack_size(10*1024) ) things worked fine.

A couple of things you might try:

- The default stack size is very small; increase it.
- Put the "led = Pin(2, 2)" statement before the loop so it doesn't run over and over.
- Put the gc.collect() inside the loop in the thread instead of outside the thread.
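
As a rough illustration of the last two suggestions, here is a minimal sketch of a thread body that takes an already-created pin and calls gc.collect() from inside its own loop. The names `FakePin` and `blink_loop` are hypothetical and used only for illustration; `FakePin` is a stand-in for `machine.Pin` so the loop can be dry-run off the board, and the `cycles` argument exists only so that the dry run terminates.

```python
import gc
import time


class FakePin:
    """Hypothetical stand-in for machine.Pin so the sketch runs off-board."""

    def __init__(self):
        self.history = []

    def value(self, level):
        # Record the level instead of driving real hardware.
        self.history.append(level)


def blink_loop(led, cycles=None, delay=0.5):
    """Blink `led`; cycles=None loops forever, as a thread body would."""
    done = 0
    while cycles is None or done < cycles:
        led.value(1)
        time.sleep(delay)
        led.value(0)
        time.sleep(delay)
        gc.collect()  # collect from inside the thread's own loop
        done += 1
    return done


# Off-board dry run with the stand-in pin and no delay.
pin = FakePin()
completed = blink_loop(pin, cycles=2, delay=0)
```

On the board, the idea would be to call `_thread.stack_size(10*1024)` first, create the pin once with `Pin(2, 2)`, and then pass `blink_loop` and the pin to `_thread.start_new_thread` as in the original report. That part is untested here, and whether it avoids the panic is not confirmed.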

chmondkind commented 5 years ago

Great advice Bill!

Increasing the stack size is what solved my issues with using threads (one of the main reasons I started using this fork of uP in the first place).


Isopodus commented 5 years ago

Great, thanks for the advice. I will try it later today.

klauweg commented 5 years ago

I think this error is related to the issue:

The following simple code causes a halted cpu too:

from microWebSrv import MicroWebSrv
import _thread
import gc

mws = MicroWebSrv() # TCP port 80 and files in /flash/www
mws.Start()         # Starts server in a new thread

gc.collect()

results in:

Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.
Core 1 register dump:
PC      : 0x400f075a  PS      : 0x00060030  A0      : 0x800eeaa8  A1      : 0x3ffe6200
A2      : 0x00000000  A3      : 0x00000001  A4      : 0x3ffba724  A5      : 0x00000000
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x00000000  A9      : 0x3ffe61e0
A10     : 0x3ffe6240  A11     : 0x00000019  A12     : 0x3ffb6184  A13     : 0x00000000
A14     : 0x00000000  A15     : 0x3ffc51c0  SAR     : 0x00000000  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x00000000  LEND    : 0x00000000  LCOUNT  : 0x00000000

Backtrace: 0x400f075a:0x3ffe6200 0x400eeaa5:0x3ffe6220 0x400faa75:0x3ffe6240 0x400d9b11:0x3ffe62e0

CPU halted.

carterw commented 5 years ago

Could well be. I am invoking the MicroWebCli in a thread and that required a larger stack.

Isopodus commented 5 years ago

Unfortunately, increasing stack size did not help. Any other suggestions?

Isopodus commented 5 years ago


It seems MicroWebSrv uses a thread internally, so gc.collect() causes a CPU halt there too. I wonder why the FTP and Telnet servers of this firmware do not conflict with gc.collect(). When I list all running threads, FTP and Telnet show up as SERVICE, MainThread is shown as MAIN, and any Python thread that I start is shown as PYTHON. Maybe we need to find out how to start a new thread as SERVICE? Perhaps this won't work, though, because the Telnet and FTP modules are written in C.

klauweg commented 5 years ago

At least the MicroWebServer is useless at the moment. As soon as a garbage collection happens, the CPU is halted. So far I have been unable to reproduce the error with threads other than the MicroWebServer's.

romnan commented 4 years ago

Has there been any progress with the issue? Currently this is preventing me from using the loboris port.

Isopodus commented 4 years ago

Sadly, I have made no progress with it. Please reply if you get any good results.

curlyz commented 4 years ago

Just go with asyncio.

ijustwant commented 4 years ago

Maybe the watchdog needs to be fed in the loop...

romnan commented 4 years ago

asyncio was super slow when I tested it (communication via sockets), but that may have been due to my lack of experience with it. Using the pycom firmware, threads are more than 10 times faster.

curlyz commented 4 years ago

Asyncio is cooperative multitasking, so you need to tune every task so that they all work together nicely. By the way, may I ask where you got the pycom firmware?
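
To illustrate what "cooperative" means here, below is a minimal sketch using the standard asyncio API. The task names `blink` and `housekeeping` and the timing values are hypothetical, and the tasks only append to a log instead of toggling a pin. On MicroPython the module is usually called `uasyncio`, and whether it is available on this firmware is an assumption. Each task runs until it reaches an `await`, so gc.collect() only ever runs between the other tasks' steps rather than in parallel with them.

```python
import asyncio
import gc


async def blink(name, period, count, log):
    """Hypothetical task: records a tick instead of toggling a pin."""
    for i in range(count):
        log.append((name, i))
        await asyncio.sleep(period)  # yield so the other tasks can run


async def housekeeping(rounds, period):
    """Collect garbage between the other tasks' steps."""
    for _ in range(rounds):
        gc.collect()
        await asyncio.sleep(period)


async def main():
    log = []
    # Run all three tasks on one event loop until they finish.
    await asyncio.gather(
        blink("a", 0.01, 3, log),
        blink("b", 0.01, 3, log),
        housekeeping(3, 0.01),
    )
    return log


log = asyncio.run(main())
```

The tuning mentioned above comes down to how long each task runs between `await` points: a task that blocks for a long time (for example on a socket call) stalls every other task.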

romnan commented 4 years ago

Here is the link to the latest Firmware: https://software.pycom.io/findupgrade?product=strict=true&pycom-firmware-updater&type=stable&platform=win32&redirect=true

loboris / MicroPython_ESP32_psRAM_LoBo

Python thread + gc.collect() = CPU halt #308