micropython / micropython

MicroPython - a lean and efficient Python implementation for microcontrollers and constrained systems
https://micropython.org

ESP32 without psram. Socket and Memory Split or ... . #14421

Open straga opened 6 months ago

straga commented 6 months ago

Port, board and/or hardware

ESP32 without PSRAM / ESP32-C3

MicroPython version

Micropython 23.0-preview.346.g64f28dc1e on 2024-05-03

Reproduction

>>> gc.collect()
>>> micropython.mem_info()
stack: 704 out of 15360
GC: total: 112000, used: 71008, free: 40992, max new split: 21504
 No. of 1-blocks: 979, 2-blocks: 263, max blk sz: 142, max free sz: 133
>>> s1=socket.socket()
>>>
>>> gc.collect()
>>> micropython.mem_info()
stack: 704 out of 15360
GC: total: 112000, used: 71040, free: 40960, max new split: 20480
 No. of 1-blocks: 977, 2-blocks: 265, max blk sz: 142, max free sz: 133
>>> s2=socket.socket()
>>>
>>> gc.collect()
>>> micropython.mem_info()
stack: 704 out of 15360
GC: total: 112000, used: 71072, free: 40928, max new split: 20480
 No. of 1-blocks: 977, 2-blocks: 266, max blk sz: 142, max free sz: 133
>>> s3=socket.socket()
>>>
>>> gc.collect()
>>> micropython.mem_info()
stack: 704 out of 15360
GC: total: 131968, used: 75584, free: 56384, max new split: 248
 No. of 1-blocks: 976, 2-blocks: 268, max blk sz: 282, max free sz: 966
>>> s4=socket.socket()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 105] ENOBUFS

Expected behaviour

No response

Observed behaviour

When I create an additional socket for testing purposes (s4 in the log above), I get OSError: [Errno 105] ENOBUFS. After this error, Wi-Fi stops working.

Additional Information

Any guidance on how to resolve this issue would be greatly appreciated.

straga commented 6 months ago

It appears that the issue arises when an asynchronous function is invoked synchronously. It is particularly noticeable when there is a significant delay (for example, a time.sleep call) in the code, or when a piece of synchronous code takes a long time to complete. It seems that asyncio has an impact in these scenarios.

straga commented 6 months ago

Also

Traceback (most recent call last):
  File "asyncio/core.py", line 1, in run_forever
  File "asyncio/core.py", line 1, in run_until_complete
  File "asyncio/core.py", line 1, in wait_io_event
OSError: [Errno 5] EIO

straga commented 5 months ago

I tried without a thread and got the same result. Both gc and micropython report that there is enough memory. [screenshot]

straga commented 5 months ago

When the call is executed within a thread, an error like the one above occurs. When the call is executed outside of a thread, that error does not occur, but Wi-Fi is compromised: it shows a connected status, yet no data is sent or received, and ping also fails.

projectgus commented 5 months ago

MicroPython is consuming all of the available memory in the ESP32 for its heap, and ESP-IDF is running out of memory for allocating new sockets. After memory has run out, it's likely the Wi-Fi will also stop working as it regularly allocates and frees buffers.

You can see this happening at the moment the "GC: total" value goes up in mem_info output, and "max new split" drops to a very low number. "max new split" is the largest free memory block that ESP-IDF can use to allocate buffers for sockets and Wi-Fi.

You can also call esp32.idf_heap_info() to confirm this.
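A minimal sketch of how that check could look. esp32.idf_heap_info() returns a list of 4-tuples (total bytes, free bytes, largest free block, minimum free seen over time), one per heap region. The helper name summarize_idf_heap and the sample values used off-device are illustrative assumptions, not taken from this thread.

```python
def summarize_idf_heap(regions):
    """Return (total free bytes, largest free block) across ESP-IDF heap regions.

    Each region is a 4-tuple: (total, free, largest free block, minimum free seen).
    """
    total_free = 0
    largest_block = 0
    for _total, free, largest, _min_free in regions:
        total_free += free
        if largest > largest_block:
            largest_block = largest
    return total_free, largest_block


try:
    import esp32  # only available on the ESP32 port of MicroPython
    regions = esp32.idf_heap_info(esp32.HEAP_DATA)
except ImportError:
    # Made-up sample values so the sketch also runs off-device.
    regions = [(7288, 0, 0, 0), (79912, 35712, 35512, 35108)]

free_bytes, largest_block = summarize_idf_heap(regions)
print("ESP-IDF free:", free_bytes, "largest block:", largest_block)
```

If the largest free block is only a few hundred bytes, ESP-IDF is unlikely to be able to allocate buffers for another socket.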

MicroPython only grows its heap when it needs to avoid a MemoryError. It looks like your application's total memory usage is still pretty low, but maybe at some point your code allocates a large single buffer and fragmentation means it has to grow the heap for this. If you can find the places in your MicroPython code that do these allocations and either remove them, or allocate the large buffer early in your program and then reuse it, then probably you can prevent MicroPython from growing the heap and then the other errors will go away.

If you're not sure what is causing MicroPython to grow the heap, add some more micropython.mem_info() calls in your code and look for whatever makes the "GC: total" number go up.
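One way to look through captured output, sketched under the assumption that the mem_info() text has been saved as lines (for example from a serial log). The helper names parse_gc_line and find_heap_growth are hypothetical; the sample lines are copied from the reproduction log above.

```python
def parse_gc_line(line):
    """Parse a 'GC: total: ..., used: ..., free: ..., max new split: ...' line."""
    line = line.strip()
    if not line.startswith("GC:"):
        return None
    fields = {}
    for part in line[3:].split(","):
        name, _, value = part.rpartition(":")
        fields[name.strip()] = int(value)
    return fields


def find_heap_growth(lines):
    """Return (previous_total, new_total) for each point where 'GC: total' rises."""
    growth = []
    previous = None
    for line in lines:
        fields = parse_gc_line(line)
        if fields is None:
            continue
        total = fields["total"]
        if previous is not None and total > previous:
            growth.append((previous, total))
        previous = total
    return growth


log = [
    "GC: total: 112000, used: 71008, free: 40992, max new split: 21504",
    "GC: total: 112000, used: 71040, free: 40960, max new split: 20480",
    "GC: total: 112000, used: 71072, free: 40928, max new split: 20480",
    "GC: total: 131968, used: 75584, free: 56384, max new split: 248",
]
print(find_heap_growth(log))
```

Whatever code ran between the two mem_info() calls on either side of a reported jump is the place to look for a large allocation.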

straga commented 5 months ago

It turns out that ESP-IDF does not have enough memory for Wi-Fi operation. Wi-Fi stops working, but sta.isconnected() == True. Pings from the PC to the ESP32 time out. Everything else behaves as if nothing is wrong: the asyncio stream keeps writing because no errors are raised. Only when you try to create new sockets does asyncio crash with Error 5 EIO.

After pressing Ctrl+C to stop asyncio:

micropython.mem_info()
stack: 704 out of 15360
GC: total: 143936, used: 106624, free: 37312, max new split: 448
 No. of 1-blocks: 1550, 2-blocks: 407, max blk sz: 282, max free sz: 1111

gc.collect()

micropython.mem_info()
stack: 704 out of 15360
GC: total: 143936, used: 106752, free: 37184, max new split: 448
 No. of 1-blocks: 1556, 2-blocks: 408, max blk sz: 282, max free sz: 1111

This looks the same as https://github.com/micropython/micropython/issues/12819.

I tried building with CONFIG_LWIP_TCP_MSL=6000.

Result: [screenshot]

Built with these LWIP options:

CONFIG_LWIP_TCP_MSL=6000
CONFIG_LWIP_SO_LINGER=y

and with this awrite wrapper:

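import gc
import asyncio
import micropython

async def awrite(writer, data, b=False):
    # Collect garbage, then print heap statistics to the REPL.
    # micropython.mem_info() prints its report and returns None,
    # so its result is not passed to the logger.
    gc.collect()
    micropython.mem_info()

    try:
        if isinstance(data, str):
            data = data.encode('utf-8')
        # Give up if the write does not complete within 1 second.
        await asyncio.wait_for(writer.awrite(data), timeout=1)
    except Exception as e:
        # `log` is the application's own logger, defined elsewhere.
        log.debug("Error: write: {}".format(e))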

Stop frame: result -> ping timeout and ... .

projectgus commented 5 months ago

@straga It might be related to the linked issue, but the root cause of ESP-IDF running out of memory is that MicroPython has already moved all the memory into the "Python heap".

There are two separate heaps, Python heap and ESP-IDF heap. Even if you have free memory in the Python heap, ESP-IDF can't use it. So to prevent this issue, you need to stop the "GC: total: ..." number from ever increasing. Once this memory is added to the Python heap, it's no longer available for ESP-IDF even if it's free for Python...

straga commented 5 months ago

@projectgus I think boards without PSRAM should go back to a static buffer. We would have a little less RAM, but Wi-Fi would not drop out and everything would work as before.

That costs about 20 kilobytes of RAM, but everything works as before.

Otherwise, this redistribution needs to be implemented in some other way.

projectgus commented 5 months ago

@straga MicroPython is growing the heap automatically as your code is running, to prevent a MemoryError. From your logs:

GC: total: 112000,

This is early, there is enough other RAM for ESP-IDF.

GC: total: 143936,

This is later, not enough other RAM for ESP-IDF.

If you find the function in your code that causes this number to increase and refactor it, then ESP-IDF will start working again.

We might be able to add an option to MicroPython to make this simpler, as it's hard for MicroPython to know if it should choose to throw a MemoryError or to grow the heap.

straga commented 5 months ago

@projectgus Thanks for the information. I have rewritten my code to use as little RAM as possible, and it now works better. As long as there is enough free RAM for both Python and ESP-IDF, everything is fine. MicroPython claims RAM first, or more aggressively. If MicroPython does not report a shortage of RAM, then from MicroPython's side everything is fine. But when ESP-IDF needs more RAM and none is free, ESP-IDF prints nothing, and something can simply stop at random (in my case, the Wi-Fi stack). Is that right?

projectgus commented 5 months ago

@straga Yes, that's how it is at the moment. Glad that you got everything working.

straga commented 3 weeks ago

Using the ESP32-C3, after applying a patch and opening a new socket, the board freezes completely. The only way to recover is by pressing the hardware reset button or enabling the watchdog.

This issue occurs on different boards as well, suggesting it’s not just related to RAM but could involve other factors too.

The ping stops as a result of actions I take. At that moment, I open another socket connection to the board. Despite this, the board remains connected to MQTT and continues sending messages.

However, when I start using multiple active socket connections (Telnet, FTP, HTTP), the board freezes.

Here the watchdog is working. [screenshot]

projectgus commented 2 weeks ago

@straga Do you have a way for us to reproduce this hang?

projectgus commented 2 weeks ago

This fix may be relevant to the problem you're seeing https://github.com/micropython/micropython/pull/16015 (although unclear without a way to reproduce.) EDIT: Not this one, missed you weren't using TLS.

projectgus commented 2 weeks ago

The fix from https://github.com/micropython/micropython/pull/15952 may help with this issue. Please try the latest nightly build v1.24.0-preview.409.g82e69df33 (2024-10-10) .bin from the downloads, or any newer version.

straga commented 2 weeks ago

On the ESP32 board, using the correct patch seems to improve things, but I’m still experiencing random freezes when running multiple components simultaneously: mqtt_as, uftpd, and telnet for REPL. If I use only telnet and FTP without running any asyncio code or mqtt_as, the system works without issues.

However, when all components are active (which only involves up to six sockets), I encounter random freezes. The situation has improved compared to before, as the watchdog now restarts the board when it freezes.

•   Left side: An asyncio task feeds the watchdog and shows memory info every second.
•   Right side: A ping is sent to the board.
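A minimal sketch of what such a task could look like, assuming the watchdog feed and the memory report are passed in as callables. On the board these would typically be the feed method of a machine.WDT object and micropython.mem_info; the function name watchdog_task, its parameters, and the stand-in callables below are illustrative assumptions, not the code used in this thread.

```python
import asyncio


async def watchdog_task(feed, report, interval_s=1, cycles=None):
    """Feed the watchdog and report memory every interval_s seconds.

    cycles=None loops forever (the on-device case); a number bounds the loop.
    """
    done = 0
    while cycles is None or done < cycles:
        feed()    # e.g. wdt.feed on the board
        report()  # e.g. micropython.mem_info on the board
        await asyncio.sleep(interval_s)
        done += 1
    return done


# Off-device demonstration with stand-in callables.
events = []
asyncio.run(
    watchdog_task(lambda: events.append("feed"),
                  lambda: events.append("mem"),
                  interval_s=0, cycles=2)
)
print(events)
```

Because the task runs on the same event loop as the rest of the application, a stalled loop stops the feeding and the hardware watchdog resets the board.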

When I attempt to connect to the board and download a 3KB file, the ping drops, and the left side freezes. You can see this behavior in this video: https://youtu.be/-Vujb0btDwc .

If I run the asyncio code in a separate thread, you can observe the behavior in this video: https://youtu.be/MfxXQfAYu9g. It shows memory leaking away because of the split between "max new split" and the MicroPython heap, but the system doesn’t freeze and continues running. The code is unchanged; the only difference is that asyncio now runs in a thread.

When running asyncio in a thread, the Wi-Fi occasionally stops working, even though ifconfig still shows an IP, and isconnected() == True. This behavior also seems to occur randomly.

straga commented 2 weeks ago

@straga Do you have a way for us to reproduce this hang?

@projectgus

Set the SSID and password in the secret file, and the MQTT IP in board.yml. Wait until the board has loaded and connected to MQTT. Then connect via telnet and try to download a file from the board, for example "main.py".

In main.py: TREAD = True  # True: run in a thread; False: do not run in a thread

FTP client settings (PyCharm): [screenshot]

Board files: [screenshot]

for_esp32_board.zip

https://github.com/cpopp/MicroTelnetServer
https://github.com/robert-hh/FTP-Server-for-ESP8266-ESP32-and-PYBD/blob/master/uftpd.py

projectgus commented 2 weeks ago

@straga Does it still hang on the latest nightly build? See https://github.com/micropython/micropython/issues/14421#issuecomment-2412937043

straga commented 2 weeks ago

@projectgus That was: [screenshot]

dpgeorge commented 2 weeks ago

@straga There shouldn't be a "dirty" label in the tag if you used the firmware from the download page.

Did you build yourself with modifications?

straga commented 2 weeks ago

@dpgeorge Yes with my board configuration.

Now the same happens with firmware v1.24.0-preview.447.g838f21298 (2024-10-15) .bin

video: https://youtu.be/3zo4YU3zZ-w

The only change is that the binary sensor is commented out. If I uncomment it, Wi-Fi does not work after loading. Memory is at a minimum, but nothing reports that. https://youtu.be/9S36EsN63rA [screenshot]

With the nightly build the result is the same: after everything has loaded, I try to upload or download a file over FTP. Everything freezes until the watchdog resets the board. [screenshot]