latency control (do_gc = True) feature - thoughts

beyonlo commented 1 year ago

Hello @peterhinch

I see that you added this feature: March 2022: Add latency control for hosts with SPIRAM.

I'm working just with nanogui, but I want to use microgui in the future - good work :)

This code below is from here

    async def _garbage_collect(self):
        n = 0
        while Screen.do_gc:
            await asyncio.sleep_ms(500)
            gc.collect()
            if hasattr(gc, 'threshold'):
                gc.threshold(gc.mem_free() // 4 + gc.mem_alloc())
            n += 1
            n &= 0x1F
            _vb and (not n) and print("Free RAM", gc.mem_free())

I don't know if you are aware, but gc.mem_alloc() and gc.mem_free() each take roughly as long to run as gc.collect(). How close depends on the port/platform, and sometimes they take longer.

If you run gc.threshold(gc.mem_free() // 4 + gc.mem_alloc()) every 500ms, each iteration costs the time of gc.collect() plus the time of gc.mem_free() plus the time of gc.mem_alloc(). So, as I see it, the loop costs ~200% more time than running gc.collect() alone. Am I correct? If so, one option might be to compute the threshold and call gc.threshold() not on every iteration but just once, somewhere, with a default or user-configured value (this needs more thought).
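The "just once" idea could be sketched as below. This is a minimal illustration, not code from micro-gui: the function name `set_threshold_once` and its `divisor` and `gc_mod` parameters are hypothetical, and `gc_mod` exists only so the helper can be exercised with a stand-in object on platforms whose `gc` module has no `threshold()`.

```python
import gc


def set_threshold_once(divisor=4, gc_mod=gc):
    """Compute and set the GC allocation threshold a single time.

    Hypothetical helper: pays the cost of mem_free()/mem_alloc() once,
    e.g. at startup, instead of on every pass of a periodic loop.
    Returns the threshold that was set, or None if unsupported.
    """
    # gc.threshold() is MicroPython-specific and absent on some ports.
    if not (hasattr(gc_mod, "threshold") and hasattr(gc_mod, "mem_free")):
        return None
    gc_mod.collect()  # measure free RAM from a clean heap
    limit = gc_mod.mem_free() // divisor + gc_mod.mem_alloc()
    gc_mod.threshold(limit)
    return limit
```

Called once before the periodic loop starts, this leaves only gc.collect() inside the loop.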

The session below is from an ESP32-S3 with no SPIRAM (internal RAM only): gc.collect() takes ~2ms, gc.mem_free() ~5ms and gc.mem_alloc() ~5ms, and computing the argument for gc.threshold() takes ~9ms. So on the ESP32-S3 each 500ms iteration costs ~11ms (9 + 2) with the threshold update, against ~2ms for gc.collect() alone. That is about 5.5 times the cost.

>>> import os
>>> os.uname()
(sysname='esp32', nodename='esp32', release='1.19.1', version='v1.19.1-994-ga4672149b on 2023-03-29', machine='ESP32S3 module with ESP32S3')
>>> import time
>>> start_time = time.ticks_ms(); gc.collect(); end_time = time.ticks_ms(); print(f'Total time to gc.collect(): {time.ticks_diff(end_time, start_time)}ms')
Total time to gc.collect(): 2ms
>>> 
>>> start_time = time.ticks_ms(); mem = gc.mem_free(); end_time = time.ticks_ms(); print(f'mem_free: {mem}, Total time: {time.ticks_diff(end_time, start_time)}ms')
mem_free: 162848, Total time: 5ms
>>> 
>>> start_time = time.ticks_ms(); mem = gc.mem_alloc(); end_time = time.ticks_ms(); print(f'mem_alloc: {mem}, Total time: {time.ticks_diff(end_time, start_time)}ms')
mem_alloc: 3328, Total time: 5ms
>>> 
>>> start_time = time.ticks_ms(); res = gc.mem_free() // 4 + gc.mem_alloc(); end_time = time.ticks_ms(); print(f'Total time to (gc.mem_free() // 4 + gc.mem_alloc()): {time.ticks_diff(end_time, start_time)}ms')
Total time to (gc.mem_free() // 4 + gc.mem_alloc()): 9ms
>>>
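Because time.ticks_ms() has 1ms resolution, figures such as 2ms are coarse. A small helper based on time.ticks_us() makes the same measurements repeatable. This is only a sketch: the name `timed` is hypothetical, and the fallback branch is there so the code also runs on CPython, where the ticks functions do not exist.

```python
import gc
import time

try:
    # MicroPython: wrap-around safe microsecond counters.
    ticks_us, ticks_diff = time.ticks_us, time.ticks_diff
except AttributeError:
    # CPython fallback, for trying the helper on a PC.
    def ticks_us():
        return time.perf_counter_ns() // 1000

    def ticks_diff(end, start):
        return end - start


def timed(fn, *args):
    """Call fn(*args); return (result, elapsed time in microseconds)."""
    start = ticks_us()
    result = fn(*args)
    return result, ticks_diff(ticks_us(), start)


# Example: time one garbage collection.
_, elapsed_us = timed(gc.collect)
print("gc.collect():", elapsed_us, "us")
```

On a MicroPython board, timed(gc.mem_free) and timed(gc.mem_alloc) can be compared in the same way.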

Thank you very much for your attention!

peterhinch commented 1 year ago

Thanks for the comments. There are two options I can see:

1. Remove the threshold setting.
2. Cause it to occur infrequently, such as every N iterations.

Do you think setting the threshold serves a useful purpose?
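For illustration, the second option (updating the threshold only every N iterations) might look something like the sketch below. It is not the code that ended up in micro-gui: the function name and the `rounds`, `every`, `period_ms` and `gc_mod` parameters are hypothetical, and the loop is bounded by `rounds` only so that it terminates when tried out.

```python
import asyncio
import gc


async def garbage_collect(rounds, every=32, period_ms=500, gc_mod=gc):
    """Collect on each pass; refresh the threshold every `every` passes.

    Returns the number of threshold updates performed.
    """
    updates = 0
    for n in range(rounds):
        await asyncio.sleep(period_ms / 1000)
        gc_mod.collect()
        # The costly mem_free()/mem_alloc() calls run on 1 pass in `every`.
        if n % every == 0 and hasattr(gc_mod, "threshold"):
            gc_mod.threshold(gc_mod.mem_free() // 4 + gc_mod.mem_alloc())
            updates += 1
    return updates
```

With every=32 and a 500ms period, the threshold calculation would run about once every 16 seconds rather than twice a second.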

beyonlo commented 1 year ago

Hello @peterhinch

> 1. Remove the threshold setting.
> 2. Cause it to occur infrequently, such as every N iterations.

I consider option 1 the best. The reason is the same as my answer to the next question.

> Do you think setting the threshold serves a useful purpose?

In my opinion it is very useful in some situations, but it should be treated as a global setting that covers all the applications running together, not just the GUI. That holds even if you set threshold() to a fixed value (without using gc.mem_free()/gc.mem_alloc()) to keep the time cost low. If another application (not the GUI) allocates a lot of RAM (though less than the GUI) and the microcontroller has very little RAM, the user may need a smaller global threshold than the one the GUI sets. So there are different situations, and in my view the best option is for the user, who knows about all the applications running, to set threshold() manually at one or more specific points in the code, if necessary. In most cases the user will never need to set gc.threshold(). I see two situations where it is very useful:

1. An application allocates more RAM in one shot than is available, which causes a memory error, so the user needs to set a global gc.threshold().
2. gc.collect() takes longer than your solution can tolerate. If you set, for example, gc.threshold(4000), gc.collect() is called after every 4000 bytes allocated, and the time to collect 4000 bytes is acceptable for your application. This is especially good with SPIRAM, but it is also very useful without SPIRAM when you need some kind of real-time solution where hard interrupts (which pre-empt the GC) can't be used, or where the microcontroller does not offer that kind of interrupt (as in the ESP32 family) and interrupts are delayed by the GC.
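Setting a fixed threshold manually, as described in the second case, could be done along these lines. This is a sketch under assumptions: `set_fixed_threshold` and its `gc_mod` parameter are hypothetical names, and 4000 is simply the example value from the text above. Per the MicroPython documentation, calling gc.threshold() with no argument returns the current value, and -1 means the allocation threshold is disabled.

```python
import gc


def set_fixed_threshold(nbytes, gc_mod=gc):
    """Set a fixed GC allocation threshold chosen by the user.

    No mem_free()/mem_alloc() calls are involved, so this is cheap.
    Returns the previous threshold so the caller can restore it later,
    or None where gc.threshold() is not available.
    """
    if not hasattr(gc_mod, "threshold"):
        return None
    previous = gc_mod.threshold()  # query the current value
    gc_mod.threshold(nbytes)
    return previous


# Example: collect after every 4000 bytes allocated.
previous = set_fixed_threshold(4000)
```

Returning the previous value lets an application tighten the threshold around a latency-sensitive section and put the old setting back afterwards.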

I strongly recommend against using gc.mem_free()/gc.mem_alloc() for calculations, or for checking how much RAM is free or allocated before doing something, because they have a very high time cost. They are very good for debugging and during development, but not a good idea in production.

peterhinch commented 1 year ago

OK, I'm convinced. Now done.

peterhinch / micropython-micro-gui

latency control (do_gc = True) feature - thoughts #27