High memory usage when calling get_ch() in font code

peterhinch / micropython-font-to-py

A Python 3 utility to convert fonts to Python source capable of being frozen as bytecode

MIT License


High memory usage when calling get_ch() in font code #50

Closed NewWheelTech closed 8 months ago

NewWheelTech commented 8 months ago

I am seeing very high memory usage from calls to get_ch() alone. This causes problems for me because the garbage collector runs constantly when displaying large amounts of text.

I have worked around this issue by adding a dictionary lookup in the get_ch() function, as seen in the font_s1.py file.

Is there a reason for this memory usage, or a better workaround?

I have tested this code on MicroPython 1.20 running on an RP2040.

font_org.py.txt font_s1.py.txt test_font.py.txt

peterhinch commented 8 months ago

I am puzzled by your findings.

The design of the Python font files was intended to be highly optimised for RAM use, especially in the case where font files are frozen as bytecode. In this case RAM usage (tested on Pyboards) was very low indeed. This works because get_ch() returns a slice of a memoryview into the font data. So the incremental RAM use of a call to get_ch() is the RAM used in creating a slice instance; this is measured in tens of bytes and will be reclaimed by GC after the glyph goes out of scope. This can be seen in the following example:

_mvfont = memoryview(_font)
_mvi = memoryview(_index)
ifb = lambda l : l[0] | (l[1] << 8)

def get_ch(ch):
    oc = ord(ch)
    ioff = 2 * (oc - 32 + 1) if oc >= 32 and oc <= 126 else 0
    doff = ifb(_mvi[ioff : ])
    width = ifb(_mvfont[doff : ])

    next_offs = doff + 2 + ((23 - 1)//8 + 1) * width
    return _mvfont[doff + 2:next_offs], 23, width
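
As a minimal sketch of why the returned glyph carries no pixel data of its own, the toy example below builds a three-glyph "font" using the same index and data layout and shows that the slice returned by get_ch() is a view onto the underlying buffer. The data, the 8-pixel height and the use of a bytearray (so that the buffer can be modified to prove the point) are assumptions made for illustration; real font-to-py files hold bytes objects.

```python
# Toy data, not a real font-to-py file: 8-pixel-high glyphs, one byte per column.
_font = bytearray(
    b'\x02\x00\xaa\x55'      # default glyph: width 2, then 2 data bytes
    b'\x01\x00\x00'          # ' ': width 1, then 1 data byte
    b'\x03\x00\x10\x20\x30'  # '!': width 3, then 3 data bytes
)
_index = b'\x00\x00\x04\x00\x07\x00'  # little-endian offsets into _font
_mvfont = memoryview(_font)
_mvi = memoryview(_index)
HEIGHT = 8

def get_ch(ch):
    oc = ord(ch)
    # Only ' ' (32) and '!' (33) exist here; anything else maps to the default glyph.
    ioff = 2 * (oc - 32 + 1) if 32 <= oc <= 33 else 0
    doff = _mvi[ioff] | (_mvi[ioff + 1] << 8)
    width = _mvfont[doff] | (_mvfont[doff + 1] << 8)
    next_offs = doff + 2 + ((HEIGHT - 1) // 8 + 1) * width
    return _mvfont[doff + 2:next_offs], HEIGHT, width

glyph, height, width = get_ch('!')
before = bytes(glyph)   # explicit copy, made only for comparison
_font[9] = 0xff         # modify the first data byte of '!' in the underlying buffer
after = bytes(glyph)    # the existing slice reflects the change: it is a view, not a copy
```

The slice object itself is still a new, small heap object on every call, which is the allocation discussed in the rest of this thread.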

Where the font is not frozen there is significant RAM use when the file is imported, but the incremental RAM use on retrieving glyphs is minimal. Your approach of caching glyphs in a dict is likely to increase RAM usage as the dict grows with time.

Your test code will perform a lot of allocation:

def char_list(fontfile):
    char_list = list()
    myfont = __import__(fontfile)
    string = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    for char in string:
        char_list.append(myfont.get_ch(char))
    return char_list

The char_list object ends up containing a copy of every glyph in the font. This would be the case even if get_ch() used no RAM at all.

The key to using font files efficiently is to copy the glyph either to a pre-allocated buffer (e.g. a Framebuf) or directly to display hardware.

NewWheelTech commented 8 months ago

Peter,

Adding the dict is just a workaround to help highlight the problem: the get_ch() code uses a lot of memory that then needs to be cleaned up by the garbage collector.

The memory issues can still be seen when running this code.

def char_test(fontfile):
    myfont = __import__(fontfile)
    string = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    for char in string:
        myfont.get_ch(char)

These are the results I am seeing. Just calling get_ch() 62 times adds about 8k to the heap, or roughly 130 bytes per call. This is just not acceptable if you are planning on displaying a large number of characters. It is also not realistic to run a garbage collection after every few calls to get_ch(). Something appears to be off; maybe you are seeing different memory usage on your board.

Starting with:  220928
Running current font code
Try:    0 Memfree:  208384
Try:    1 Memfree:  200288
Try:    2 Memfree:  192192
Try:    3 Memfree:  184096
Try:    4 Memfree:  176000
Try:    5 Memfree:  167904
Try:    6 Memfree:  159808
Try:    7 Memfree:  151712
Try:    8 Memfree:  143616
Try:    9 Memfree:  135520
Try:    10 Memfree: 127392
Try:    11 Memfree: 119264
Try:    12 Memfree: 111136
Try:    13 Memfree: 103008
Try:    14 Memfree: 94880
Try:    15 Memfree: 86752
Try:    16 Memfree: 78624
Try:    17 Memfree: 70496
Try:    18 Memfree: 62368
Try:    19 Memfree: 54240
Starting with:  218096
Running new font code
Try:    0 Memfree:  204816
Try:    1 Memfree:  204656
Try:    2 Memfree:  204496
Try:    3 Memfree:  204336
Try:    4 Memfree:  204176
Try:    5 Memfree:  204016
Try:    6 Memfree:  203856
Try:    7 Memfree:  203696
Try:    8 Memfree:  203536
Try:    9 Memfree:  203376
Try:    10 Memfree: 203184
Try:    11 Memfree: 202992
Try:    12 Memfree: 202800
Try:    13 Memfree: 202608
Try:    14 Memfree: 202416
Try:    15 Memfree: 202224
Try:    16 Memfree: 202032
Try:    17 Memfree: 201840
Try:    18 Memfree: 201648
Try:    19 Memfree: 201456
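
For reference, the per-call cost can be worked out from the figures above. The short sketch below uses the first four Memfree values of each run and assumes each "Try" corresponds to one pass over the 62-character test string; the helper name is made up for illustration.

```python
CALLS_PER_TRY = 62  # length of the test string (26 + 26 + 10 characters)

# First four Memfree readings from each run in the log above.
current = [208384, 200288, 192192, 184096]   # "Running current font code"
new = [204816, 204656, 204496, 204336]       # "Running new font code"

def bytes_per_call(memfree, calls=CALLS_PER_TRY):
    """Average heap consumed per get_ch() call between consecutive readings."""
    drops = [a - b for a, b in zip(memfree, memfree[1:])]
    return sum(drops) / len(drops) / calls

print(round(bytes_per_call(current), 1))  # 130.6
print(round(bytes_per_call(new), 1))      # 2.6
```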

NewWheelTech commented 8 months ago

After further investigation I was able to improve the memory usage by changing the get_ch code:

def get_ch(ch):
    oc = ord(ch)
    ioff = 2 * (oc - 32 + 1) if oc >= 32 and oc <= 126 else 0

    doff = _mvi[ioff] | (_mvi[ioff+1] << 8)
    width = _mvfont[doff] | (_mvfont[doff+1] << 8)

    next_offs = doff + 2 + ((12 - 1)//8 + 1) * width
    return (_mvfont[doff + 2:next_offs], 12, width)

Which gives these results:

Running new font code
Try:    0 Memfree:  209888
Try:    1 Memfree:  205760
Try:    2 Memfree:  201632
Try:    3 Memfree:  197504
Try:    4 Memfree:  193376
Try:    5 Memfree:  189248
Try:    6 Memfree:  185120
Try:    7 Memfree:  180992
Try:    8 Memfree:  176864
Try:    9 Memfree:  172736
Try:    10 Memfree: 168576
Try:    11 Memfree: 164416
Try:    12 Memfree: 160256
Try:    13 Memfree: 156096
Try:    14 Memfree: 151936
Try:    15 Memfree: 147776
Try:    16 Memfree: 143616
Try:    17 Memfree: 139456
Try:    18 Memfree: 135296
Try:    19 Memfree: 131136

That is still adding ~64 bytes per call to the heap.
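
The change replaces the ifb() helper applied to a memoryview slice with direct indexing. A small sketch of the two forms, using made-up sample bytes, shows that they decode the same 16-bit little-endian value; the difference is that the slice form creates a temporary memoryview object on each read, which is a plausible (but here unverified) explanation for the per-call figure roughly halving.

```python
ifb = lambda l: l[0] | (l[1] << 8)   # helper from the original font file

data = memoryview(b'\x34\x12\x07\x00')   # sample bytes, not real font data

def read_u16_slice(mv, i):
    # Original form: mv[i:] builds a new memoryview slice, then ifb() indexes it.
    return ifb(mv[i:])

def read_u16_index(mv, i):
    # Revised form: two integer index operations, no intermediate slice object.
    return mv[i] | (mv[i + 1] << 8)

first = (read_u16_slice(data, 0), read_u16_index(data, 0))
second = (read_u16_slice(data, 2), read_u16_index(data, 2))
```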

peterhinch commented 8 months ago

To clarify the purpose of Python font files: the aim is to enable multiple font files to be accessed without those files having a static RAM footprint, enabling usage on hosts with minimal RAM. The get_ch() function is not allocation-free, even when the font file is frozen.

I am not an expert on MicroPython internals but my understanding of the code

glyph, height, width = get_ch(ch)

is that the creation of the slice into the memoryview allocates a few bytes. Returning three values creates a tuple. The tuple is then unpacked into three pre-allocated or stack-based variables. The memory allocated will be reclaimed by gc. I agree with your measurements: it is on the order of 60-250 bytes per glyph.

In practice the existing fonts have been used with various GUI frameworks to display substantial amounts of text. A GUI such as micro-gui inevitably allocates quite widely: you can create then discard entire screens full of widgets. The GUI performs periodic GC in a bid to minimise fragmentation. I am not aware of any practical adverse consequences of the allocation performed by get_ch().

If you can devise a way to write an allocation-free get_ch() or a Python font file format that avoids allocation I would be very interested to see it.

peterhinch commented 8 months ago

I have given this some thought. For the following reasons it is unlikely that I will accept a PR with allocation-free access as an aim.

For a non-allocating Python font file to be useful the rendering classes (Writer and CWriter) would also need substantial redesign. Further, as stated above, the goal of non-allocation is not achievable in the various GUIs that depend on the rendering classes. Another factor is that changes to the Python font file require great care for the frozen bytecode use case. Seemingly minor changes can force the MicroPython runtime to copy the bytecode to RAM, defeating the principal design goal.

However I can see that, in a different "ecosystem", allocation-free access and rendering is a worthwhile goal. If you publish tools to do this I will link to them in the docs. The approach I would use is this:

1. Use font-to-py to create Python font files as normal.
2. Write a utility that takes as input a Python font file and outputs a JSON encoded font file. This would comprise a dict whose key was the character (or perhaps ord(ch)) and whose value was a list comprising [glyph, height, width]. This utility would be very simple.
3. The application would load the dict and replace each glyph with a FrameBuffer containing the glyph. This obviously involves allocation but it is done once only.
4. The rendering code would use FrameBuffer.blit to render the glyph and to keep track of the vertical and horizontal position. I believe this could be done in a non-allocating way.

Note that I recommend JSON for your font file rather than emitting Python source. It would be valid to emit Python source but, if frozen, the runtime would copy the dict to RAM. Given that the dict has to live in RAM, JSON is much simpler.
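
A minimal sketch of the conversion and loading steps might look like the following. Everything named here is hypothetical: ToyFont stands in for an imported Python font module, font_to_json() and load_font() are illustrative helpers, and hex-encoding the glyph is an assumption made because JSON has no bytes type. Wrapping each loaded glyph in a FrameBuffer and rendering with blit are left out because framebuf is only available on the device.

```python
import binascii
import json

class ToyFont:
    """Hypothetical stand-in for an imported font module (8-pixel-high glyphs)."""
    _glyphs = {32: b'\x00', 33: b'\x10\x20\x30'}

    def get_ch(self, ch):
        data = self._glyphs.get(ord(ch), b'\xaa\x55')   # default glyph otherwise
        return memoryview(data), 8, len(data)

def font_to_json(font, first=32, last=126):
    """Host-side utility: dump [glyph, height, width] per character code."""
    table = {}
    for oc in range(first, last + 1):
        glyph, height, width = font.get_ch(chr(oc))
        hexglyph = binascii.hexlify(bytes(glyph)).decode()
        table[str(oc)] = [hexglyph, height, width]      # JSON keys must be strings
    return json.dumps(table)

def load_font(text):
    """Application side: decode once into a dict keyed by ord(ch)."""
    table = {}
    for key, (hexglyph, height, width) in json.loads(text).items():
        # On a device, this bytearray is what would back a FrameBuffer.
        table[int(key)] = (bytearray(binascii.unhexlify(hexglyph)), height, width)
    return table

font = load_font(font_to_json(ToyFont(), 32, 34))
```

After loading, font[ord(ch)] is a plain dict lookup that returns objects created once at start-up, which is the property the steps above are aiming for.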

NewWheelTech commented 8 months ago

After further review of the issue I was having, I have now moved on to code that looks like this:

import uctypes

def get_ch(ch):
    oc = ord(ch)
    ioff = 2 * (oc - 32 + 1) if oc >= 32 and oc <= 126 else 0

    doff = _index[ioff] | (_index[ioff+1] << 8)
    width = _font[doff] | (_font[doff+1] << 8)

    # Return the address of the glyph data rather than a memoryview slice
    return uctypes.addressof(_font) + (doff + 2), 11, width

For some reason I was having issues with memoryviews, which might be related to Viper code generation. By removing the memoryviews completely and switching to pointers I was able to get much better results. If this is an option you would be interested in adding to font_to_py, or would find helpful, I would make up a PR.

peterhinch commented 8 months ago

I spent a lot of time optimising the font file format for the case where the font file is frozen. Any changes run the risk of the MicroPython runtime copying the file to RAM and the memoryviews are a key part of that optimisation.

I am reluctant to change the font file format (which doesn't use native or Viper) unless there is a bug, because much time-consuming testing would be required.