Fast access of bitmap buffer with numpy

jiong3 commented 7 years ago

Hi,

Currently the bitmap buffer can be accessed using freetype.Bitmap.buffer, which returns a Python list of all the bytes. I can then use np.fromiter to get a numpy array, but because of the Python loop over all the bytes this is really slow.

Is there a way to access the memory that the buffer points to directly with numpy? Anything I have to consider if I try to do that?

rougier commented 7 years ago

Good point. I think numpy.frombuffer might be useful in such a case, but I've never really used it. It might be a good starting point, though.

jiong3 commented 7 years ago

So I had a look around on the internet and found different ways to do that:

import ctypes
import numpy as np

@staticmethod
def get_np_array0(bitmap, num_bytes):
    # 34.625 / 36.874
    return np.fromiter(bitmap.buffer, dtype=np.uint8)

@staticmethod
def get_np_array1(bitmap, num_bytes):
    # 19.933 / 21.485
    return np.fromiter(bitmap._FT_Bitmap.buffer, dtype=np.uint8, count=num_bytes)

@staticmethod
def get_np_array2(bitmap, num_bytes):
    # 0.037 / 1.158, int_asbuffer is not documented
    return np.core.multiarray.int_asbuffer(ctypes.addressof(bitmap._FT_Bitmap.buffer.contents), num_bytes)

@staticmethod
def get_np_array3(bitmap, num_bytes):
    # 0.418 / 1.540, potential memory leak according to github issue 6511
    return np.ctypeslib.as_array(bitmap._FT_Bitmap.buffer, (num_bytes,))

@staticmethod
def get_np_array4(bitmap, num_bytes):
    # 0.072 / 1.242
    # Note: PyBuffer_FromMemory is part of the Python 2 C API only.
    bfm = ctypes.pythonapi.PyBuffer_FromMemory
    bfm.restype = ctypes.py_object
    buffer = bfm(bitmap._FT_Bitmap.buffer, num_bytes)
    return np.frombuffer(buffer, dtype=np.uint8)

@staticmethod
def get_np_array5(bitmap, num_bytes):
    # 0.079 / 1.145
    buffer = ctypes.cast(bitmap._FT_Bitmap.buffer, ctypes.POINTER(ctypes.c_ubyte * num_bytes))
    return np.frombuffer(buffer.contents, dtype=np.uint8)

The numbers in the comments are from cProfile, in seconds: (cumtime of get_np_arrayX) / (cumtime of main function), just to give an idea of the performance. I rendered 10000 characters.

Two things I am not sure about and that might be relevant: When is the memory of the buffer freed? When is bitmap.pitch different from bitmap.width, and when is it negative?

rougier commented 7 years ago

Nice! But your last question reminds me that we may have a problem with the width/pitch difference.

The explanation can be found here: https://www.freetype.org/freetype2/docs/reference/ft2-basic_types.html#FT_Bitmap

I'm not quite sure I understand it correctly.

jiong3 commented 7 years ago

Here's another explanation of the pitch: https://www.freetype.org/freetype2/docs/glyphs/glyphs-7.html

The way I understand it, for just reading the buffer into a numpy array, num_bytes = rows * abs(pitch) should work correctly in all cases. If the pitch is negative, the order of the rows has to be reversed (easy to do in numpy). Since the pitch is the number of bytes per row and the width is the number of pixels, both are the same for a normal grayscale bitmap (1 pixel = 1 byte). For a black and white image (1 pixel = 1 bit), however, you have to unpack the pixels. That should be equally easy on a numpy array.

I think it would make sense to include something that can be used directly with np.frombuffer into the library, maybe method number 4 or 5.

The remaining question is whether the user should immediately create a copy of the array, since I am not sure how and when the memory of the buffer will be freed.
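The reading of pitch described above can be sketched as follows. This is a minimal illustration, not code from freetype-py: bitmap_to_array and its parameters are hypothetical names, the buffer is assumed to start at the lowest address of the bitmap memory, and monochrome pixels are assumed to be packed most significant bit first.

```python
import numpy as np

def bitmap_to_array(raw, rows, width, pitch, mono=False):
    """Turn a flat byte buffer into a (rows, width) uint8 array.

    raw   -- bytes-like object holding at least rows * abs(pitch) bytes
    pitch -- signed bytes per row; negative is taken to mean bottom-up rows
    mono  -- True for 1 bit per pixel (assumed most significant bit first)
    """
    stride = abs(pitch)
    data = np.frombuffer(raw, dtype=np.uint8, count=rows * stride)
    data = data.reshape(rows, stride)
    if mono:
        data = np.unpackbits(data, axis=1)  # one array element per pixel
    data = data[:, :width]                  # drop padding at the end of each row
    if pitch < 0:
        data = data[::-1]                   # restore top-to-bottom row order
    return data.copy()                      # detach from the source buffer

# Two rows of three gray pixels, each row padded to four bytes.
gray = bitmap_to_array(bytes([1, 2, 3, 0, 4, 5, 6, 0]), rows=2, width=3, pitch=4)
```

Returning a copy here reflects the open question about buffer lifetime; a caller that controls the lifetime could skip it.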

rougier commented 7 years ago

We can also directly return a copy (just in case). I think freetype can free the glyph anytime so it might be safer to return a copy.
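To illustrate why a copy is the safer default, here is a small sketch that uses a ctypes array as a stand-in for memory owned by FreeType (the names backing, view and snapshot are made up for the example). A view created with np.frombuffer follows any change to the underlying memory, while a copy does not.

```python
import ctypes
import numpy as np

# Stand-in for memory that is owned by someone else.
backing = (ctypes.c_ubyte * 4)(1, 2, 3, 4)

view = np.frombuffer(backing, dtype=np.uint8)  # shares memory with backing
snapshot = view.copy()                         # owns its own memory

backing[0] = 99                                # simulate the owner reusing the buffer
```

After the last line, view[0] reflects the new value while snapshot still holds the original data. With real FreeType memory the failure mode would be worse than a changed value if the buffer is freed, which is why the copy is suggested.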

rougier commented 7 years ago

@StephewZ Can you open a new issue for this problem ?

HinTak commented 7 years ago

Sigh. You guys don't understand what 'pitch' is. It is not the same as width, nor number of pixels in gray. It is a memory offset. It is the same concept as what is called 'stride' in numpy lingo. (see https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html , or whatever else is available on numpy).

The idea is that computers are a lot more efficient when dealing with, say, 4-byte or 8-byte chunks. So when you want to fast-forward or backward in memory, you want to do so in such units, instead of bytes. For bits, it is obvious that pitch is AT LEAST (the number of bits rounded up to a multiple of 8)/8, since you can't fast-forward by half a byte. But for grays, you might have the stride being the width rounded up to a multiple of 4 or 8, depending on whether you are on a 32-bit or a 64-bit platform.

Pitch is the distance between the two memory locations of the beginning of row1 and row2, etc. It is always larger than (bit-depth * pixel width) /8, because memory locations like to be aligned to a multiple of 4 or 8, depending on the platform. I.e. if you have 17 pixels of gray per row, it is possible that the stride is 20 or 24.

It is called pitch by some, but called stride in numpy's multi-dimensional array type's documentation.
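The pitch/stride correspondence can be shown directly in numpy. The sketch below builds a buffer with 17 gray pixels per row padded to 20 bytes (the numbers from the comment above, chosen for illustration) and lets numpy skip the padding through its strides argument instead of copying.

```python
import numpy as np

rows, width, pitch = 3, 17, 20  # 17 gray pixels per row, rows 20 bytes apart
raw = bytearray(rows * pitch)
for r in range(rows):
    # Fill only the visible pixels of row r with r + 1; the padding stays 0.
    raw[r * pitch : r * pitch + width] = bytes([r + 1]) * width

# strides=(pitch, 1): step `pitch` bytes to the next row, 1 byte to the next pixel.
image = np.ndarray(shape=(rows, width), dtype=np.uint8, buffer=raw, strides=(pitch, 1))
```

The resulting array has shape (3, 17) but is not contiguous, because each row is followed by three bytes that do not belong to the image.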

jiong3 commented 7 years ago

"Sigh. You guys don't understand what 'pitch' is."

?

As I wrote above, the pitch is the number of bytes per row. According to the documentation, "FreeType functions normally align to the smallest possible integer value". So for grayscale bitmaps width and pitch are likely equal, unless the alignment is changed. In the common case of accessing the buffer as a whole an alignment of the rows to 2 or 4 bytes wouldn't be faster anyway.

HinTak commented 7 years ago

No, pitch is not the number of bytes per row. It is the distance between two rows in bytes. Can you not read?

In cairo lingo, it is also called stride. Cairo even has a special function for converting/calculating stride from width. This tells you stride is not the same as width.

I am concerned that you are proposing fast but wrong code. Code that is wrong, is wrong, whatever the speed.
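A stride-from-width calculation of the kind mentioned above can be sketched in a few lines. padded_stride is a hypothetical helper written for this illustration, not cairo's function (in cairo's C API that is cairo_format_stride_for_width), and the default alignment of 4 bytes is an assumption.

```python
def padded_stride(width, bits_per_pixel, alignment=4):
    """Smallest multiple of `alignment` bytes that can hold one row of pixels."""
    row_bytes = (width * bits_per_pixel + 7) // 8  # round bits up to whole bytes
    return (row_bytes + alignment - 1) // alignment * alignment

stride_32 = padded_stride(17, 8)               # 17 gray pixels, 4-byte alignment
stride_64 = padded_stride(17, 8, alignment=8)  # same row, 8-byte alignment
```

With these inputs the helper gives 20 and 24, matching the figures quoted earlier in the thread. With alignment=1 it reduces to the plain byte count, which is the case jiong3 describes as the common one for FreeType.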

HinTak commented 7 years ago

You also do not seem to be able to read documentation - "normally" means "most of the time" . It is meaningless to quote that sentence in this context.

jiong3 commented 7 years ago

It says so in the documentation:

"The pitch's absolute value is the number of bytes taken by one bitmap row [...]"

I never suggested not to test for pitch != width, but since they are equal in the most common case this is what should be optimized for.

"It is always larger than (bit-depth * pixel width) /8, [...]"

That's wrong.

jiong3 commented 7 years ago

In general, how should the buffer be handed to the user? As a raw buffer, numpy (dependency) or python array, with or without padding, bits unpacked to bytes?

rougier commented 7 years ago

Going back to the numpy handling, I think it would be good to return a copy by default. We could provide an option to not make a copy, but we don't have real control over when the buffer will be freed.

HinTak commented 7 years ago

I think numpy itself is the problem. Images are not numbers. The problem is that you insist on thinking of images as arrays of numbers. Performance could be much better by moving to a toolkit which explicitly caters for imaging and drawing concepts, such as cairo (and the various python bindings of cairo).

The composite code in the wordle example would be a lot simpler and also a hell lot faster if re-written as cairo image surface compositing. You let cairo handle the semi-transparency, instead of python looping by hand over the pixels as numbers, numpy style.

rougier commented 7 years ago

The reason for using numpy in the wordle example was mostly to have an easy way to test for collisions. It does not pretend to be anything else. I agree cairo (or the antigrain library) would be a better solution for manipulating/compositing images and drawing, but that's a separate problem. The examples are really only illustrations of how to use the library.
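For reference, a collision test of the kind described above can be written with whole-array operations rather than a Python loop over pixels. This is an independent sketch, not the code of the wordle example: collides and paste are hypothetical names, and the glyph is assumed to fit entirely inside the canvas.

```python
import numpy as np

def collides(canvas, glyph, x, y):
    """True if any non-zero glyph pixel would land on a non-zero canvas pixel."""
    h, w = glyph.shape
    region = canvas[y:y + h, x:x + w]
    return bool(np.any((region > 0) & (glyph > 0)))

def paste(canvas, glyph, x, y):
    """Composite glyph onto canvas in place, keeping the brighter pixel."""
    h, w = glyph.shape
    region = canvas[y:y + h, x:x + w]  # a view, so writes reach canvas
    np.maximum(region, glyph, out=region)

canvas = np.zeros((4, 6), dtype=np.uint8)
glyph = np.array([[0, 255], [255, 255]], dtype=np.uint8)
paste(canvas, glyph, 1, 1)
```

Because the test looks at actual pixels, two glyphs whose bounding boxes overlap can still be placed next to each other as long as their inked pixels do not touch.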

HinTak commented 7 years ago

"Examples are really and only illustrations on how to use the library." - well, that's what I think about comments on speed and memory usage of the examples. If you want speed (or memory efficiency), you write your code entirely differently.... and you are not even using any of the vector maths operations offered by numpy , which is another problem with using numpy - you are not using numpy properly for its main strength.

All the examples of http://github.com/ldo/python_freetype (in http://github.com/ldo/python_freetype_examples ) use cairo. And they are a lot faster than any of the ones here too! A pity that (1) they use another new custom cairo python binding instead of pycairo (very much a "not invented here" symptom), (2) it is python 3 only, (3) the coding style is terrible, besides the one-big-file-as-source-code code organization.

I am tempted to extract the freetype bitmap to cairo surface code from that as a stand-alone routine.

The comment about gray being the most common also seems out of place. The most common imaging case is really 24-bit colour, followed by bitmap (i.e. black/white). 8-bit gray is really the least common usage of freetype.

rougier commented 7 years ago

A stand-alone cairo example would be a nice addition.

HinTak commented 7 years ago

So much for trying to extract the cairo surface code from the other freetype binding - it is simply wrong : https://github.com/ldo/python_freetype/issues/1 https://github.com/ldo/python_freetype_examples/issues/1

That said, my corrected version is a hell lot faster than the numpy versions... Yes, I am already timing my standalone cairo example. I think numpy is just slow.

HinTak commented 7 years ago

I have rewritten 6 of the samples with pycairo: glyph-{monochrome,alpha,color}, hello-world, example1, and wordle. The last one is the most difficult one: I needed to use a feature newly added to pycairo 1.11 (released two weeks ago), and cannot pack as tightly as the original. OTOH, cairo can paint partly off-buffer, so you can see the difference.

And it is a hell lot faster too...

HinTak commented 7 years ago

wordle-cairo

The cairo based wordle drawing. I cannot pack as tight, but can draw partly off screen.

HinTak commented 7 years ago

glyph-cairo-gray

cairo-based glyph-alpha

HinTak commented 7 years ago

glyph-cairo-mono

cairo-based glyph mono-chrome

HinTak commented 7 years ago

glyph-color-cairo

glyph-color

HinTak commented 7 years ago

example_1-cairo

The boring example1, no visual difference other than it being a lot faster.

HinTak commented 7 years ago

hello-world-cairo

The hello world example.

HinTak commented 7 years ago

Since they are proper drawings rather than plots, there are no axes or padding around the figure, nor any grid lines.

glyph-outline.py is essentially half of glyph-color so I'm not going to do it; glyph-vector-2.py has grid lines. I can't really do glyph-lcd. So the above covers all the numpy-based plot examples (the gl example also uses numpy, but I'll let you figure that out...).

When I get the samples cleaned up and have added some comments on limitations, etc., I'll issue a pull.

rougier commented 7 years ago

@HinTak Thanks, nice results. For the PR, it would make sense to add all of them with the "-cairo.py" extension and to keep the old ones (or to have a dedicated cairo subdir) because it requires an extra dependency. For the wordle example, I think the difference comes from the collision test. Probably cairo uses bounding boxes, and this prevents one text from being drawn over another one even if the glyphs do not collide.

@jiong3 Do you think you're ready to make a PR from your tests and our discussion?

HinTak commented 7 years ago

Yes, that's what I have been doing: 5 *-cairo.py, and an extra bitmap_to_surface.py which consists of extracted, adjusted and bug-fixed routines from the other freetype binding. There is at the moment no separate glyph-monochrome vs glyph-alpha; they differ only by one line (TARGET_MONO/TARGET_NORMAL), so I just comment/uncomment the alternatives at the moment.

HinTak commented 7 years ago

Also I found some of the numpy examples doing y-direction flips; wordle does it at least twice :-(. And also the arrays have width and height in fortran indexing style... haven't seen them in a while...

rougier commented 7 years ago

The y-flip is an error; matplotlib can actually take care of that. The numpy arrays are C-order, but indexing is row (=y) / column (=x).
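The row/column convention can be checked on a small array. The sketch below uses made-up values; it shows that a C-order image array has shape (rows, columns), that a pixel at position (x, y) is read as image[y, x], and that a vertical flip is just a reversed view of the rows.

```python
import numpy as np

rows, cols = 2, 3
image = np.arange(rows * cols, dtype=np.uint8).reshape(rows, cols)
# image is laid out as:
#   0 1 2
#   3 4 5

x, y = 2, 1
pixel = image[y, x]    # row index (y) first, then column index (x)
flipped = image[::-1]  # view with the row order reversed, no data copied
```

When the array is only being displayed, the flip can usually be left to the plotting side instead, for example through the origin argument of matplotlib's imshow.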

HinTak commented 7 years ago

Having the viewed and the saved images differ is a bit painful. The original wordle draws things upside down and displays them upside down, then saves them the correct way up. That numpy/matplotlib can cope isn't quite the point. Anyway, the cairo based ones all have things drawn the same way up as they are saved. Actually I don't display with any of cairo's display backends, but just save to file and then launch python pillow's image displayer.

HinTak commented 7 years ago

glyph-outline-cairo

I have decided to add a cairo version of glyph-outline anyway, quite trivial since it is just half of glyph-color.

The pull is at https://github.com/rougier/freetype-py/pull/55

HinTak commented 7 years ago

BTW, the outline example has a transparent background, whereas I paint most of the others' backgrounds grey first. PIL displays transparent as black; I have another viewer that displays it as white. Gimp shows a checkerboard pattern for transparent pixels.

I have also changed my mind about editing to change between mono or alpha modes of the combined mono+alpha example. It defaults to alpha but if you put any argument to it, it draws mono. Explained in the comments at its top.

jiong3 commented 7 years ago

@rougier No, but if anyone wants to make a PR I would suggest option 4 or 5, or maybe something using a python array (which I haven't tested so far).
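The untested "python array" option could look roughly like this, as a sketch under stated assumptions: bytes_from_pointer is a hypothetical helper, and a ctypes array stands in for the FreeType buffer. ctypes.string_at copies the memory in one call, so there is no per-byte Python loop and no numpy dependency.

```python
import array
import ctypes

def bytes_from_pointer(ptr, num_bytes):
    """Copy num_bytes bytes from a ctypes pointer into a bytes object."""
    return ctypes.string_at(ptr, num_bytes)

# Stand-in for memory owned by someone else.
backing = (ctypes.c_ubyte * 4)(10, 20, 30, 40)
ptr = ctypes.cast(backing, ctypes.POINTER(ctypes.c_ubyte))

raw = bytes_from_pointer(ptr, 4)  # independent copy of the four bytes
pixels = array.array('B', raw)    # mutable array of unsigned bytes
```

Since string_at always copies, the result stays valid after the source memory is changed or released, at the cost of one extra copy.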

HinTak commented 7 years ago

wordle-cairo-collison

Here is an example from when I got the stride/pitch wrong. Notice how some of the tiles collide? (Only a few.) The corrected code/figures are https://github.com/rougier/freetype-py/pull/55#issuecomment-297804370 https://github.com/rougier/freetype-py/pull/55#issuecomment-297804687 (two, depending on whether one's pycairo is the latest). Drawing partly over the edge requires the latest pycairo.

HinTak commented 7 years ago

I thought I couldn't do the LCD example in cairo, but it gets better as I get more familiar. So I have added the LCD_V case side-by-side too: https://github.com/rougier/freetype-py/pull/55#issuecomment-298181243

The cairo LCD example is about 4 times faster than the old one; with two panels, it probably means 8x.

As I get more familiar with pycairo, I feel like I could probably rewrite glyph-vector-2 also. It is a vector drawing on top of a bitmap. After that, there is only one file which uses the slow data.extend(bitmap.buffer... idiom: texture_font.py, which is used by the gl example.

HinTak commented 7 years ago

To answer an earlier question: I think you can get a negative pitch if you use a reflecting transform, i.e. if you do a FT_Set_Transform with a matrix which has a negative determinant. I haven't tested this, but e.g. if you set up example_1 to use matrix = (-1, 0, 0, 1) or (1, 0, 0, -1) instead.

Only two examples do FT_Set_Transform at the moment. So, example_1 and wordle would break if they ever get extended to use FT_Set_Transform that way.
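Whether a given matrix is a reflecting transform in the sense above can be checked from the sign of its determinant. The helper below is a hypothetical illustration using plain numbers (FreeType's own matrices use fixed-point values, which does not affect the sign). It says nothing about whether FreeType actually produces a negative pitch in that case, which remains untested.

```python
def is_reflection(xx, xy, yx, yy):
    """True if the 2x2 matrix [[xx, xy], [yx, yy]] has a negative determinant."""
    return xx * yy - xy * yx < 0

mirror_x = is_reflection(-1, 0, 0, 1)   # flips the x axis only
mirror_y = is_reflection(1, 0, 0, -1)   # flips the y axis only
rotation = is_reflection(-1, 0, 0, -1)  # flips both axes: a 180 degree rotation
```

Both single-axis flips have a negative determinant, while flipping both axes is a rotation and does not.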

HinTak commented 7 years ago

I am done with converting/rewriting all the examples from the slow numpy/matplotlib drawing over to cairo: https://github.com/rougier/freetype-py/pull/55

A side-effect is that none of my code uses the stupid data.extend(... idiom; the *-cairo.py versions are all a lot faster. There is only one data.extend(... left, in texture_font, which is used by subpixel-positioning, which uses opengl for drawing, so I did not touch it.

So I am going to look at the perl binding of freetype now. It should be obvious by now that I know freetype well and am just looking to use it with a different language than C.

rougier / freetype-py

Fast access of bitmap buffer with numpy #45