Open reticivis-net opened 2 years ago
also options to align text left, right, center, or justify would be useful as well. This script implements that
and top/middle/bottom text alignment. another example script
Sounds like a reasonable enhancement request. Note that there might be issues with mixed LTR/RTL text which will need extra tests.
A few notes:
with the library's getsize func
Please use the getlength function instead; the API of the getsize function is fundamentally broken (especially with non-English text) and shouldn't be used for text layout purposes.
I think there should be an option to only break at word boundaries unless the word exceeds the max width, like the CSS word-break property
Word breaking is quite a difficult task, I'd suggest to constrain this to spaces to start with.
when reaching the max height, the font size is gradually reduced until the text fits inside
Not possible within the current API, a font is created at a given size and cannot be easily changed.
also options to align text left, right, center
Already possible using the align parameter; only justify is not yet supported. It would also require extra work in the new proposed function, so I see that as a separate request.
and top/middle/bottom text alignment
Already possible, use the anchor parameter with multiline text.
Please use the getlength function instead; the API of the getsize function is fundamentally broken (especially with non-English text) and shouldn't be used for text layout purposes.
oh, good to know. probably should add that to the docs?
Word breaking is quite a difficult task, it might be better to constrain this to spaces to start with.
right yeah I forgot CJK and other languages don't have definite characters at word boundaries, I just meant to break at whitespace which as I understand it shouldn't be too hard and might actually be faster than individual character breaking. i'd add support for zero-width spaces to allow some external library or native speaker to mark word boundaries for pillow assuming its a non-trivial task
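For illustration, the whitespace-only breaking discussed here could look roughly like the sketch below. It is not an existing Pillow function: wrap_greedy and its measure argument are hypothetical names, and measure stands in for something like a font's getlength method. It breaks only at spaces and zero-width spaces, and splits a word character by character only when that word alone is wider than the limit (which is not safe for scripts with ligatures or combining marks).

```python
from typing import Callable, Iterator, List, Tuple

ZWSP = "\u200b"  # zero-width space: an invisible break hint supplied by the caller


def _tokens(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (joiner, word) pairs; joiner is re-inserted when no break happens."""
    joiner, word = "", ""
    for ch in text:
        if ch == " " or ch == ZWSP:
            if word:
                yield joiner, word
                joiner, word = "", ""
            if ch == " ":
                joiner = " "  # a ZWSP leaves nothing visible behind
        else:
            word += ch
    if word:
        yield joiner, word


def wrap_greedy(text: str, max_width: float,
                measure: Callable[[str], float]) -> List[str]:
    """Greedy wrap at spaces/ZWSP; hypothetical helper, not part of Pillow."""
    lines: List[str] = []
    for paragraph in text.split("\n"):  # "\n" is always a hard break
        line = ""
        for joiner, word in _tokens(paragraph):
            candidate = line + joiner + word if line else word
            if measure(candidate) <= max_width:
                line = candidate
                continue
            if line:
                lines.append(line)
            # The word starts a new line; split it only if it cannot fit alone.
            while measure(word) > max_width and len(word) > 1:
                cut = 1  # always take at least one character
                while cut < len(word) and measure(word[:cut + 1]) <= max_width:
                    cut += 1
                lines.append(word[:cut])
                word = word[cut:]
            line = word
        lines.append(line)
    return lines


# Usage with a monospace stand-in where every character is 1 unit wide.
# With Pillow, measure could be font.getlength instead (an assumption here).
print(wrap_greedy("the quick brown fox", 10, len))
```

Taking the measuring function as a parameter keeps the wrapping logic independent of how widths are obtained.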
Not possible within the current API, a font is created at a given size and cannot be easily changed.
ah so that's why the script I linked loads from file every change. is there an existing way to regenerate fonts without reloading from file or would that need to be an entire API change?
thanks for the quick and detailed response!
oh, good to know. probably should add that to the docs?
I think it will be deprecated soon, it's just a matter of working out the replacement (font.getsize_multiline doesn't have a clear replacement, that might be made easier by cleaning up the parameters as suggested in https://github.com/python-pillow/Pillow/pull/6195#discussion_r847410876). Discussed in #5816.
i'd add support for zero-width spaces
That part was just a suggestion to avoid overcomplicating things. Sure, zero-width spaces can probably be supported.
is there an existing way to regenerate fonts without reloading from file or would that need to be an entire API change?
I think you can load a font file in Python and then pass the bytes as input to ImageFont.truetype.
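A minimal sketch of that idea, combined with the "shrink until it fits" request from earlier in the thread. The helper names font_factory_from_file and largest_fitting_font are hypothetical; the only Pillow calls assumed are ImageFont.truetype (which accepts a file-like object, so the bytes are wrapped in io.BytesIO) and the font's getlength method. It only checks the width of a single line.

```python
import io
from typing import Any, Callable, Optional, Tuple


def font_factory_from_file(path: str) -> Callable[[int], Any]:
    """Read the font file once, then build fonts of any size from the cached bytes."""
    from PIL import ImageFont  # imported lazily; only this helper needs Pillow

    with open(path, "rb") as fh:
        data = fh.read()

    def make_font(size: int) -> Any:
        # Wrap the bytes in a fresh file-like object for every size.
        return ImageFont.truetype(io.BytesIO(data), size)

    return make_font


def largest_fitting_font(make_font: Callable[[int], Any], text: str,
                         max_width: float, start_size: int,
                         min_size: int = 1) -> Optional[Tuple[int, Any]]:
    """Step the size down from start_size until text fits; None if nothing fits."""
    for size in range(start_size, min_size - 1, -1):
        font = make_font(size)
        if font.getlength(text) <= max_width:
            return size, font
    return None


# Usage (the font path is a placeholder, not a file shipped with anything):
# make_font = font_factory_from_file("SomeFont.ttf")
# fitted = largest_fitting_font(make_font, "Hello, world", max_width=300, start_size=72)
```

The disk is touched only once; each size change still builds a new font object, since an existing font's size cannot be changed.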
I see. Thank you!
I've made a first attempt at implementing this, using a greedy algorithm:
https://github.com/atomicparade/pil_autowrap/blob/main/pil_autowrap/pil_autowrap.py#L73-L220
Example output here.
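Issues:
I made the assumption that the line height is equal to the font size; however, looking at some of the generated images for Arabic and Hebrew, this doesn't appear to be the case. Maybe FreeTypeFont.getbbox would be more appropriate than FreeTypeFont.getlength?
Current blind spots and possible improvements: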
I don't know how appropriate the results are for Arabic and Hebrew. Chinese, Japanese, and Korean text is not broken up properly.
If you are referring to this:
Certain characters in those languages should not come at the end of a line, certain characters should not come at the start of a line, and some characters should never be split up across two lines. For example, periods and closing parentheses are not allowed to start a line
then I would not worry about it. Similar rules exist in some European languages and even MS Word doesn't really help there.
I made the assumption that the line height is equal to the font size; however, looking at some of the generated images for Arabic and Hebrew, this doesn't appear to be the case. Maybe FreeTypeFont.getbbox would be more appropriate than FreeTypeFont.getlength?
The text height is calculated here:
https://github.com/python-pillow/Pillow/blob/134023796e935ef79d5feb6879e9270327cfb8a2/src/PIL/ImageDraw.py#L514-L516
where spacing is a parameter defaulting to 4. This is not really accurate for some fonts, but it is used for historical reasons.
Do not use getbbox. That returns the height of the rendered text (which could be different for each line) and width of the rendered text (again, can be different with e.g. slanted text). It is not appropriate for text layout. Fonts generally don't exceed the line height and layout width they report, or only do so by a small amount when appropriate for stylistic reasons. (The height calculated above is not the actual line height reported by the font, but should be close enough in most cases).
I feel that getting it working with “easier” languages first (ones that use white space or other characters to break words) would be the best thing to do right now as CJK word-breaking seems like a non-trivial task that could be hacked in by adding zero-width spaces. Is there an existing library that can determine word boundaries that could be included by PIL as an extra?
I may have misunderstood the Wikipedia article. The Unicode Line Breaking Algorithm is more helpful.
I think that it is probably sufficient to implement the non-tailorable part of the algorithm (see start of Table 1), which is just that line break characters are a mandatory break and spaces/zero-width spaces are an optional break. According to LineData.txt, this means it is sufficient to consider replacing the SPACE (U+0020) and ZERO WIDTH SPACE (U+200B) characters with "\n". The rest of the Unicode Line Breaking Algorithm would probably be best left to another library (e.g. by inserting zero-width spaces).
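As a rough sketch of that reduced rule set: the function below lists where a line may or must break, treating line-break characters as mandatory and SPACE / ZERO WIDTH SPACE as optional. The name break_opportunities is hypothetical, and the set of mandatory characters is an assumption based on my reading of the Unicode line-break classes BK, CR, LF and NL; everything else is treated as non-breaking.

```python
from typing import List, Tuple

# Assumed members of the mandatory-break classes (BK, CR, LF, NL).
MANDATORY = {"\n", "\r", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"}
# Optional breaks: SPACE and ZERO WIDTH SPACE.
OPTIONAL = {" ", "\u200b"}


def break_opportunities(text: str) -> List[Tuple[int, bool]]:
    """Return (index, mandatory) pairs; index is where the next line would start."""
    result: List[Tuple[int, bool]] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r" and text[i + 1:i + 2] == "\n":
            i += 1  # CR LF counts as one break, reported after the LF
        nxt = text[i + 1:i + 2]
        if ch in MANDATORY:
            result.append((i + 1, True))
        elif ch in OPTIONAL and nxt and nxt not in OPTIONAL and nxt not in MANDATORY:
            # Break only after the last character of a run of spaces,
            # and never just before a mandatory break or at the end of text.
            result.append((i + 1, False))
        i += 1
    return result


print(break_opportunities("ab cd\nef"))
```

A wrapping routine could then pick, for each line, the last optional index that still fits, and always break at the mandatory ones.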
Is there an existing library that can determine word boundaries that could be included by PIL as an extra?
After a brief search, I couldn't find one that is freely available.
The Unicode Line Breaking Algorithm is more helpful.
I'm going to give this a shot! I'll start with Table 1 and leave all of the other character classes as break-allowed for now, though I think I'd like to try to implement the others as well.
If you're going to implement the entire Unicode Line Breaking Algorithm, I recommend making it its own library. If it's really complex or requires a table of characters or something, it could be specified as a PIL extra to not bloat PIL
If you want to implement the full algorithm, it might make sense to add it to Raqm (which Pillow uses internally), or make it a separate library that Raqm can use. See https://github.com/HOST-Oman/libraqm/issues/50
requires a table of characters or something
The LineData.txt from Unicode I linked above is the official list of Unicode character line-breaking classes.
I wasn’t familiar enough with PIL’s internals to suggest that but that is a good idea
Is there an existing library that can determine word boundaries that could be included by PIL as an extra?
After a brief search, I couldn't find one that is freely available.
The Raqm issue mentions https://github.com/adah1972/libunibreak. I haven't looked at it too closely, but it seems to be a C library implementing the Unicode algorithm that returns a list of valid break positions.
I am exploring adding a font_wraptext function (to start; would probably be nice to have a function to automatically determine an appropriate font [size] as well) to src/_imagingft.c and adding unibreak as a feature that depends on libunibreak being installed.
It does look like libunibreak maintains internal state (linebreak.c -> set_linebreaks_utf8), so I am not sure whether this will work well with multithreading. (Never mind! It doesn’t.)
Edit: Somehow I completely missed the part about adding this as a feature to libraqm itself. Hmm...
Any news on this? Would love this feature.
Looks like a stalled attempt at greatness, maybe someone can pick up the effort: https://github.com/atomicparade/pil_autowrap
Bump for interest
many people have written scripts to do this and it's relatively easy with the library's getsize func but I feel like it really should be a built-in feature.
i think there should be a draw_text_box or similar function which has these properties on top of the existing draw_multiline_text:
I'm not very experienced with the library internals or C in general so for now I won't make a pull, I just want to throw the idea out there to the devs