Closed rysson closed 1 year ago
Some hot-fix for both methods.
Without any extra arguments (to keep backward compatybility), I've found your awesome library today, I'm not able to make it in the correct way.
File line_break.py
HARD_SPACE = "\u00a0"
class Fragment:
...
def add_character(...):
...
if character == SPACE:
...
elif character == HARD_SPACE:
character = SPACE
self.fragments.append(Fragment("", graphics_state, k, url))
# changing active fragment is not necessary I guess,
# space can be at end of the last fragment or first in the new fragment as well
# active_fragment = self.fragments[-1] # TODO: delete me
self.number_of_spaces += 1
elif character == SOFT_HYPHEN and not self.print_sh:
...
@gmischler : as you have previously worked on this subject, could you have a look this please? 😊
Thanks.
I have some commit to test, see https://github.com/rysson/fpdf2/commit/db0625a6963faff9376ad29f11b54eabddb83a20
I added nbsp
argument (default False) to FPDF
class.
For string like 1␣2 aaaaa␣3 x␣aaaaa
...
with nbsp=False
(default):
and with nbsp=True
:
I wanted to create PR, but git hooks failed (see #835).
@rysson, it took me quite a while to figure out that you're actually talking about whitespace getting stretched when justifying text. I can conclude that from a single line of code, but explicitly explaining that at the start might have been helpful...
This appears to be a "feature" of how PDF viewers are applying the ## Tw
declaration in the file only to normal space characters, and not other whitespace with builtin fonts. I haven't checked what the specification says about it, but it may well instruct them to do so.
If we didn't rely on Tw
for builtin fonts, we could simply include the NBSP in TextLine.number_of_spaces
and in the split pattern for word separation. But as it is, replacing it with a normal space looks like the simplest way to go.
However, note that the relevant method is CurrentLine.add_character()
, and not in Fragment()
. Also, you must not touch self.fragments
at that point. You're only substituting a character, not changing anything about the graphics context.
NBSP = "\u00a0"
class CurrentLine:
...
def add_character(...):
...
if character == SPACE:
...
elif character == NBSP:
# PDF viewers ignore NBSP for word spacing with "Tw".
character = SPACE
self.number_of_spaces += 1
elif character == SOFT_HYPHEN and not self.print_sh:
...
Thank you @gmischler for investigating this
@rysson, it took me quite a while to figure out that you're actually talking about whitespace getting stretched when justifying text [...]
It's why I told about fixed-width spaces. Sure, I should write it's important in justify, sorry.
This appears to be a "feature" of how PDF [...]
I'm totally newbie in PDF, I'm sorry.
If we didn't rely on
Tw
for builtin fonts, we could simply include the NBSP inTextLine.number_of_spaces
and in the split pattern for word separation. But as it is, replacing it with a normal space looks like the simplest way to go. However, note that the relevant method isCurrentLine.add_character()
, and not inFragment()
. Also, you must not touchself.fragments
at that point. You're only substituting a character, not changing anything about the graphics context.
Great! Simpler and working :-) I found the awesome project yesterday. I was just guessing.
So, what next?
So, what next?
just update your PR accordingly.
With regards to that, I recommend to stick with standard terminology (eg. NON_BREAK_SPACE or NBSP instead of HARD_SPACE). I also think we don't need an option to activate/deactivate this. Other than not triggering line breaks, a NBSP should indeed always behave exactly as a normal space. I can't think of a valid use case for disabling that. There are seperate fixed-width space-type characters available for when a different behaviour is desired. Without the option handling your PR will become a lot simpler, without cluttering the API. You should just add some test cases to verify that it really does what it is supposed to do and won't break with any other future modifications.
I wanted to create PR, but git hooks failed (see https://github.com/PyFPDF/fpdf2/issues/835).
FYI @rysson, this should now be fixed 😊 And thank you for this PR!
Thanks, than I create new PR based on @gmischler notes.
Error details
Incorrect support for non-breaking space (NBSP, hard-space, 0x00a0).
Non-breaking space prevents line breaking and it works fine.
Non-breaking space is not fixed-width at all. I know some old systems still use fixed-width NBSP but all modern browsers not.
NBSP (as Unicode character or
entity) should be variable width space (SpaceHint
?) in theFPDF.write_html()
.The same in the
pdf.multi_cell()
. It could be a compatibility problem if users use it as fixed-width space. Maybe extra option (fixed_width_nbsp=True
)?Minimal code
Environment Please provide the following information:
fpdf2
version used: 2.7.4 (pip)Also check on GitHub
master
branch.