jdum / odfdo

python library for OpenDocument format (ODF)
Apache License 2.0

Method para.append_plain_text() trims whitespaces before text #48

Closed mlwesoly closed 2 months ago

mlwesoly commented 2 months ago

Hello,

I use

para = Paragraph("", style="Standard Linux") 
para.append_plain_text() 

for adding text like:

 1 some information  (one whitespace before the number)
 2 some more information  (one whitespace before the number)
 ...
10 in the middle of the block

The issue I have with the result is that the whitespace before the rest of the text gets trimmed. In previous versions of odfdo I added a workaround in paragraph_base.py. Now that it has been rewritten, I wanted to ask before putting in the effort: is there an easy way to keep the whitespace at the beginning of the line from being trimmed? And is this behavior intended? Thank you in advance for your help and time!

EDIT: I just found the _plain_text_splitted method, where I can add my workaround. Still, I'm interested to know more about this behavior.

jdum commented 2 months ago

Hi, this is a beautiful bug ;-) What I see:

"""Note: It is not an error if the character preceding the <text:s> element is not a white space character, but it is good practice to use this element only for the second and all following “ ” (U+0020, SPACE) characters in a sequence."""

However, the chapter just before (6.1.2 White Space Characters) says something different: (5) Leading “ ” (U+0020, SPACE) characters at the start of the resulting text and trailing SPACE characters at the end of the resulting text are removed.

LibreOffice presumably implements rule (5); as a result you don't see your space. Moreover, when you insert a leading space in LO, it is translated into a <text:s/>, thus complying with the 6.1.2 rules, but not exactly with the 6.1.3 definition of <text:s>.
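
To make the effect of rule (5) concrete, here is a minimal reader-side sketch. It is a simplification of the 6.1.2 processing, not code from odfdo or LibreOffice, and the function name collapse_odf_whitespace is hypothetical. It only handles plain character data, with no child elements.

```python
import re


def collapse_odf_whitespace(char_data):
    """Simplified sketch of how a reader treats literal whitespace in a paragraph."""
    # Assumption: tab, carriage return and line feed are treated as plain spaces.
    text = re.sub(r"[\t\r\n]", " ", char_data)
    # Runs of several spaces collapse into a single space.
    text = re.sub(r" {2,}", " ", text)
    # Rule (5): leading and trailing spaces are removed.
    return text.strip(" ")


print(repr(collapse_odf_whitespace(" 1 some information")))
```

A literal leading space therefore never reaches the rendered text, which matches what the report describes.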

So, I will try to implement that in the same way as LibreOffice (and it seems correct to me that leading/trailing spaces are a special case)
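
A rough writer-side sketch of that idea, producing XML as plain strings. This is not the odfdo implementation: the function name encode_spaces is hypothetical, and treating trailing spaces the same way as leading ones is an assumption based on the "special case" remark above. It expects a single line without tabs or newlines.

```python
import re
from xml.sax.saxutils import escape


def _spacer(count):
    # <text:s/> stands for one space; text:c gives the count when it is larger.
    return "<text:s/>" if count == 1 else '<text:s text:c="%d"/>' % count


def encode_spaces(line):
    """Encode the spaces of one line so that a reader does not drop them."""
    escaped = escape(line)  # escaping never adds or removes spaces

    def encode_run(match):
        count = len(match.group())
        at_edge = match.start() == 0 or match.end() == len(escaped)
        if at_edge:
            # Assumption: leading/trailing runs are written entirely as <text:s>.
            return _spacer(count)
        if count == 1:
            return " "
        # Inner runs: one literal space, then <text:s> for the remaining n-1.
        return " " + _spacer(count - 1)

    return re.sub(r" +", encode_run, escaped)


print(encode_spaces(" 1 some information"))
```

With this encoding the leading space survives rule (5), because it is no longer a literal U+0020 character at the start of the text.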

mlwesoly commented 2 months ago

Thank you very much for the comprehensive answer. I will also read a bit more into this, because your quotes make me wonder why a committee decided that leading space characters should be removed. Trailing ones I can understand. Anyway, that is for another day. Thank you for having a look into this topic.

My current workaround looks like this: it also splits off the single space at the beginning and then adds a Spacer with the length of the block. Let's just say it is working for me at the moment, but I know it is not a beautiful solution.

_re_splitter = re.compile(r"(\n|\t|^ |  +)")
_re_space = re.compile(r"^  +$")
_re_space2 = re.compile(r"^ +$")
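
To see what the modified splitter pattern does on its own, here is a small standalone check. The helper split_blocks is hypothetical and only mimics the splitting step; it is not the odfdo method.

```python
import re

# Same pattern as in the workaround: the added "^ " alternative
# catches a single space at the very start of the string.
_re_splitter = re.compile(r"(\n|\t|^ |  +)")


def split_blocks(text):
    """Split text on the pattern and drop the empty strings re.split produces."""
    return [bloc for bloc in _re_splitter.split(text) if bloc]


print(split_blocks(" 1 some information"))
print(split_blocks(" 1 a\n 2 b"))
```

Because the pattern is compiled without re.MULTILINE, "^" only anchors at the start of the whole string. In the second call the space after the newline stays attached to " 2 b", so the leading-space case is only caught when each line is passed in separately.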

... lower in _plain_text_splitted

                continue
            if _re_space2.match(bloc):
                # follow ODF standard : n spaces => one space + spacer(n-1)
                # self.append(" ")
                # workaround: the whole run, first space included, becomes one Spacer
                elements.append(Spacer(len(bloc)))
                continue
            if _re_space.match(bloc):

jdum commented 2 months ago

The new version 3.9 should fix the bug

mlwesoly commented 2 months ago

I tried it and so far it works perfectly. Thank you very much!