jdum / odfdo

python library for OpenDocument format (ODF)
Apache License 2.0

Adding pretty=True causes extra spaces in output #28

Open keestux opened 10 months ago

keestux commented 10 months ago

Let's assume a simple sequence, read an ODT, write it to some other file.

    mydoc = Document(inp_fname)

    content = mydoc.get_part('content')

    mydoc.save(target=output_fname, pretty=True)

The call to get_part shouldn't be doing much, but it is essential for the bug to show up. Probably reading anything from mydoc will trigger it.

Suppose the input ODT contains simple text paragraphs, but some of the text has been edited, for example by deleting or adding a letter in a word. The result is a Paragraph with multiple spans, something like:

      <text:p text:style-name="P7">
        <text:span text:style-name="T12">This is an example with =&gt;</text:span>
        <text:span text:style-name="T5"> v</text:span>
        <text:span text:style-name="T8">8</text:span>
        <text:span text:style-name="T5">.1.</text:span>
        <text:span text:style-name="T8">4</text:span>
        <text:span text:style-name="T5"> &lt;</text:span>
        <text:span text:style-name="T12">= spaces </text:span>
        <text:span text:style-name="T13">after reading and writing with odfdo.</text:span>
      </text:p>

The input document shows:

This is an example with => v8.1.4 <= spaces after reading and writing with odfdo.

The output document shows:

This is an example with => v 8 .1. 4 < = spaces after reading and writing with odfdo.

keestux commented 10 months ago

The example program:

    #!/usr/bin/env python3

    from odfdo import Document, Paragraph

    mydoc = Document('BugOdfdo.odt')
    content = mydoc.get_part('content')
    mydoc.save(target='BugOdfdo2.odt', pretty=True)

keestux commented 10 months ago

Here is the example input ODT BugOdfdo.odt

keestux commented 10 months ago

Maybe it is a bug in LibreOffice. When I look at content.xml there is nothing different, except for the white space (pretty print).
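
A minimal sketch of why the white space alone might be enough to change what is displayed. It does not use odfdo at all: it uses the standard library's `xml.etree.ElementTree` (`ET.indent` needs Python 3.9 or later) to indent a cut-down version of the paragraph above, and a hypothetical helper `rendered_text` that assumes the consumer collapses each run of white space inside a paragraph into a single space. Whether LibreOffice or odfdo behaves exactly like this is an assumption here, not something confirmed in this thread.

```python
import re
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Cut-down version of the paragraph from the report, written without
# any white space between the spans (style attributes omitted).
COMPACT = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    "<text:span>=&gt;</text:span>"
    "<text:span> v</text:span>"
    "<text:span>8</text:span>"
    "<text:span>.1.</text:span>"
    "<text:span>4</text:span>"
    "<text:span> &lt;</text:span>"
    "<text:span>= spaces</text:span>"
    "</text:p>"
)


def rendered_text(paragraph):
    """Hypothetical stand-in for a consumer's white space handling.

    Assumption: every run of spaces, tabs and newlines inside the
    paragraph collapses to one space, and leading/trailing space is dropped.
    """
    raw = "".join(paragraph.itertext())
    return re.sub(r"[ \t\r\n]+", " ", raw).strip()


compact = ET.fromstring(COMPACT)

# Same paragraph, but indented: this puts a newline plus indentation
# between the sibling spans, i.e. inside mixed content.
pretty = ET.fromstring(COMPACT)
ET.indent(pretty, space="  ")

before = rendered_text(compact)
after = rendered_text(pretty)

print(before)  # => v8.1.4 <= spaces
print(after)   # => v 8 .1. 4 < = spaces
```

Under that assumption, the newline and indentation added between two adjacent spans turn into one visible space, which matches the pattern in the output document above.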

jdum commented 10 months ago

Hi, thanks for this interesting bug. Actually, I'm not sure I remember what the correct interpretation of the standard should be. But a few first thoughts: