miachm / SODS

A simple Java library for handle ODS (Open Document Spreadsheet, compatible with Excel and Libreoffice)
The Unlicense
74 stars 30 forks source link

Support writing strings with line breaks #65

Closed LoadingByte closed 1 year ago

LoadingByte commented 1 year ago

Consider this example of writing a spreadsheet that contains a string with a line break:

import com.github.miachm.sods.*;
import java.io.*;

public class Main {
    public static void main(String[] args) throws IOException {
        Sheet sheet = new Sheet("Test", 1, 1);
        sheet.getDataRange().setValue("a\nb");
        SpreadSheet spread = new SpreadSheet();
        spread.appendSheet(sheet);
        spread.save(new File("out.ods"));
    }
}

When opening out.ods, we see that LibreOffice actually renders a space instead of the newline:

How LibreOffice displays out.ods

Looking into the file, this XML defines the cell's value:

<table:table-cell office:value-type="string" office:string-value="a
b"><text:p>a</text:p><text:p>b</text:p></table:table-cell>

I've observed that if the office:string-value attribute is removed, LibreOffice correctly renders the newline:

How LibreOffice displays the modified out.ods

So the issue seems to be that LibreOffice replaces newlines in the office:string-value attribute with spaces. Sadly, this behavior is not documented in the spec (http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1417668_253892949).

To find a way to encode newlines s.t. they are preserved, I manually typed the formula

="a
b"

into the spreadsheet and let LibreOffice save the file. This is what it wrote:

<table:table-cell table:formula="of:=&quot;a&#x0a;b&quot;" office:value-type="string" office:string-value="a&#x0a;b" calcext:value-type="string">

On first sight, it appears as if encoding \n as the entity &#xA; in the office:string-value attribute solves the issue. However, when removing the table:formula attribute, the newline disappears again. So it seems like the formula is what actually enforces the newline here.

So all in all, I think the only solution to this problem is to drop the office:string-value attribute. Seeing as LibreOffice doesn't even write it itself for plain non-formulaic content, does it really need to be written by SODS?

LoadingByte commented 1 year ago

Just noticed that #63 also tackles this by writing a blank office:string-value attribute if and only if there's a newline in the string, but that kind of special case seems brittle. I'd be very grateful if an isolated fix for the newline bug (e.g., not writing office:string-value when the cell doesn't contain a formula) could be make it into the next version :)

miachm commented 1 year ago

Good spotted!

Unless I am missing an edge case, seems you're right. String-value should not be written for Strings values. I think it's not required even for formulas, but we need to test that.