jferard / fastods

A very fast and lightweight (no dependency) library for creating ODS (Open Document Spreadsheet, mainly for Calc) files in Java. It is a fork of Martin Schulz's SimpleODS.
GNU General Public License v3.0

Special char µ is written as \uFFFD #179

Closed by uwekoenig 4 years ago

uwekoenig commented 4 years ago

When I try to write the sign µ in a cell, I get \uFFFD as the result instead (opened with LibreOffice).

jferard commented 4 years ago

This is the Unicode "REPLACEMENT CHARACTER" (https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character). It might come from FastODS or not; I will check.

jferard commented 4 years ago

Here's my test code:

    final OdsFactory odsFactory = OdsFactory.create(Logger.getLogger("issue-179"), Locale.US);
    final AnonymousOdsFileWriter writer = odsFactory.createWriter();
    final OdsDocument document = writer.document();
    final Table table = document.addTable("issue-179");
    final TableCellWalker walker = table.getWalker();
    // First row: a string ending with U+00B5 (MICRO SIGN), then its UTF-8 bytes.
    String s = "This is a µ";
    walker.setStringValue(s);
    walker.next();
    walker.setStringValue(Arrays.toString(s.getBytes("UTF-8")));
    // Second row: a string ending with U+03BC (GREEK SMALL LETTER MU), then its UTF-8 bytes.
    String t = "And this is a μ";
    walker.nextRow();
    walker.setStringValue(t);
    walker.next();
    walker.setStringValue(Arrays.toString(t.getBytes("UTF-8")));
    writer.saveAs(new File("generated_files", "issue-179.ods"));

And here's the output:

[image: issue-179]

As you can see, the first one is \xc2\xb5 (MICRO SIGN) and the second one is \xce\xbc (GREEK SMALL LETTER MU): the glyphs look the same, but the semantics are different.
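To make the difference between the two characters visible outside a spreadsheet, here is a minimal Java sketch. The class name `MuCodePoints` and the helper `describe` are hypothetical names chosen for this illustration, and the string literals use `\u` escapes so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class MuCodePoints {
    /** Describes the first code point of s: code point, Unicode name, UTF-8 bytes in hex. */
    public static String describe(String s) {
        int cp = s.codePointAt(0);
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to get the unsigned value of the byte before formatting.
            hex.append(String.format("\\x%02x", b & 0xff));
        }
        return String.format("U+%04X %s %s", cp, Character.getName(cp), hex);
    }

    public static void main(String[] args) {
        String micro = "\u00B5"; // MICRO SIGN
        String mu = "\u03BC";    // GREEK SMALL LETTER MU

        System.out.println(describe(micro)); // U+00B5 MICRO SIGN \xc2\xb5
        System.out.println(describe(mu));    // U+03BC GREEK SMALL LETTER MU \xce\xbc

        // The two strings are different code points, so they are not equal...
        System.out.println(micro.equals(mu)); // false
        // ...but compatibility normalization (NFKC) maps MICRO SIGN to GREEK SMALL LETTER MU.
        System.out.println(Normalizer.normalize(micro, Normalizer.Form.NFKC).equals(mu)); // true
    }
}
```

Both characters are valid Unicode and both encode to two bytes in UTF-8, so neither of them should produce \uFFFD on its own.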

My guess is that your source file is not encoded in UTF-8: the µ is then converted to bytes using your encoding, and those bytes are written to the file as is. When LibreOffice opens the file, it uses the REPLACEMENT CHARACTER in place of the byte sequence that is not valid UTF-8. Illustration in Python:

    >>> 'µ'.encode("cp1252")
    b'\xb5'
    >>> _.decode("utf-8", "replace") # would fail without replace
    '�'
    >>> ascii(_)
    "'\\ufffd'"

Please check your source file encoding.
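The Python illustration above can be reproduced on the Java side. This is only a sketch under stated assumptions: the class `ReplacementCharDemo` and its methods are hypothetical names, and ISO-8859-1 stands in for cp1252 because it is guaranteed to be available on every JVM and also encodes µ as the single byte 0xB5.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    /** Encodes text in a single-byte charset, then decodes the bytes as if they were UTF-8. */
    public static String misdecode(String text) {
        // Assumption: ISO-8859-1 plays the role of the "wrong" encoding here.
        byte[] legacy = text.getBytes(StandardCharsets.ISO_8859_1);
        // new String(bytes, charset) replaces malformed input instead of throwing.
        return new String(legacy, StandardCharsets.UTF_8);
    }

    /** Returns true if s contains U+FFFD, a hint that a decoding step went wrong upstream. */
    public static boolean containsReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        String original = "This is a \u00B5";
        String damaged = misdecode(original);

        System.out.println(containsReplacementChar(original)); // false
        System.out.println(containsReplacementChar(damaged));  // true
    }
}
```

A check like `containsReplacementChar` on the value just before it is passed to `setStringValue` can help tell whether the string was already damaged before it reached FastODS. If the source file itself turns out to be the cause, compiling with `javac -encoding UTF-8` or writing the literal as `\u00B5` are two common ways to rule it out.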

uwekoenig commented 4 years ago

You are right. The problem came from another library. Sorry for that, and thank you very much for the detailed explanation and your time.