miachm / SODS

A simple Java library for handling ODS (Open Document Spreadsheet, compatible with Excel and LibreOffice)
The Unlicense

Convert ods to csv - possible bug ! #16

Closed ebocher closed 4 years ago

ebocher commented 4 years ago

I am trying to read a sheet and convert it into a CSV file.

Code:

        import java.io.File;
        import java.util.ArrayList;
        import java.util.List;
        import com.github.miachm.sods.Range;
        import com.github.miachm.sods.Sheet;
        import com.github.miachm.sods.SpreadSheet;

        String inputFile = "See attached file";
        SpreadSheet spread = new SpreadSheet(new File(inputFile));
        System.out.println("Number of sheets: " + spread.getNumSheets());

        List<Sheet> sheets = spread.getSheets();

        for (Sheet sheet : sheets) {
            System.out.println("In sheet " + sheet.getName());

            // Read every cell in the used area of the sheet.
            Range data = sheet.getDataRange();
            Object[][] values = data.getValues();

            for (Object[] row : values) {
                List<String> forJoin = new ArrayList<>();
                for (Object value : row) {
                    // String.valueOf handles Double, other types and null (printed as "null").
                    forJoin.add(String.valueOf(value));
                }
                System.out.println(String.join(",", forJoin));
            }
        }

It returns the following result:

Number of sheets: 1
In sheet fuller
id,lon,lat,nom_lieu,elephant_mer,baleine,cachalot,globicephal_noir,description
1.0,-72.063440,41.286780,New london,null,Départ le 15 juillet 1859.,null,null,null

instead of

Number of sheets: 1
In sheet fuller
id,lon,lat,nom_lieu,elephant_mer,baleine,cachalot,globicephal_noir,description
1.0,-72.063440,41.286780,New london,null,null,null,null,Départ le 15 juillet 1859.

There is an offset that seems to be related to the null values. Or maybe I'm not using the library properly.

Thanks.

example_ods.ods.zip

ebocher commented 4 years ago

Note that the offset appears when two null values follow each other.

This ODS file is parsed correctly:

id | cachalot | globicephal_noir | description
1 | non | (empty) | Départ le 15 juillet 1859.

The next one is not:

id | cachalot | globicephal_noir | description
1 | (empty) | (empty) | Départ le 15 juillet 1859.

miachm commented 4 years ago

Thanks for your report!

It looks like a problem with repeated cells:

<table:table-cell table:number-columns-repeated="4"/>

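The ODS format uses the attribute table:number-columns-repeated to save space in the file: in the example above, a single element stands for four consecutive empty cells. The library takes that into consideration but deliberately ignores null cell values as an optimization. Not a good idea.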

It should be easy to fix: just a matter of removing an if :)
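
To illustrate the idea (this is only a sketch, not the actual code in OdsReader.java), here is how a row can be expanded so that every cell, empty or not, occupies as many columns as its repeat count. The class RepeatedCells, the ParsedCell type and the expand method are hypothetical names invented for this example; they are not part of the SODS API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepeatedCells {

    /**
     * Hypothetical stand-in for one parsed table:table-cell element:
     * its value (null when the cell is empty) and the value of
     * table:number-columns-repeated (1 when the attribute is absent).
     */
    public static final class ParsedCell {
        public final Object value;
        public final int repeated;

        public ParsedCell(Object value, int repeated) {
            this.value = value;
            this.repeated = repeated;
        }
    }

    /**
     * Expands the parsed cells into one entry per column.
     * Empty cells are kept as null so that later values stay
     * in their original column.
     */
    public static List<Object> expand(List<ParsedCell> cells) {
        List<Object> row = new ArrayList<>();
        for (ParsedCell cell : cells) {
            for (int i = 0; i < cell.repeated; i++) {
                row.add(cell.value); // null is added too, on purpose
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // Two filled cells, four repeated empty cells, then one filled cell.
        List<ParsedCell> cells = Arrays.asList(
                new ParsedCell(1.0, 1),
                new ParsedCell("New london", 1),
                new ParsedCell(null, 4),
                new ParsedCell("description", 1));

        // prints [1.0, New london, null, null, null, null, description]
        System.out.println(expand(cells));
    }
}
```

If the loop skipped cells whose value is null, the last value would no longer land in the last column, which is the kind of offset reported in this issue.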

ebocher commented 4 years ago

Thanks for your answer! I'm not familiar with your lib. Can you point me to the resource to modify?

miachm commented 4 years ago

Don't worry about it! I can commit and release a new version when I have the time.

That said, it's in the file OdsReader.java: search for the variable last_cell_value.

ebocher commented 4 years ago

I tried playing with the last_cell_value variable, but without success, so I would be glad to get a fix as soon as you can. Best regards

miachm commented 4 years ago

Solved in v1.2.2.

ebocher commented 4 years ago

Excellent, thanks a lot.