jtablesaw / tablesaw

Java dataframe and visualization library
https://jtablesaw.github.io/tablesaw/
Apache License 2.0
3.55k stars 643 forks source link

XLSX read only grabbing 2 decimal places #1245

Open alexp8 opened 11 months ago

alexp8 commented 11 months ago

I have an xlsx file where the cells are "formatted" to show 2 decimal places.

When I read in this file via XlsxReadOptions, only 2 decimal places are being parsed. e.g. 12.123 -> 12.12

Code:

XlsxReadOptions options = XslxRaedOptions.builder(filePath)
    .columnTypes(x -> ColumnType.STRING)
    .missingValueIndicator(indicators)
    .build();

Table table = Table.read().usingOptions(options);
ccleva commented 11 months ago

Hi @alexp8. This is the expected behavior when reading numerical values from an xlsx file in a column of type STRING: the reader applies the numerical cell format so what you get in the Table is closer to what you see in the original file.

Try using ColumnType FLOAT (or DOUBLE) for your numerical columns, or let the reader infer the column type by sampling.

If you have a mixed numerical/textual column in the xlsx file you'll have to disable the formatting in the source.

alexp8 commented 11 months ago

Thank you, do you have an example of the reader sampling to detect the data type?

ccleva commented 11 months ago

My bad, for xlsx files column type is not done by sampling, but the principle is the same.

If you don't configure a column type for some columns, the default behavior is to infer the column type, in this case by checking the column cell types in the file.