Closed Nadavbi87 closed 3 years ago
Hi, @Nadavbi87, Sorry you're having trouble with the library.
Is it possible to see if you can recreate this by first exporting the Excel file to CSV and see if it has the same problem on import?
Hi all, I stumbled on this issue a couple of days ago and took the opportunity to dig a bit further:
I could reproduce the problem always by using one of the empty string indicators (TypeUtils.MISSING_INDICATORS) as the first value in the column (after the header). In this case the column is identified as a StringColumn and the following numeric values are ignored (the column is filled with empty values).
The problem does not occur with csv as the column type identification is done in a different way than with the excel reader (which rely on the cellType).
I think there are 2 different issues there:
I can provide a couple of (failing) tests and try to implement a solution to the 1st issue if you want
We have a ColumnTypeDetector
that is used by CsvReader
, FixedWidthReader
, and HtmlReader
and handles both issues that you mention. I think that perhaps XlsxReader
was written before ColumnTypeDetector
existed.
The difficulty of using ColumnTypeDetector
is that it takes Iterator<String[]> rows
and in XlsxReader
we have a org.apache.poi.ss.usermodel.Row
instead of String[]
for each row. I wonder if we could make ColumnTypeDetector
a little more generic and subclass it for XlsxReader
. There's a lot of logic in ColumnTypeDetector
that'd be nice to share so that things work consistently across all the readers (e.g. when to use a StringColumn
vs TextColumn
, sampling, handling of mixed-content columns, etc.)
@benmccann Current type detection isn't based on the values, it's based on cell type on excel. I think that this is not bad, but would be needed to check all the values and select the column type that can accommodate all. In that case, I think that using ColumnTypeDetector isn't needed, isn't it?
Ah, I wasn't aware Excel provided a cell type
Hi, First of all, thanks for this awesome package.
I'm using the XlsxReader to import excel file, in one of my tests I came across the following issue: When the reader auto-detect the column types and the first column value is empty, it automatically detects it as a String Column and creates a String typed column, till here everything is fine. When any other value in that column is a Numeric type(in my case all of them) it gets ignored and instead, it returns an empty value. So, as a result, I get a table with a column that all of the values are empty. I would except that I get all the values at least as a string and not empty value.
The logic is in the
XlsxReader.private Column<?> appendValue(Column<?> column, Cell cell)
.I try to solve it by using the "specifying the datatypes for each column" approach I did the following :
Then I realize that there is no use in the columnTypedToDetect in the XlsxReader.
*Another suggestion, it will be good if I'm using multiple sheets and using
public List<Table> readMultiple(XlsxReadOptions options) throws IOException
That I can pass a list of options that each one of them is bounded to a specific sheet, Because in the current situation if I want for example different name ( or different column type list) for each sheet/table I need to perform a multiple reads ( as the number of the sheets ) and set the options for each one even though under the hood for each call it goes over all the sheets and return the sheet that correspond the index the user pass in the options.I 'm using the latest version 0.37.3.
Thanks, Nadav