mbleron / ExcelTable

ExcelTable is an Oracle SQL interface to read Microsoft Excel files (.xlsx, .xlsm, .xlsb, .xls, .xml), ODF spreadsheet files (.ods) and flat files as external tables
MIT License
45 stars 12 forks source link

getRows with DOM_READ fills empty cells with last non-empty value #26

Closed GHO65 closed 3 years ago

GHO65 commented 3 years ago

When using getRows with method DOM_READ to load data from an Excel file, empty cells are filled with the value from the last non-emtpy cell. This continues until a new non-empty cell is encountered. When using STREAM_READ instead of DOM_READ, empty cells remain empty.

SELECT ROWNUM xlsx_rownum , t.* FROM TABLE(ExcelTable.getRows(ExcelTable.getFile('LOAD_DIR', 'ImportData.xlsx') , 'TableX' , ' "Source" varchar2(1000 BYTE) column ''A'' , "Name" varchar2(40 BYTE) column ''B'' , "Unit" varchar2(40 BYTE) column ''C'' , "Amount" varchar2(40 BYTE) column ''D'' ' , 'A1:D27' --, 1 -- Adding 1 for method STREAM_READ fixes handling of empty cells ) ) t ;

mbleron commented 3 years ago

Could you share a sample file please? Thanks.

GHO65 commented 3 years ago

Example Excel : testEmptyCells.xlsx

SQL: SELECT ROWNUM xlsx_rownum , t.* FROM TABLE(ExcelTable.getRows(ExcelTable.getFile('LOAD_DIR', 'testEmptyCells.xlsx') , 'Sheet1' , ' "BRON" varchar2(1000 BYTE) column ''A'' , "Ca" varchar2(40 BYTE) column ''B'' , "Mg" varchar2(40 BYTE) column ''C'' , "Na" varchar2(40 BYTE) column ''D'' , "K" varchar2(40 BYTE) column ''E'' , "Fe" varchar2(40 BYTE) column ''F'' ' , 'A1:F13' ) ) t;

GHO65 commented 3 years ago

And a screenshot of the query result: QueryResultEmptyCells

mbleron commented 3 years ago

OK, thanks. Thought I fixed this a long time ago... Let me work on it, the bug should be fixed within the next few days.

mbleron commented 3 years ago

OK now in v5.1.1.

For the technical explanation : There are different kind of empty cells internally.

  1. Cell tag with no value, e.g. <c r="A1" s="1"/>
  2. Cell tag with a reference to an empty shared string, e.g. <c r="A1" t="s"><v>1</v></c>
  3. No cell tag

Yours were of the 2nd kind, and it was not handled correctly by the DOM reader.

GHO65 commented 3 years ago

Thanks! Nice job, as this module.