mbleron / ExcelTable

ExcelTable is an Oracle SQL interface to read Microsoft Excel files (.xlsx, .xlsm, .xlsb, .xls, .xml), ODF spreadsheet files (.ods) and flat files as external tables
MIT License
45 stars 12 forks source link

ExcelTable with xls file #34

Closed madhuridara closed 2 years ago

madhuridara commented 3 years ago

Hi Marc,

Attached is the actual file I am trying to read with ExcelTable. This looks similar to the error that you fixed last time. It's with RT_CONTINUE.

This is the command I used to read the file.

SELECT t.* FROM Table( ExcelTable.getRows( exceltable.getFile('TEST','19INR0073_RENEWABLE_FORM.xls') ,ExcelTableSheetList('Unit Info - RENEWABLE') ,' "COL1" varchar2(200 char) , "COL2" varchar2(1 char) , "COL3" varchar2(200 char) , "COL4" varchar2(200 char) , "COL5" varchar2(200 char) , "COL6" varchar2(200 char) , "COL7" varchar2(200 char) , "COL8" varchar2(200 char) , "COL9" varchar2(200 char) , "COL10" date' , '9:10',1 ) ) t ;

I am getting this error stack:

ORA-20731: Error at position 969133, expecting a [Continue] record ORA-06512: at "RARFSTG2.XUTL_XLS", line 263 ORA-06512: at "RARFSTG2.XUTL_XLS", line 283 ORA-06512: at "RARFSTG2.XUTL_XLS", line 479 ORA-06512: at "RARFSTG2.XUTL_XLS", line 696 ORA-06512: at "RARFSTG2.XUTL_XLS", line 714 ORA-06512: at "RARFSTG2.XUTL_XLS", line 1012 ORA-06512: at "RARFSTG2.XUTL_XLS", line 1408 ORA-06512: at "RARFSTG2.XUTL_XLS", line 1444 ORA-06512: at "RARFSTG2.EXCELTABLE", line 2812 ORA-06512: at "RARFSTG2.EXCELTABLE", line 3164 ORA-06512: at "RARFSTG2.EXCELTABLE", line 4019 ORA-06512: at "RARFSTG2.EXCELTABLEIMPL", line 124 ORA-06512: at line 1

mbleron commented 3 years ago

I don't see the file. Remember you have to attach it directly on the issue, not via email. Thanks.

madhuridara commented 3 years ago

19INR0073_RENEWABLE_FORM.xls

Marc, Attached the file again. Can you please look into it. Thank You!

mbleron commented 3 years ago

Great, thanks. Looking at it right now...

mbleron commented 3 years ago

OK, looks like bad data in the SHAREDSTRINGS (SST) part.

In Excel, character strings are shared across the workbook, so that a given string is stored only once even though it may appear multiple times in different cells. Technically, strings are stored in a SST record, whose first two fields are :

Problem : In your file, cstTotal = cstUnique = 2115, which is wrong because there's only 1496 unique strings. Since my code relies on cstUnique to loop through the SST array, it continues reading the stream past the actual end of the array, and eventually find unexpected content.

Opening and re-saving the file solves the issue, as cstUnique is now set correctly. Does the file come from a 3rd-party generator? Best way would be to fix it, if possible.

In the meantime, I'll see if I can implement a fallback mechanism for this particular case.

mbleron commented 3 years ago

Workaround implemented in v5.2.2

warobayo commented 2 years ago

Hi @mbleron when you publish the version 5.2.2..?

Thanks.

image

mbleron commented 2 years ago

Hi, 5.2.2 is the current version. Maybe you're hitting a different issue. Could you attach a sample file here so that I can reproduce the problem? Thanks.

mbleron commented 2 years ago

@warobayo Given the stack trace, it's indeed a different issue. Could you please open a new issue and attach a sample file there? I'd be glad to look at it and eventually fix the issue if I can.