willtrnr / pyxlsb

Excel 2007+ Binary Workbook (xlsb) reader for Python
GNU Lesser General Public License v3.0
90 stars 21 forks source link

open_workbook throws KeyError(None) when workbook contains sheet with rId being None #48

Open pallxk opened 1 year ago

pallxk commented 1 year ago

When a workbook contains sheet with rId being None, open_workbook throws KeyError(None).

Related code: https://github.com/willtrnr/pyxlsb/blob/7dca9403f61a71ca4ca53cad4330ce068cef4dbd/pyxlsb/workbook.py#L43-L47

And some debug output:

for item in reader:
  print(item)

(387, 'workbook')
(399, 'sheets')
(412, sheet(sheetId=28, rId=None, name=None))
(412, sheet(sheetId=2, rId='rId1', name='*'))
(412, sheet(sheetId=13, rId='rId2', name='*'))
(412, sheet(sheetId=15, rId='rId3', name='*'))
(412, sheet(sheetId=16, rId='rId4', name='*'))
(412, sheet(sheetId=23, rId='rId5', name='*'))
(412, sheet(sheetId=21, rId='rId6', name='*'))
(412, sheet(sheetId=22, rId='rId7', name='*'))
(412, sheet(sheetId=18, rId='rId8', name='*'))
(412, sheet(sheetId=25, rId='rId9', name='*'))
(412, sheet(sheetId=7, rId='rId10', name='*'))
(412, sheet(sheetId=24, rId='rId11', name='*'))
(412, sheet(sheetId=27, rId='rId12', name='*'))
(412, sheet(sheetId=11, rId='rId13', name='*'))
(412, sheet(sheetId=5, rId='rId14', name='*'))
(400, '/sheets')

(I've masked all sheet names with values as '*'.)

For now, I don't know how to recreate such a *.xlsb file from scratch, and can't share the file I have due to confidentiality reasons. Hopefully the debug output is enough for anyone interested to identify the problem.

willtrnr commented 1 year ago

Do you know where that workbook is coming from? Was it generated by Excel or some other application?

I'd have to recheck it, but I don't think this is valid per the MS-XLSB spec, so it could be a bug in the data reader or it could just genuinely be a corrupted file.

pallxk commented 1 year ago

I believe It's generated/created with Excel and edited with it but I'm not sure how it evolves into that state.

Opening the file with MS Excel will not report any error. I'm not familiar with MS-XLSB spec, but I fully understand your concern on this. I'm still trying to understand how an xlsb can be modified with Excel into such a state. I'll share once I've found out the reproducing steps or got files that I'm allowed to share.