jazzband / tablib

Python Module for Tabular Datasets in XLS, CSV, JSON, YAML, &c.
https://tablib.readthedocs.io/
MIT License

Excel file is wrongly parsed. Surely openpyxl bug #543

Closed daillouf closed 1 year ago

daillouf commented 1 year ago

Hello guys :)

This is no doubt an openpyxl bug, but I figured it could be useful to open an issue here, just for reference:

file_bug.xlsx

In [56]: with open("file_bug.xlsx", "rb") as f:
    ...:     data = tablib.Dataset().load(f, format="xlsx")
    ...: 

In [57]: data.dict
Out[57]: 
[OrderedDict([('amount', 3480)]),
 OrderedDict([('amount', 200)]),
 OrderedDict([('amount', 3479.9999999999995)])]

In [58]: tablib.__version__
Out[58]: '3.3.0'

In [59]: openpyxl.__version__
Out[59]: '3.1.2'

As you can see, we get the classic base-2 Python float rounding issue:


In [1]: 3480/100*100
Out[1]: 3479.9999999999995

But only for line 4 and not for line 2, and in Excel it's quite impossible to tell the difference.
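For reference, here is a minimal sketch of the float behaviour above and one possible workaround on the caller's side. The helper name `clean_amount` and the choice of two decimal places are assumptions made for illustration; they are not part of tablib or openpyxl.

```python
from decimal import Decimal

# Same expression as in the transcript above.
value = 3480 / 100 * 100

# Decimal(float) exposes the exact binary value the float holds,
# which shows the result really is slightly below 3480.
exact = Decimal(value)


def clean_amount(x, places=2):
    """Hypothetical helper: round a parsed amount to a fixed number of decimals."""
    return round(x, places)


print(repr(value))          # the float as Python displays it
print(exact)                # the exact stored value
print(clean_amount(value))  # the amount after rounding to 2 decimals
```

Rounding after loading only makes sense when the column is known to hold amounts with a fixed number of decimals; it hides the symptom rather than explaining why only one row is affected.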

Here is the Python information:


Python 3.10.9 (main, Dec 15 2022, 17:11:09) [Clang 14.0.0 (clang-1400.0.29.202)]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help.

My guess is that something in the Excel data tells openpyxl to parse it as a float/percentage/whatever and triggers this bug.

daillouf commented 1 year ago

Definitely an openpyxl issue, or maybe even an «Excel» issue:

https://foss.heptapod.net/openpyxl/openpyxl/-/issues/1993 (I opened it, I used openpyxl directly and saw the same problem)

I believe this issue should remain open, then when the issue is fixed in openpyxl we can upgrade the version here.

But you do as you wish, your GitHub, your rules ;)

claudep commented 1 year ago

Thanks for the report and your findings. However, as there is nothing we can do at the tablib level, I don't see the point in keeping it open.