Closed: MCRE-BE closed this issue 3 weeks ago
This is a limitation of the pyxlsb library. Their docs note:
"Do note that dates will appear as floats. You must use the convert_date(date) method from the pyxlsb module to turn them into datetime instances."
Also, you'll notice there's a marker that xfails datetime related tests with xlsb files - https://github.com/pandas-dev/pandas/blob/2a10e04a099d5f1633abcdfbb2dd9fdf09142f8d/pandas/tests/io/excel/test_readers.py#L153
@asishm: Thanks for the information. So it's a "feature" / "chosen limitation" (I don't know how to express this better), rather than an oversight, that pandas does not convert floats to dates for pyxlsb? If that's the case, we can indeed close this issue, as it's not a bug.
Pandas relies on other libraries to do most of the heavy lifting when parsing Excel files. In the case of xlsb files, pandas relies on pyxlsb. It looks like the results returned by that library don't differentiate between dates and floats (unlike other libraries, such as openpyxl for xlsx files). Therefore, pandas has no way to know which columns are date columns.
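To illustrate why the reader cannot tell the two apart, here is a minimal sketch using only the standard library. The helper name serial_to_datetime and the 1899-12-30 epoch are my own assumptions about the 1900 date system (good for serials from 61 onward); this is not pyxlsb's implementation, which may treat edge cases differently.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system with the commonly used 1899-12-30 epoch,
# which is only correct for serials from 61 (1900-03-01) onward.
EXCEL_EPOCH = datetime(1899, 12, 30)


def serial_to_datetime(serial: float) -> datetime:
    """Hypothetical helper: interpret a float as an Excel date serial."""
    return EXCEL_EPOCH + timedelta(days=serial)


date_cell = 36528.0    # a cell that was formatted as a date in Excel
number_cell = 36528.0  # a cell that holds an ordinary number

# Once the cell format is dropped, both are the same Python float.
same = type(date_cell) is type(number_cell) and date_cell == number_cell
converted = serial_to_datetime(date_cell)

print(same)       # True
print(converted)  # 2000-01-03 00:00:00
```

Without the cell's number format, nothing in the value itself says whether 36528.0 is a date or a quantity, so the caller has to say which columns are dates.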
If you know which columns are dates, you can pass pyxlsb's convert_date function in the converters parameter of pd.read_excel (or do the conversion after the pd.read_excel call):
In [6]: df = pd.read_excel("./pandas/tests/io/data/excel/test1.xlsb", converters={0: pyxlsb.convert_date})
In [7]: df
Out[7]:
Unnamed: 0 A B C D
0 2000-01-03 0.980269 3.685731 -0.364217 -1.159738
1 2000-01-04 1.047916 -0.041232 -0.161812 0.212549
2 2000-01-05 0.498581 0.731168 -0.537677 1.346270
3 2000-01-06 1.120202 1.567621 0.003641 0.675253
4 2000-01-07 -0.487094 0.571455 -1.611639 0.103469
5 2000-01-10 0.836649 0.246462 0.588543 1.062782
6 2000-01-11 -0.157161 1.340307 1.195778 -1.097007
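For the "do it after the pd.read_excel call" route, something like the following should work. This is only a sketch: the small DataFrame stands in for what read_excel would return from an xlsb file (serial floats in the first column), and the 1899-12-30 origin is an assumption about the 1900 date system rather than anything taken from pyxlsb.

```python
import pandas as pd

# Stand-in for the frame read from an xlsb file: the date column
# comes back as Excel serial floats. Values are illustrative.
df = pd.DataFrame(
    {
        "Unnamed: 0": [36528.0, 36529.0, 36530.0],
        "A": [0.980269, 1.047916, 0.498581],
    }
)

# Convert the known date column after reading.
# Assumption: 1900 date system, so day 0 maps to 1899-12-30.
df["Unnamed: 0"] = pd.to_datetime(df["Unnamed: 0"], unit="D", origin="1899-12-30")

print(df)
```

The converters approach does the conversion during parsing, while this one works on a whole column at once after the read; either way you still need to know which columns hold dates.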
I guess a note indicating this limitation might be good.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
When trying to build some new tests, I found a strange behavior.
xlsb and xlsm directly in pandas if I'm not mistaken.
Issue Description
Expected Behavior
All dataframes should be read the same
Installed Versions