pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
42.9k stars 17.67k forks source link

BUG: date_format and datetime_format arguments to ExcelWriter are currently ignored by default openpyxl engine #44284

Open gaibo opened 2 years ago

gaibo commented 2 years ago

Reproducible Example

import pandas as pd
with pd.ExcelWriter('TEST.xlsx', datetime_format='YYYY-MM-DD') as writer:
    test_df = pd.DataFrame([9000]*10, index=[pd.Timestamp('2021-11-02')]*10)
    test_df.to_excel(writer)

Issue Description

In pandas version 0.24.2, the above example would output nicely formatted dates. In pandas version 1.3.4, it outputs YYYY-MM-DD HH:MM:SS format, ignoring the datetime_format field. This is probably as simple as openpyxl becoming the new default engine but not being configured for date formatting.

Expected Behavior

In written Excel sheet, YYYY-MM-DD format dates instead of YYYY-MM-DD HH:MM:SS format dates.

Installed Versions

INSTALLED VERSIONS ------------------ commit : 73c68257545b5f8530b7044f56647bd2db92e2ba python : 3.9.7.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19042 machine : AMD64 processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252 pandas : 1.3.3 numpy : 1.20.3 pytz : 2021.1 dateutil : 2.8.2 pip : 21.2.4 setuptools : 58.0.4 Cython : None pytest : 6.2.4 hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : None IPython : None pandas_datareader: None bs4 : None bottleneck : 1.3.2 fsspec : None fastparquet : None gcsfs : None matplotlib : 3.4.3 numexpr : 2.7.3 odfpy : None openpyxl : 3.0.9 pandas_gbq : None pyarrow : None pyxlsb : None s3fs : None scipy : 1.7.1 sqlalchemy : None tables : None tabulate : None xarray : None xlrd : None xlwt : None numba : None
asishm commented 2 years ago

This is probably as simple as openpyxl becoming the new default engine but not being configured for date formatting.

The default engine for writing hasn't changed - it's still xlsxwriter, but in this case you don't have xlsxwriter so it uses openpyxl. If you install xlsxwriter, you'll see the previous behavior.

as for the issue - it seems to be because these (date_format, datetime_format) args are not being propagated to the base class

https://github.com/pandas-dev/pandas/blob/2e29e1172bb5d17c5d6f4d8bec1d3e6452091822/pandas/io/excel/_openpyxl.py#L31-L58

gaibo commented 2 years ago

The default engine for writing hasn't changed - it's still xlsxwriter, but in this case you don't have xlsxwriter so it uses openpyxl. If you install xlsxwriter, you'll see the previous behavior.

Hey thanks for the quick response! You are absolutely right, I somehow missed the following in ExcelWriter class while glancing at source yesterday: `Default is to use :

sebastian-de commented 2 years ago

The above example works fine for me until v1.2.5. Since v1.3.0 date_format and datetime_format are ignored. I suspect this is due to PR https://github.com/pandas-dev/pandas/pull/40430 In 1.3.x, XlsxWriter super().init() has

            date_format=date_format,
            datetime_format=datetime_format,

which is missing in OpenpyxlWriter super().init() I'm off to a short holiday, but next week I'll try if adding these lines to OpenpyxlWriter fixes the problem.

driedwater commented 1 year ago

take

TimoMorris commented 1 year ago

@driedwater feel free to have a look at my PR as a starting point if you like: https://github.com/pandas-dev/pandas/pull/47315 it was all working, had just dragged on a bit waiting for code reviews and by the time the latest one came back I was too busy to sort it and then haven't had a chance to pick it up again.

driedwater commented 1 year ago

Sure, I'll take a look at it.

CDWimmer commented 6 months ago

did this ever get anywhere?

khuseynov-daymon commented 4 months ago

Any Reason why this issue is still being ignored? nobody use openpyxl ?

CDWimmer commented 4 months ago

Plenty do use it! @driedwater @TimoMorris Don't suppose you'd have time to round this one up now? Looks like you're still the last ones to have given it a go unless I'm being blind

asishm commented 4 months ago

As a workaround, you can use writer = pd.Excelwriter(..., engine='openpyxl') and override the default date_format and datetime_format attributes using writer.date_format = '...'. For a, fix https://github.com/pandas-dev/pandas/pull/40430 was nearly there if someone wants to take it up!

ma-ef commented 2 months ago

@asishm Unfortunately this does not work: Traceback (most recent call last): writer.date_format= "DD.MM.YYYY" ^^^^^^^^^^^^^^^^^^ AttributeError: property 'date_format' of 'OpenpyxlWriter' object has no setter

asishm commented 2 months ago

Ah you're right. These attributes were changed to properties without setters in 1.5.0. While not recommended, you can set writer._date_format and writer._datetime_format instead. Note that since these use attributes that are not public, it's not guaranteed to work in the future and they may get changed/removed at any time.