Open lrisch opened 1 year ago
set_column does not change the format; it changes the default format:
The cell_format parameter will be applied to any cells in the column that don’t have a format.
Since pandas is setting the format for date cells, it has no effect. As far as I can tell there is no easy way to overwrite an existing format with xlsxwriter. The best resolution I see for this within pandas is to use datetime_format as you mentioned.
So here if we are looking to change any format of date column, by using strftime function we could make the possible changes to the date values.
@Pavanmahaveer7 - I think you are saying to modify the format prior calling pd.write_excel
. Yes, that is an option, but then I believe the Excel file will not have the Date datatype.
@rhshadrach Okay thanks for that information, I didn't know it only changed the default format. In that case we will keep using datetime_format 👍
Currently the default value of datetime_formatstr
is None
, which gives you the format "YYYY-MM-DD HH:MM:SS"
. We could instead make the default value be ""YYYY-MM-DD HH:MM:SS"
and then if None
is passed, apply no formatting. It would allow you to then apply your own formatting with xlsxwriter. However, without applying any formatting I think the values would then appear as integers in the Excel sheet - and this would perhaps cause confusion.
I still think a better resolution is to have a way to override the existing format via xlsxwriter.
@rhshadrach Im sorry, I dont really understand your last message. As I understand it, I cannot use set_column because pandas is overwriting whatever I set as a format there. Therefore I have to use datetime_format (which is fine, it would have been nicer to be able to use set_column since we use it for the other datatypes to, but it is ok as a workaround). But I dont understand, why do you want to change the default of datetime_format? And what do you mean by this: "I still think a better resolution is to have a way to override the existing format via xlsxwriter."
As I understand it, I cannot use set_column because pandas is overwriting whatever I set as a format there.
Depending on interpretation, I think this is not correct - in particular pandas is not overwriting. pandas sets the format as "YYYY-MM-DD HH:MM:SS"
first and then you are calling set_column
. The issue is that calling set_column
does nothing because it only modifies the default format. Modifying the default format has no impact because pandas has set a (non-default) format.
If users could tell pandas to not set any format for datetime (my proposal above - which I do not like but it is an option), then changing the default format would have an impact and so your call to set_column
would have the desired effect.
It seems better to me if xlsxwriter gave you the ability to modify the format on an entire column, and not just the default format. I believe xlsxwriter has no such function.
Perhaps my use of "default format" is confusing and I should be saying "no format" instead.
Looking into this again, I'm now understanding that Excel has cell, row, and column formats. The set_column
sets the column format, but priority is given to any cell in the column that has a cell format set.
Also, ExcelWriter
does have date_format
and datetime_format
arguments that can be set!
@jmcnamara - I was wondering if you could confirm my understanding here is correct. I believe this comment is self-contained, so hopefully you don't have to wade through the above comments :smile:.
pandas writes dates and datetimes by applying a cell format to particular cells. This makes xlxswriter's set_column
(and set_row
) ineffective on such cells - perhaps surprisingly until you understand how Excel has different formatters for rows, columns, and cells.
I was wondering if pandas should do something different here - but it appears there is no "default date format" in Excel. XlsxWriter does have a default_date_format
, but it appears this is implemented by setting the cell format, similar to pandas:
So users wanting to change the format of date and datetime columns that already have a format applied just need to loop over the cells in a column.
I was wondering if you could confirm my understanding here is correct.
Yes that is correct. To reiterate: Pandas applies a cell format when writing a datetime and that cell format can't/won't be overwritten by a row or column format.
@rhshadrach I think the proposal to have change the behaviour of datetime_format=None
and date_format=None
in pd.ExcelWriter()
so that they don't apply a default format is a good one. That would allow the user to apply a format via set_column()
. In most cases applying the number format via datetime_format
and date_format
is sufficient for most users but sometimes they need to also apply font or border formatting as well and then the ability to turn off the Pandas formatting and use set_column()
instead would be useful.
For reference here are the XlsxWriter docs where I try to point people in the right direction on Formatting of the Dataframe output.
Thanks @jmcnamara!
@rhshadrach I think the proposal to have change the behaviour of
datetime_format=None
anddate_format=None
inpd.ExcelWriter()
so that they don't apply a default format is a good one.
Makes sense - but I would think just not as the default. I'll put up a PR for this and improve some of the docs.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
I created an Excel sheet using
pd.to_excel()
. The data contains one column with a float number and one column with adatetime.datetime
object.Then, I want to set the column type using the
num_format
parameter inset_column
.This works for the float value, which I want in the percentage format. However, it does not work for the datetime value. The value is in the wrong format "01.01.2020 00:00:00" (German date). The value in the Excel file is in the same wrong format, whether I run
set_column()
for the date column or not.I have tried to circle in the problem by trying other variations, which turns out that they do work. However, those would be only workarounds for our use case.
Using the parameter
datetime_format='dd.mm.yyyy'
in thepd.ExcelWriter
constructor puts the date into the right format.Instead of writing the sheet from a DataFrame, using the
xlsxwriter
functions works fine:workbook = xlsxwriter.Workbook('datetime_bug.xlsx') worksheet = workbook.add_worksheet()
using the xlsxwriter functions works as intended
worksheet.write('A1', 0.05) worksheet.write('B1', datetime(year=2020, month=1, day=1))
percentage_format = workbook.add_format({'num_format':'0.00%'}) date_format = workbook.add_format({'num_format':'dd.mm.yyyy'})
worksheet.set_column('A:A', width=None, cell_format=percentage_format) worksheet.set_column('B:B', width=None, cell_format=date_format)
workbook.close()