jmcnamara / XlsxWriter

A Python module for creating Excel XLSX files.
https://xlsxwriter.readthedocs.io
BSD 2-Clause "Simplified" License
3.66k stars 631 forks source link

How do I preserve formatting when writing a dataframme to an XLS file #1034

Closed wayner9 closed 11 months ago

wayner9 commented 11 months ago

Question

I asked this question on Stack Overflow, but I am not getting any bites.

See here: https://stackoverflow.com/questions/77603992/storing-a-dataframe-to-excel-as-numbers-with-percent-or-other-formatting

When I save a df to excel that contains elements like 20.3% it is being written as text, even when I use the engine_kwargs={'options': {'strings_to_numbers': True}} argument. Is there any way to have these be imported into excel as numbers with a 0.0% formatting style? See the Stack Overflow link for more details.

I can also send the data to excel write as floats, as in 0.203, but then I have to do formatting in Excel, when I have already done the proper formatting in Python/Pandas.

jmcnamara commented 11 months ago

You will need to do a few steps to convert the percentage string to a percentage number in Excel:

The following example from the XlsxWriter docs demonstrates this for a dataframe:

https://xlsxwriter.readthedocs.io/example_pandas_percentage.html


import pandas as pd

# Create a Pandas dataframe from some data.
df = pd.DataFrame({"Names": ["Anna", "Arek", "Arun"], "Grade": ["100%", "70%", "85%"]})

# Convert the percentage strings to percentage numbers.
df["Grade"] = df["Grade"].str.replace("%", "")
df["Grade"] = df["Grade"].astype(float)
df["Grade"] = df["Grade"].div(100)

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter("pandas_percent.xlsx", engine="xlsxwriter")

# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name="Sheet1")

# Get the xlsxwriter workbook and worksheet objects.
workbook = writer.book
worksheet = writer.sheets["Sheet1"]

# Add a percent number format.
percent_format = workbook.add_format({"num_format": "0%"})

# Apply the number format to Grade column.
worksheet.set_column(2, 2, None, percent_format)

# Close the Pandas Excel writer and output the Excel file.
writer.close()

image

wayner9 commented 11 months ago

Thanks, so it looks like I have to set a number format for an entire column, using set_column. So to set the format for columns B and C I would use set_column(1,2,None,percent_format). It looks like I could also do this for a row with set_row(32,None,percent_format).

It looks like the set_row command only operates on a single row, whereas set_column operates on a range of columns. Is that correct?

Is there a set_range command where I could set a range of cells to a specific format, as in set_range(A1,D5,percent_format)?

Or is the only option to set formatting for an entire row or column?

And if I were to do multiple format calls on both columns and rows would the latest set_row (or set_column) be the formatting used if a used a set_column call to set one format and then a set_row to set another format?

Below is the dataframe that I am bringing into excel. I want columns 4 and 5 to be percentage with 0 decimals (0% in excel formatting custom code nomenclature). I want all other columns to be percentage with one decimal (0.0%). I want row ten to be general format with two decimal places (0.00).

So the way to do this would I would do something like this:

set_column(1,15,None,percent_onedec_format)
set_column(3,4,None,percent_zerodec_format)
set_row(9,None,twodec_format)

Does that make sense?

       ExpRet    Vol Min   Max  Iter1  Iter2  Iter3  Iter4  Iter5   Iter6   Iter7   Iter8   Iter9  Iter10
FGE      5.7%  16.0%  0%  100%   0.0%   1.8%   3.0%  12.2%  30.4%   43.8%   50.0%   50.0%   50.0%   50.0%
FFI      4.8%   6.0%  0%  100%   0.0%  10.5%  20.3%  33.1%  19.5%    6.2%    0.0%    0.0%    0.0%    0.0%
FRA      4.5%  10.0%  0%   25%   0.2%   0.9%   1.3%   4.7%   0.0%    0.0%    0.0%    0.0%    0.0%    0.0%
FAR      6.5%   7.0%  0%   25%   0.4%  15.2%  25.0%  25.0%  25.0%   25.0%   25.0%   25.0%   25.0%   25.0%
FCO      7.0%   8.0%  0%   25%   0.1%  10.0%  19.3%  25.0%  25.0%   25.0%   25.0%   25.0%   25.0%   25.0%
CASH     2.5%   0.5%  0%  100%  99.3%  61.7%  31.1%   0.0%   0.0%    0.0%    0.0%    0.0%    0.0%    0.0%
Return                           2.5%   3.8%   4.9%   5.8%   6.0%    6.1%    6.2%    6.2%    6.2%    6.2%
Vol                              0.5%   2.0%   3.6%   5.1%   6.7%    8.2%    9.0%    9.0%    9.0%    9.0%
Sharpe                           0.05   0.67   0.68   0.65   0.53    0.45    0.41    0.41    0.41    0.41
Ploss                            0.0%   3.0%   8.5%  12.8%  18.4%   22.8%   24.6%   24.6%   24.6%   24.6%
Var05                            1.7%   0.5%  -1.0%  -2.6%  -5.0%   -7.4%   -8.7%   -8.7%   -8.7%   -8.7%
Var01                            1.3%  -0.9%  -3.4%  -6.1%  -9.6%  -13.0%  -14.8%  -14.8%  -14.8%  -14.8%
jmcnamara commented 11 months ago

It looks like the set_row command only operates on a single row, whereas set_column operates on a range of columns. Is that correct?

Yes.

Is there a set_range command where I could set a range of cells to a specific format, as in set_range(A1,D5,percent_format)?

No, unfortunately note. There is conditional formatting that can be applied to a range but I wouldn't use it for this.

Or is the only option to set formatting for an entire row or column?

Yes. Unless, you want to want to iterate over each column of the dataframe and write them out individually using XlsxWriter directly and skip the to_excel() step.

And if I were to do multiple format calls on both columns and rows would the latest set_row (or set_column) be the formatting used if a used a set_column call to set one format and then a set_row to set another format?

Row and Column formats don't merge automatically so in that case you would need to do individual cell formatting.

So the way to do this would I would do something like this:

Yes, that looks right.