pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
28.67k stars 1.79k forks source link

`write_excel` treats all numeric data as 'Currency' format #17624

Closed AntonyCotton closed 1 month ago

AntonyCotton commented 1 month ago

Checks

Reproducible example

The following codes are copied from here.

##############################################################################
#
# A simple example of converting a Polars dataframe to an xlsx file with
# default formatting.
#
# SPDX-License-Identifier: BSD-2-Clause
# Copyright 2013-2024, John McNamara, jmcnamara@cpan.org
#
from datetime import date
import polars as pl

# Create a Pandas dataframe with some sample data.
df = pl.DataFrame(
    {
        "Dates": [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3)],
        "Strings": ["Alice", "Bob", "Carol"],
        "Numbers": [0.12345, 100, -99.523],
    }
)

# Write the dataframe to a new Excel file with autofit on.
df.write_excel(workbook="polars_format_default.xlsx", autofit=True)

Log output

No response

Issue description

Hello Polars Team,

I was using the write_excel method in Python Polars (v1.1.0). I noticed that all negative numbers in the sheet are displayed in red. Upon further investigation, it appears that all numeric data is incorrectly treated as 'Currency' format, which causes Excel to display negative numbers in red by default. This issue can also be observed in the figure in this page from XlsxWriter docs, where negative number -99.523 is displayed in red.

Thank you for your time and effort.

Expected behavior

Displaying negative numbers in red can be confusing, as it might suggest errors even when the numbers are correct. I propose that all data should be written in the 'General' format (like pandas).

Installed versions

``` --------Version info--------- Polars: 1.1.0 Index type: UInt32 Platform: Linux-6.1.0-21-amd64-x86_64-with-glibc2.36 Python: 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] ----Optional dependencies---- adbc_driver_manager: cloudpickle: connectorx: deltalake: fastexcel: fsspec: gevent: great_tables: hvplot: matplotlib: nest_asyncio: 1.6.0 numpy: 1.26.4 openpyxl: 3.1.5 pandas: 2.2.2 pyarrow: 16.1.0 pydantic: 2.8.2 pyiceberg: sqlalchemy: torch: xlsx2csv: xlsxwriter: 3.2.0 ```
mcrumiller commented 1 month ago

I just ran your example, and the numeric format is #,##0.000;[Red]-#,##0.000. This is not currency format, it just shares the trait that negatives are shown in red.

AntonyCotton commented 1 month ago

Thank you for your prompt response. On my PC with Office LTSC Professional Plus 2021, numeric data is indeed treated as currency format.

efe3084c5edea9ec26a9e3b4ad2b5b33

Even if this isn't a currency format on others' PCs, my main concern is the font colour. I don't think it makes sense to have negative values shown in red by default, this should be left to the users to decide for themselves.

deanm0000 commented 1 month ago

Mine says custom whether I pick one of the cells or all of them.

image

AntonyCotton commented 1 month ago

OK...maybe it's an issue of Excel. But leaving the formats aside, I don't actually care if it's currency, custom or anything else. My main concern is that the text should not have a special colour (red) by default - this should be set by the user. Could you please consider if this is an appropriate suggestion? Thank you.

alexander-beedie commented 1 month ago

Thanks for the suggestion, but I disagree with this - we should have sensible minimal defaults, as these are useful/helpful for the majority of callers who don't want to have to individually customise every column of every export to get reasonably formatted output.

The Pandas default (not writing a real Table object, and not formatting anything) leaves the caller with a lot of work to get anything particularly pleasant to work with ("General" leaves dates looking like integers, and so on), whereas our chosen defaults give a clean/useful starting point that can be further customised as needed.

(And, as others have mentioned, this is not a "Currency" format; it is a standard numeric format that is constructed with reference to the value specified in the float_precision parameter, and is negative-value aware).

As for the colour being red, that is Excel's own standard formatting colour for negative values for all numeric formats (where it actually applies a colour), so this is the appropriate choice - any other colour would actually be non-standard.

alexander-beedie commented 1 month ago

However, if you do want everything to be exported using "General" format, it is already straightforward to do so 👍

df.write_excel(…, column_formats={cs.all(): "General"})

And if you only wanted "General" to apply to numeric values:

df.write_excel(…, column_formats={cs.numeric(): "General"})

(...where cs is the usual selectors^1 import)

AntonyCotton commented 1 month ago

Thank you for your response and explanation. I understand the standards and beliefs of Polars. Your code works well.

Inspired by your code, I finally choose to use the following code, which retains the custom format of Polars but remove only the colour:

df.write_excel(
    ...,
    column_formats={
        cs.float(): "#,##0.000;-#,##0.000",
        cs.integer(): "#,##0;-#,##0",
    },
)

By the way, the default red appears to have such a high priority that manually setting the colour in Excel does not take effect (this may be a feature of Excel but not an issue with Polars). screenshot