pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
30.16k stars 1.94k forks source link

`write_excel(formulas={})` appears non-functional in latest version #18782

Open zach-hunt opened 1 month ago

zach-hunt commented 1 month ago

Checks

Reproducible example

import polars as pl
print(pl.__version__)  # 1.7.1

df = pl.DataFrame([1], schema=('A'))
formula = {'B': {'formula': '=[@A]+1'
                 ,'insert_after': 'A'
                 ,'return_dtype': pl.Int64}}
df.write_excel('./FormulaTest.xlsx', formulas=formula)  # Functional in 1.2.1, Non-Functional in 1.3.0+

Log output

No error

Issue description

df.write_excel creates the Excel file with columns A, B, but column B is empty in V1.3.0+: (Introduced before v1.0: https://github.com/pola-rs/polars/pull/7871)

1.2.1

1-2-1

1.3.0 - 1.7.1

1-7-1

Expected behavior

I expect the newest version to assign column B with the formula as it did back in 1.2.1. Thanks!

Installed versions

``` --------Version info--------- Polars: 1.7.1 Index type: UInt32 Platform: Windows-10-10.0.19045-SP0 Python: 3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:24:51) [MSC v.1940 64 bit (AMD64)] ----Optional dependencies---- adbc_driver_manager altair cloudpickle connectorx deltalake fastexcel fsspec gevent great_tables matplotlib nest_asyncio 1.6.0 numpy 2.0.1 openpyxl pandas pyarrow pydantic pyiceberg sqlalchemy torch xlsx2csv xlsxwriter 3.1.1 ```
zach-hunt commented 1 month ago

I was able to confirm that the same issue occurs in a new environment, following a machine restart. Tested 0.20.23, 1.2.1, 1.3.0, 1.4.0, and 1.7.1, same results on another Windows 10 machine.

laxmarshall9 commented 4 weeks ago

Hey I just encountered this same issue with the pl.read_excel() function. I suspect it has to do with the default "calamine" engine. When I change the engine to "openpyxl" it works just fine but it runs much slower, negating the speed-benefit from using polars.

Oddly enough, this same behavior happens with pl.read_excel() even when a column containing an excel formula is pulled from an entirely separate excel workbook using PowerQuery AND a step is applied to change the data type to text. You think this would bypass the issue since the PowerQuery removes the excel formula and replaces it with plain text, but for some reason the same result occurs; calamine can't read the cell. It must be something in the metadata of the cell which causes the issue.