jmcnamara / XlsxWriter

A Python module for creating Excel XLSX files.
https://xlsxwriter.readthedocs.io
BSD 2-Clause "Simplified" License

XlsxWriter Roadmap v2 #1028

Open jmcnamara opened 10 months ago

jmcnamara commented 10 months ago

Previous roadmap

XlsxWriter is almost 10 years old. The first version was released on February 17, 2013. According to pypinfo it has around 12 million monthly downloads, so it is probably fair to say that it has been useful.

Recently I have been porting/rewriting XlsxWriter in Rust and it has been an interesting experience. When I'm finished with the Rust port, sometime near the end of 2024, I'd like to revisit XlsxWriter and bring it up to date with modern Python and practice. Some ideas:

max-muoto commented 7 months ago

@jmcnamara Would you consider having the next version just rely on the Rust version through pyo3 to realize performance benefits for the Python interface?

jmcnamara commented 7 months ago

> Would you consider having the next version just rely on the Rust version through pyo3 to realize performance benefits for the Python interface?

@max-muoto

I don't think that would be practical from a maintenance point of view or desirable from an end user point of view. At the moment the Python version has zero dependencies and more functionality than the Rust version.

However, I would see scope for a "lite" version of XlsxWriter + pyo3 with support for just writing data and formatting. Something that could be consumed by Pandas, for example, to speed up file writing. From rough initial benchmarks that could be about 8x faster than the pure Python version. I see that Pandas recently adopted a Rust-backed xlsx reader based on Calamine, so they might be open to a similar writer. I'll keep it in mind.
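For a sense of what "just writing data and formatting" covers, here is a minimal sketch using the existing pure-Python XlsxWriter API (`Workbook`, `add_worksheet`, `add_format`, `write_row`, `write_string`, `write_number`, `set_column`). The `write_report` helper and its sample rows are hypothetical illustrations; this is not the API of the proposed pyo3 "lite" version, which isn't described in this thread.

```python
import io

import xlsxwriter


def write_report(rows):
    """Write a header row plus (item, amount) rows to an in-memory XLSX file.

    Hypothetical helper: it only uses cell data and cell formatting, the
    kind of subset a "lite" writer would presumably be limited to.
    """
    output = io.BytesIO()
    # XlsxWriter accepts a file-like object in place of a filename;
    # "in_memory" avoids writing temporary files to disk.
    workbook = xlsxwriter.Workbook(output, {"in_memory": True})
    worksheet = workbook.add_worksheet("Data")

    # Cell formats: a bold, shaded header and a thousands-separated number.
    header_format = workbook.add_format({"bold": True, "bg_color": "#DDEBF7"})
    number_format = workbook.add_format({"num_format": "#,##0.00"})

    worksheet.write_row(0, 0, ["Item", "Amount"], header_format)
    for row_num, (item, amount) in enumerate(rows, start=1):
        worksheet.write_string(row_num, 0, item)
        worksheet.write_number(row_num, 1, amount, number_format)

    # Widen columns A:B so the data is readable.
    worksheet.set_column(0, 1, 12)

    # close() assembles the XLSX (zip) container into the BytesIO buffer.
    workbook.close()
    return output.getvalue()


if __name__ == "__main__":
    data = write_report([("Apples", 1234.5), ("Pears", 99.0)])
    print(f"{len(data)} bytes")
```

Any speed-up from a Rust backend would presumably come from replacing the work done in calls like these and in `close()`, which is where the XML and zip output is produced.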

max-muoto commented 7 months ago

Makes sense, thanks for the info!

I think a minimal version for compatibility with Polars/Pandas would be great. Polars also recently added support for Calamine as a reader, so I feel this is something they might be pretty open to as well.

jmcnamara commented 7 months ago

> Polars also recently added support for Calamine as a reader, so I feel this is something they might be pretty open to as well.

That is good to know.

> I think a minimal version for compatibility with Polars/Pandas would be great.

Polars could take the Rust version directly. I wrote polars_excel_writer as a prototype for that and there has been some initial engagement with the Polars folks here.

For Pandas I started a PyO3 wrapper called xlsxwriter_lite. However, that is currently very rudimentary.

alexander-beedie commented 6 months ago

> Polars could take the Rust version directly. I wrote polars_excel_writer as a prototype for that and there has been some initial engagement with the Polars folks here.

We've been thinking about taking calamine as a direct Polars (Rust) dependency to squeeze every last possible drop of speed out of it; if/when we get around to that it might be time to revisit the writing side (though unless somebody suddenly gets a lot of unexpected free time this might take a while 😅)

jkyeung commented 3 months ago

> Would you consider having the next version just rely on the Rust version through pyo3 to realize performance benefits for the Python interface?

> I don't think that would be practical from a maintenance point of view or desirable from an end user point of view. At the moment the Python version has zero dependencies and more functionality than the Rust version.

Eventually, the Rust version may have equal or greater functionality than the Python version.

But I fully agree with @jmcnamara that having zero dependencies is desirable. In fact, it is a lifeline for those of us who want to use the full capabilities of XlsxWriter on systems with Python but no support for Rust. Even for systems that do support Rust, there will be some users who find the pure-Python XlsxWriter fast enough for their needs and would rather not introduce extra downloads or dependencies.