pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
30.01k stars 1.94k forks

write_excel() - support additional engines #14735

Open leonkosak opened 8 months ago

leonkosak commented 8 months ago

Description

rust_xlsxwriter in particular could bring some enormous performance benefits. What do you think?

alexander-beedie commented 8 months ago

This would take significant work, as our integration with xlsxwriter is quite deep (and it would require integration on the Rust side, as rust_xlsxwriter does not provide a Python wrapper). So this would not be configurable; it would need to be a complete replacement, and would require sufficient feature parity to map from one to the other.

@jmcnamara: Do you have a rough idea about the current state of feature parity between the xlsxwriter Rust/Python versions? We have been considering taking calamine bindings directly inside Polars for even faster Excel reading, so it's not out of the question that we might also revisit taking a dependency on something lightweight for write bindings 🤔

leonkosak commented 8 months ago

Well, as far as I know, the fastest way of writing xlsx files in Python is via PyExcelerate, which is also not very performant (and the library looks almost abandoned, as far as I can see). In my opinion, the only library with real potential for creating xlsx files from Python is rust_xlsxwriter by @jmcnamara. 👍

jmcnamara commented 8 months ago

> Do you have a rough idea about the current state of feature parity between the xlsxwriter Rust/Python versions?

I plan to have full feature completeness by the end of the year. Based on the completed-feature list in the rust_xlsxwriter Roadmap, it is currently 26/36 (~70%) complete. Based on ported tests, it is ~1000/1600 (~63%) complete.

However, in terms of the functionality used by polars.DataFrame.write_excel(), rust_xlsxwriter is almost feature complete relative to XlsxWriter. The only missing feature is Sparklines.

To help this along, I can do one or both of the following:

  1. Port Sparklines to rust_xlsxwriter so that it is feature complete with the XlsxWriter functionality of polars.DataFrame.write_excel(). And/Or:
  2. Focus on completing features of polars_excel_writer to make that more API compatible with polars.DataFrame.write_excel(). This would make it easier for you and the other Polars devs to drop it in as a replacement for the Python version.

I'll probably work on item 1 anyway, but let me know what you think would be the best approach. I'm willing to put some time into making rust_xlsxwriter as compatible as possible with Polars.

alexander-beedie commented 8 months ago

> I'll probably work on item 1 anyway, but let me know what you think would be the best approach. I'm willing to put some time into making rust_xlsxwriter as compatible as possible with Polars.

Given that I'd have to replicate the write_excel API anyway, and it seems you have done most of the work already, I think it would be a great idea to take advantage of it. If you could get Sparklines in and we could actually port the internals wholesale to make use of polars_excel_writer instead, that sounds quite compelling to me (I can poll the other devs for their thoughts, but I like the sound of it :)

jmcnamara commented 8 months ago

@alexander-beedie Sounds good. Let's stay in sync. I'll get the sparklines support ported by the weekend and I'll follow up then.

alexander-beedie commented 8 months ago

> @alexander-beedie Sounds good. Let's stay in sync. I'll get the sparklines support ported by the weekend and I'll follow up then.

Great, though don't rush on my account; I'm swamped at the moment 😅

leonkosak commented 6 months ago

Hi, any updates on this topic? :)

jmcnamara commented 6 months ago

> Any updates on this topic?

I implemented sparklines in rust_xlsxwriter but didn't get a chance to port them to polars_excel_writer. I haven't had much open source development time recently, so I had to park it for a while. I hope to get started again in May.

For what it is worth, here is the current feature completion list for polars_excel_writer: https://github.com/jmcnamara/polars_excel_writer/issues/1#issuecomment-1685299464

leonkosak commented 6 months ago

No problem. Thank you for your great work! 👍

aleewen commented 4 months ago

Right now write_excel allows the user to pass in an open xlsxwriter.Workbook object. Any chance that this can support openpyxl workbooks too?

alexander-beedie commented 4 months ago

> Right now write_excel allows the user to pass in an open xlsxwriter.Workbook object. Any chance that this can support openpyxl workbooks too?

Afraid not; they are entirely incompatible. You can write to a fresh workbook or to an open xlsxwriter workbook (which allows you to write multiple frames to the same workbook, or enrich it with charts/etc), but you can't mix & match with different libraries.