jmcnamara / polars_excel_writer

A Polars extension to serialize dataframes to Excel xlsx files
Apache License 2.0
12 stars 4 forks source link

Roadmap #1

Open jmcnamara opened 1 year ago

jmcnamara commented 1 year ago

The polars_excel_writer crate is a library for serializing Polars dataframes to Excel Xlsx files.

It provides two interfaces for writing a dataframe to an Excel Xlsx file:

ExcelWriter uses PolarsXlsxWriter to do the Excel serialization which in turn uses the rust_xlsxwriter crate.

Example

An example of writing a Polar Rust dataframe to an Excel file using the ExcelWriter interface.

use chrono::prelude::*;
use polars::prelude::*;

fn main() {
    // Create a sample dataframe for the example.
    let mut df: DataFrame = df!(
        "String" => &["North", "South", "East", "West"],
        "Integer" => &[1, 2, 3, 4],
        "Float" => &[4.0, 5.0, 6.0, 7.0],
        "Time" => &[
            NaiveTime::from_hms_milli_opt(2, 59, 3, 456).unwrap(),
            NaiveTime::from_hms_milli_opt(2, 59, 3, 456).unwrap(),
            NaiveTime::from_hms_milli_opt(2, 59, 3, 456).unwrap(),
            NaiveTime::from_hms_milli_opt(2, 59, 3, 456).unwrap(),
            ],
        "Date" => &[
            NaiveDate::from_ymd_opt(2022, 1, 1).unwrap(),
            NaiveDate::from_ymd_opt(2022, 1, 2).unwrap(),
            NaiveDate::from_ymd_opt(2022, 1, 3).unwrap(),
            NaiveDate::from_ymd_opt(2022, 1, 4).unwrap(),
            ],
        "Datetime" => &[
            NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(1, 0, 0).unwrap(),
            NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(2, 0, 0).unwrap(),
            NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(3, 0, 0).unwrap(),
            NaiveDate::from_ymd_opt(2022, 1, 4).unwrap().and_hms_opt(4, 0, 0).unwrap(),
        ],
    )
    .unwrap();

    example1(&mut df).unwrap();
    example2(&df).unwrap();
}

// The ExcelWriter interface.
use polars_excel_writer::ExcelWriter;

fn example1(df: &mut DataFrame) -> PolarsResult<()> {
    let mut file = std::fs::File::create("dataframe.xlsx").unwrap();

    ExcelWriter::new(&mut file).finish(df)
}

// The PolarsXlsxWriter interface. For this simple case it is similar to the
// ExcelWriter interface but it has additional options to support more complex
// use cases.
use polars_excel_writer::PolarsXlsxWriter;

fn example2(df: &DataFrame) -> PolarsResult<()> {
    let mut writer = PolarsXlsxWriter::new();

    writer.write_dataframe(df)?;
    writer.write_excel("dataframe2.xlsx")?;

    Ok(())
}

Second output file (same as the first):

Performance

The table below shows the performance of writing a dataframe using Python Polars, Python Pandas and PolarsXlsxWriter.

The tested configurations were:

Note: The performance was tested for the dataframe writing code only. The code used to create the dataframes was omitted from the test results.

[perf_test.py]: https://github.com/jmcnamara/polars_excel_writer/blob/main/examples/perf_test.py [perf_test.rs]: https://github.com/jmcnamara/polars_excel_writer/blob/main/examples/perf_test.rs [to_excel()]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html

write_excel(): https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.DataFrame.write_excel.html#polars.DataFrame.write_excel

jmcnamara commented 12 months ago

Todo

ExcelWriter: This is mainly done. The interface is meant to be simple/minimal and most of the applicable CvsWriter features are already implemented.

The features of Polars Python ExcelWriter are shown below along with their equivalent polars_excel_writer APIs and level of completeness: