jmcnamara / rust_xlsxwriter

A Rust library for creating Excel XLSX files.
https://crates.io/crates/rust_xlsxwriter
Apache License 2.0
316 stars 25 forks source link

feature request: Change `Rc` to `Arc` #29

Closed adriandelgado closed 1 year ago

adriandelgado commented 1 year ago

Feature Request

In a previous PR (#27), I made the mistake to change the Strings inside CellType to Box<str> and Rc<str>. However, Rc is not Send. This means that if we write worksheets on different theads, we can't send them to a single thread to use Workbook::push_worksheet. This shouldn't change the performance characteristics of the crate.

adriandelgado commented 1 year ago

This could be considered more than an inconvenience, in that case, maybe we should add a regression test to avoid this happening again.

adriandelgado commented 1 year ago

image I can't do what I have commented out. The variable names are in spanglish but I think the problem is clear.

jmcnamara commented 1 year ago

That seems reasonable. Go ahead and submit a PR for that.

As an aside I also want to look at multithreading the worksheet writing during file saving:

https://github.com/jmcnamara/rust_xlsxwriter/blob/main/src/packager.rs#L105

You will see a discussion on this in #1.

I'll look at it initially with some of the standard/built in threading but is it worth considering rayon like you are using and introducing another dependency?

adriandelgado commented 1 year ago

The advantage of using rayon is that the code would be easy to understand/maintain. It can be added as an optional dependency. Consider that using rayon for parallelism is very common in the Rust ecosystem so it is likely that if someone is looking to add more parallelism, they already are using rayon.

We can do a lot of things using only the standard library, we don't need rayon. One nice feature of rayon is that it automatically adapts to the number of cores available and It becomes single threaded if there's only one core available. All of this means it is a tradeoff between ease of development vs having less dependencies.

One thing I should add is that if we zip all the parts in parallel we can obtain HUGE improvements. This can be done (I think) using ZipWriter::new_append.

I don't have much experience using the zip crate so I don't know if the new_append can be used to run things in parallel. The other option would be to bypass the zip crate and use the flate2 crate directly. There's a PR that tried to add parallelism using rayon directly into the zip crate but it got closed because of inactivity: https://github.com/zip-rs/zip/pull/31.

Take a look at the potential parallelism in the app I'm currently working on: image

jmcnamara commented 1 year ago

Take a look at the potential parallelism in the app I'm currently working on:

Wow.

I'll definitely look into adding parallelism in the back end. I never bothered in the C version because of portability issues but it seems wrong not to use Rust's strengths in this area.

LeoniePhiline commented 1 year ago

Thanks for #32!

The change to Rc broke my pipelines:

image

I am generating four worksheets worksheets in parallel tasks (tokio multi-threaded) like this:

    let mut workbook = rust_xlsxwriter::Workbook::new();
    let column_header_format = Arc::new(Format::new().set_bold());

    let (
        worksheet_institutions_weekly,
        worksheet_institutions_monthly,
        worksheet_subscriptions_weekly,
        worksheet_subscriptions_monthly,
    ) = tokio::try_join!(
        tokio::spawn(institution::export_institution_activity_archive_weekly(
            column_header_format.clone(),
            pool.clone()
        )),
        tokio::spawn(institution::export_institution_activity_archive_monthly(
            column_header_format.clone(),
            pool.clone()
        )),
        tokio::spawn(subscription::export_subscription_activity_archive_weekly(
            column_header_format.clone(),
            pool.clone()
        )),
        tokio::spawn(subscription::export_subscription_activity_archive_monthly(
            column_header_format.clone(),
            pool.clone()
        )),
    )?;

    workbook.push_worksheet(
        worksheet_institutions_weekly
            .wrap_err("failed to export weekly institution activity to report worksheet")?,
    );
    workbook.push_worksheet(
        worksheet_institutions_monthly
            .wrap_err("failed to export monthly institution activity to report worksheet")?,
    );
    workbook.push_worksheet({
        // Set worksheet "subscription activity per week" as the active tab.
        let mut worksheet = worksheet_subscriptions_weekly
            .wrap_err("failed to export weekly subscription activity to report worksheet")?;
        worksheet.set_active(true);
        Ok::<_, Report>(worksheet)
    }?);
    workbook.push_worksheet(
        worksheet_subscriptions_monthly
            .wrap_err("failed to export monthly subscription activity to report worksheet")?,
    );

    let mut file_path = output_path.clone();
    let file_name = Utc::now()
        .with_timezone(&Amsterdam)
        // TODO: File from arg / env!
        .format("%G-W%V activity report.xlsx")
        .to_string();

    file_path.push(file_name);

    workbook
        .save(&file_path)
        .wrap_err("failed to save report workbook file to disk")?;
    info!("Workbook written to '{}'", file_path.display());

Can't wait for the 0.36 release :)

jmcnamara commented 1 year ago

Can't wait for the 0.36 release :)

No problem. The fix is now available on crates.io. However, I messed up the release and had to release a patch release of v0.36.1.