Closed adriandelgado closed 1 year ago
This could be considered more than an inconvenience, in that case, maybe we should add a regression test to avoid this happening again.
I can't do what I have commented out. The variable names are in spanglish but I think the problem is clear.
That seems reasonable. Go ahead and submit a PR for that.
As an aside I also want to look at multithreading the worksheet writing during file saving:
https://github.com/jmcnamara/rust_xlsxwriter/blob/main/src/packager.rs#L105
You will see a discussion on this in #1.
I'll look at it initially with some of the standard/built in threading but is it worth considering rayon like you are using and introducing another dependency?
The advantage of using rayon is that the code would be easy to understand/maintain. It can be added as an optional dependency. Consider that using rayon for parallelism is very common in the Rust ecosystem so it is likely that if someone is looking to add more parallelism, they already are using rayon.
We can do a lot of things using only the standard library, we don't need rayon. One nice feature of rayon is that it automatically adapts to the number of cores available and It becomes single threaded if there's only one core available. All of this means it is a tradeoff between ease of development vs having less dependencies.
One thing I should add is that if we zip all the parts in parallel we can obtain HUGE improvements. This can be done (I think) using ZipWriter::new_append
.
I don't have much experience using the zip crate so I don't know if the new_append
can be used to run things in parallel. The other option would be to bypass the zip crate and use the flate2 crate directly. There's a PR that tried to add parallelism using rayon directly into the zip crate but it got closed because of inactivity: https://github.com/zip-rs/zip/pull/31.
Take a look at the potential parallelism in the app I'm currently working on:
Take a look at the potential parallelism in the app I'm currently working on:
Wow.
I'll definitely look into adding parallelism in the back end. I never bothered in the C version because of portability issues but it seems wrong not to use Rust's strengths in this area.
Thanks for #32!
The change to Rc
broke my pipelines:
I am generating four worksheets worksheets in parallel tasks (tokio multi-threaded) like this:
let mut workbook = rust_xlsxwriter::Workbook::new();
let column_header_format = Arc::new(Format::new().set_bold());
let (
worksheet_institutions_weekly,
worksheet_institutions_monthly,
worksheet_subscriptions_weekly,
worksheet_subscriptions_monthly,
) = tokio::try_join!(
tokio::spawn(institution::export_institution_activity_archive_weekly(
column_header_format.clone(),
pool.clone()
)),
tokio::spawn(institution::export_institution_activity_archive_monthly(
column_header_format.clone(),
pool.clone()
)),
tokio::spawn(subscription::export_subscription_activity_archive_weekly(
column_header_format.clone(),
pool.clone()
)),
tokio::spawn(subscription::export_subscription_activity_archive_monthly(
column_header_format.clone(),
pool.clone()
)),
)?;
workbook.push_worksheet(
worksheet_institutions_weekly
.wrap_err("failed to export weekly institution activity to report worksheet")?,
);
workbook.push_worksheet(
worksheet_institutions_monthly
.wrap_err("failed to export monthly institution activity to report worksheet")?,
);
workbook.push_worksheet({
// Set worksheet "subscription activity per week" as the active tab.
let mut worksheet = worksheet_subscriptions_weekly
.wrap_err("failed to export weekly subscription activity to report worksheet")?;
worksheet.set_active(true);
Ok::<_, Report>(worksheet)
}?);
workbook.push_worksheet(
worksheet_subscriptions_monthly
.wrap_err("failed to export monthly subscription activity to report worksheet")?,
);
let mut file_path = output_path.clone();
let file_name = Utc::now()
.with_timezone(&Amsterdam)
// TODO: File from arg / env!
.format("%G-W%V activity report.xlsx")
.to_string();
file_path.push(file_name);
workbook
.save(&file_path)
.wrap_err("failed to save report workbook file to disk")?;
info!("Workbook written to '{}'", file_path.display());
Can't wait for the 0.36 release :)
Can't wait for the 0.36 release :)
No problem. The fix is now available on crates.io. However, I messed up the release and had to release a patch release of v0.36.1.
Feature Request
In a previous PR (#27), I made the mistake to change the
String
s insideCellType
toBox<str>
andRc<str>
. However,Rc
is notSend
. This means that if we write worksheets on different theads, we can't send them to a single thread to useWorkbook::push_worksheet
. This shouldn't change the performance characteristics of the crate.