Closed han1548772930 closed 1 year ago
I tried using multithreading to handle this, but I found it to be slower than single threading.
```rust
use std::{
    sync::{Arc, Mutex},
    thread,
    time::{SystemTime, UNIX_EPOCH},
};

use rust_xlsxwriter::*;

fn main() {
    let workbook = Workbook::new();
    let workbook_arc = Arc::new(Mutex::new(workbook));
    workbook_arc.lock().unwrap().add_worksheet();

    let mut time = timestamp1();
    println!("start{:?}", time);

    let mut handles = vec![];
    for i in 1..6 {
        let workbook_clone = workbook_arc.clone();
        let handle = thread::spawn(move || {
            // Each thread locks the workbook for its entire write loop, so the
            // threads run one after another rather than in parallel.
            let mut workbook = workbook_clone.lock().unwrap();
            let sheet = workbook.worksheet_from_index(0).unwrap();
            for j in (i - 1) * 209715..i * 209715 {
                for col in 1..=12 {
                    sheet.write_string(j, col, "Hello, World!").unwrap();
                }
            }
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }

    let mut workbook = workbook_arc.lock().unwrap();
    workbook.save("demo.xlsx").unwrap();

    time = timestamp1();
    println!("end{:?}", time);
}

fn timestamp1() -> i64 {
    let start = SystemTime::now();
    let since_the_epoch = start
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards");
    since_the_epoch.as_secs() as i64 * 1000
        + i64::from(since_the_epoch.subsec_millis())
}
```
I'll add multi-threading into the back end in the next release, or the one or two after that.
The library is probably IO bound rather than CPU bound, so multi-threading may not have a linear benefit. Nonetheless, I'll implement it to get whatever benefit is possible.
@adriandelgado Any suggestions to the OP on multi-threading in the front end/user app?
Multithreading is only useful for massive worksheets.

I also recommend not using a Mutex: you can generate each worksheet on a separate thread and then join them together using `push_worksheet`.
I tried using these two methods and still got something similar to single threading.
```rust
use std::{
    sync::{Arc, Mutex},
    thread,
    time::{SystemTime, UNIX_EPOCH},
};

use rust_xlsxwriter::*;

fn main() {
    let workbook = Workbook::new();
    let workbook_arc = Arc::new(Mutex::new(workbook));

    let mut time = timestamp1();
    println!("start{:?}", time);

    let mut handles = vec![];
    for _i in 0..4 {
        let workbook_clone = workbook_arc.clone();
        let handle = thread::spawn(move || {
            // Note: taking the lock here, before building the worksheet, holds
            // the mutex for the whole write loop and serializes the threads.
            // Locking only around the final push_worksheet() call would let
            // the writes actually run in parallel.
            let mut workbook = workbook_clone.lock().unwrap();
            let mut sheet = Worksheet::new();
            for j in 0..1_048_576 {
                for col in 1..=12 {
                    sheet.write_string(j, col, "Hello, World!").unwrap();
                }
            }
            workbook.push_worksheet(sheet);
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }

    let mut workbook = workbook_arc.lock().unwrap();
    workbook.save("demo.xlsx").unwrap();

    time = timestamp1();
    println!("end{:?}", time);
}

fn timestamp1() -> i64 {
    let start = SystemTime::now();
    let since_the_epoch = start
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards");
    since_the_epoch.as_secs() as i64 * 1000
        + i64::from(since_the_epoch.subsec_millis())
}
```
```rust
// Requires the async-std and futures crates in addition to rust_xlsxwriter.
use std::time::{SystemTime, UNIX_EPOCH};

use async_std::task;
use rust_xlsxwriter::*;

fn main() {
    task::block_on(async {
        let mut time = timestamp1();
        println!("start:{:?}", time);

        let mut workbook = Workbook::new();
        let res = async_main().await;
        workbook.push_worksheet(res.0);
        workbook.push_worksheet(res.1);
        workbook.push_worksheet(res.2);
        workbook.push_worksheet(res.3);
        workbook.save("demo.xlsx").unwrap();

        time = timestamp1();
        println!("end:{:?}", time);
    });
}

async fn async_main() -> (Worksheet, Worksheet, Worksheet, Worksheet) {
    // Note: write_data() contains no await points, so these futures run to
    // completion one after another on a single thread; join! adds no
    // parallelism for CPU-bound work like this.
    let f1 = write_data();
    let f2 = write_data();
    let f3 = write_data();
    let f4 = write_data();
    futures::join!(f1, f2, f3, f4)
}

async fn write_data() -> Worksheet {
    let mut sheet = Worksheet::new();
    for j in 1..1_048_576 {
        for col in 0..=11 {
            sheet.write_string(j, col, "Hello, World!").unwrap();
        }
    }
    sheet
}

fn timestamp1() -> i64 {
    let start = SystemTime::now();
    let since_the_epoch = start
        .duration_since(UNIX_EPOCH)
        .expect("Time went backwards");
    since_the_epoch.as_secs() as i64 * 1000
        + i64::from(since_the_epoch.subsec_millis())
}
```
After some testing, I found that writing the data with `write_string` is very fast, but `save` takes a long time.
Is it possible to make `save_internal` asynchronous?
> Is it possible to make `save_internal` asynchronous?
That is the plan.
I think the highest-value bottleneck for parallelism would be the worksheet writing loop in packager.rs:
https://github.com/jmcnamara/rust_xlsxwriter/blob/main/src/packager.rs#L104-L110
```rust
let mut string_table = SharedStringsTable::new();

for (index, worksheet) in workbook.worksheets.iter_mut().enumerate() {
    self.write_worksheet_file(worksheet, index + 1, &mut string_table)?;

    if worksheet.has_relationships() {
        self.write_worksheet_rels_file(worksheet, index + 1)?;
    }
}
```
The tricky(?) part would be to have mutex-locked (or some other scheme) updates to the shared string table (which maps strings to an index value using Excel's scheme).
The `self.write_worksheet_rels_file()` part could probably move to a non-threaded loop.
@adriandelgado pointed out in #29 that there could be a lot of value in parallelising the zip writing. I don't know if that will be possible using the current zip crate.
I've made a first pass at introducing threading into the back end of `rust_xlsxwriter`. The preliminary work is on the `threaded1` branch. Some notes on this:

- I used `thread::scope` instead of `thread::spawn`, since that makes it easier to work with the lifetimes and "`self` escapes the method body here" warnings.
- I threaded the `rust_xlsxwriter` parts for now, and not the zip parts, to make the direct effects of the worksheet writing more obvious.

On the `threaded1` branch there are 3 test cases:

- `examples/app_perf_test`: a single worksheet with mixed string and number values.
- `examples/app_perf_test2`: 4 worksheets with string data only.
- `examples/app_perf_test3`: 4 worksheets with number data only.

From this I get mixed results:
```
$ hyperfine target/release/examples/app_perf_test_threaded target/release/examples/app_perf_test_unthreaded --warmup 3
Benchmark 1: target/release/examples/app_perf_test_threaded
  Time (mean ± σ):     244.8 ms ±  12.4 ms    [User: 221.3 ms, System: 16.9 ms]
  Range (min … max):   238.2 ms … 280.0 ms    12 runs

Benchmark 2: target/release/examples/app_perf_test_unthreaded
  Time (mean ± σ):     237.3 ms ±   1.1 ms    [User: 218.9 ms, System: 16.8 ms]
  Range (min … max):   235.6 ms … 239.5 ms    12 runs

Summary
  'target/release/examples/app_perf_test_unthreaded' ran
    1.03 ± 0.05 times faster than 'target/release/examples/app_perf_test_threaded'

$ hyperfine target/release/examples/app_perf_test2_threaded target/release/examples/app_perf_test2_unthreaded --warmup 3
Benchmark 1: target/release/examples/app_perf_test2_threaded
  Time (mean ± σ):     1.261 s ±  0.011 s    [User: 1.184 s, System: 0.905 s]
  Range (min … max):   1.247 s …  1.283 s    10 runs

Benchmark 2: target/release/examples/app_perf_test2_unthreaded
  Time (mean ± σ):     986.1 ms ±   6.9 ms    [User: 916.6 ms, System: 66.1 ms]
  Range (min … max):   977.7 ms … 997.0 ms    10 runs

Summary
  'target/release/examples/app_perf_test2_unthreaded' ran
    1.28 ± 0.01 times faster than 'target/release/examples/app_perf_test2_threaded'

$ hyperfine target/release/examples/app_perf_test3_threaded target/release/examples/app_perf_test3_unthreaded --warmup 3
Benchmark 1: target/release/examples/app_perf_test3_threaded
  Time (mean ± σ):     778.6 ms ±  20.2 ms    [User: 837.8 ms, System: 54.2 ms]
  Range (min … max):   766.5 ms … 832.6 ms    10 runs

Benchmark 2: target/release/examples/app_perf_test3_unthreaded
  Time (mean ± σ):     889.2 ms ±   4.1 ms    [User: 834.7 ms, System: 52.0 ms]
  Range (min … max):   884.7 ms … 895.8 ms    10 runs

Summary
  'target/release/examples/app_perf_test3_threaded' ran
    1.14 ± 0.03 times faster than 'target/release/examples/app_perf_test3_unthreaded'
```
Some observations from this: the string-only case is slower when threaded, which points at contention on the shared string table, while the number-only case (which doesn't touch the SST) gets faster.

There are some options to remove the mutex lock and contention:

1. Do a separate non-threaded pass of all the worksheet string data to build up the SST table.
2. Use an `RwLock` and do initial non-locking reads to see if the string exists in the SST, and only lock if it doesn't.

I'll look into some of these options in the next few days and I'll post some updates as I go.
> There are some options to remove the mutex lock and contention:
So for now I've gone with Option 1, "Do a separate non-threaded pass of all the worksheet string data to build up the SST table." I've added a second prototype for this on the `threaded2` branch.
Overall the results are good:
```
$ hyperfine target/release/examples/app_perf_test target/release/examples/app_perf_test_unthreaded --warmup 3
Benchmark 1: target/release/examples/app_perf_test
  Time (mean ± σ):     238.3 ms ±   2.5 ms    [User: 221.8 ms, System: 15.3 ms]
  Range (min … max):   234.8 ms … 244.2 ms    12 runs

Benchmark 2: target/release/examples/app_perf_test_unthreaded
  Time (mean ± σ):     236.4 ms ±   2.5 ms    [User: 220.0 ms, System: 15.0 ms]
  Range (min … max):   233.3 ms … 241.2 ms    12 runs

Summary
  'target/release/examples/app_perf_test_unthreaded' ran
    1.01 ± 0.02 times faster than 'target/release/examples/app_perf_test'

$ hyperfine target/release/examples/app_perf_test2 target/release/examples/app_perf_test2_unthreaded --warmup 3
Benchmark 1: target/release/examples/app_perf_test2
  Time (mean ± σ):     919.2 ms ±  14.0 ms    [User: 924.3 ms, System: 63.5 ms]
  Range (min … max):   901.7 ms … 949.2 ms    10 runs

Benchmark 2: target/release/examples/app_perf_test2_unthreaded
  Time (mean ± σ):     980.1 ms ±  11.3 ms    [User: 915.7 ms, System: 61.1 ms]
  Range (min … max):   964.3 ms … 1000.4 ms    10 runs

Summary
  'target/release/examples/app_perf_test2' ran
    1.07 ± 0.02 times faster than 'target/release/examples/app_perf_test2_unthreaded'

$ hyperfine target/release/examples/app_perf_test3 target/release/examples/app_perf_test3_unthreaded --warmup 3
Benchmark 1: target/release/examples/app_perf_test3
  Time (mean ± σ):     794.1 ms ±  14.5 ms    [User: 856.9 ms, System: 50.8 ms]
  Range (min … max):   781.7 ms … 832.8 ms    10 runs

Benchmark 2: target/release/examples/app_perf_test3_unthreaded
  Time (mean ± σ):     887.7 ms ±   5.7 ms    [User: 837.8 ms, System: 46.9 ms]
  Range (min … max):   876.2 ms … 898.0 ms    10 runs

Summary
  'target/release/examples/app_perf_test3' ran
    1.12 ± 0.02 times faster than 'target/release/examples/app_perf_test3_unthreaded'
```
Summary: not amazing, but I'll take a ~10% increase for the amount of work involved. If anyone could try the `threaded2` branch against real code, I'd be interested to see the results.
I'll move on to see what can be done with the zip writer parts.
Wow, that's great!
I'm going to merge the second option (`threaded2`) onto main. I think it is the best I can do for now. There are still potential gains to be had from parallelizing the zipping, but after an initial look I'm going to leave that to another time/person.
I've pushed these changes to crates.io in v0.44.0. It is the best I can do for now. Hopefully it will inspire some other analysis/contributions.
Closing.
Feature Request

First of all, I would like to thank the author for providing a very useful library. Is it possible to speed up the export by using multithreading?