Closed RoloEdits closed 8 months ago
Got some pointers from a maintainer of excelize, so I need to update the data. I'll try to get to it as soon as I can.
calamine vs openpyxl (read_only mode), python3.11 on my PC:
Benchmark 1: calamine
Time (mean ± σ): 21.299 s ± 0.093 s [User: 20.361 s, System: 0.931 s]
Range (min … max): 21.193 s … 21.512 s 10 runs
Benchmark 1: openpyxl
Time (mean ± σ): 134.424 s ± 0.582 s [User: 133.749 s, System: 0.654 s]
Range (min … max): 133.057 s … 135.192 s 10 runs
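For a quick sense of scale, the ratio between the two mean times above can be computed directly. This is only a minimal sketch: `speedup` is a helper name made up for illustration, and the two constants are copied from the hyperfine means in this comment.

```rust
/// Ratio of the slower mean wall-clock time to the faster one.
/// `speedup` is a hypothetical helper, not part of any benchmarked library.
fn speedup(slower_secs: f64, faster_secs: f64) -> f64 {
    slower_secs / faster_secs
}

fn main() {
    // Mean times copied from the hyperfine output above.
    let calamine_mean = 21.299;
    let openpyxl_mean = 134.424;

    println!(
        "calamine is roughly {:.1}x faster than openpyxl on this file",
        speedup(openpyxl_mean, calamine_mean)
    );
}
```

On these numbers the ratio comes out at a bit over 6x. It compares means only and ignores the reported standard deviations.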
Code:
I wanted to add umya-spreadsheet, but it didn't seem to have any way to directly iterate over the rows?
I didn't find one either. With this code, the application allocated over 10 GB of memory and I killed it.
let path = std::path::Path::new("NYC_311_SR_2010-2020-sample-1M.xlsx");
let book = umya_spreadsheet::reader::xlsx::read(path).unwrap();
let sheet = book.get_sheet_by_name("NYC_311_SR_2010-2020-sample-1M").unwrap();
let _ = sheet.get_collection_to_hashmap();
// OR
let path = std::path::Path::new("NYC_311_SR_2010-2020-sample-1M.xlsx");
let book = umya_spreadsheet::reader::xlsx::lazy_read(path).unwrap();
let _ = book.get_lazy_read_sheet_cells(&0).unwrap();
What version of python did you use?
python3.11 138.470 s
python3.10 158.893 s
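To put the gap between the two timings above in relative terms, here is a small sketch that computes the percentage reduction in runtime. `percent_reduction` is a name invented for this example, and the inputs are the two figures quoted above.

```rust
/// Percentage by which `new_secs` is lower than `old_secs`.
/// `percent_reduction` is a hypothetical helper used only for illustration.
fn percent_reduction(old_secs: f64, new_secs: f64) -> f64 {
    (old_secs - new_secs) / old_secs * 100.0
}

fn main() {
    // Timings copied from the two lines above.
    let python310 = 158.893;
    let python311 = 138.470;

    println!(
        "python3.11 takes {:.1}% less time than python3.10",
        percent_reduction(python310, python311)
    );
}
```

That works out to a reduction of a little under 13% for this workload.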
@dimastbk Python 3.11.5. What kind of hardware are you using?
Thanks. I'm just surprised by such a big difference between python3.10 and 3.11. Intel® Core™ i7-9700, KDE Neon 5.27
I'm also interested in how much slower mine is compared to yours: 100 seconds. I'm not even sure what could account for that much of a difference.
Thanks! Very informative
Went through and benchmarked some other libraries to see where calamine stood compared to other ecosystems. Decided to add it to the docs, as well as, after seeing the results, file an issue for excelize.

I wanted to add umya-spreadsheet, but it didn't seem to have any way to directly iterate over the rows? At least I couldn't tell from the wording in the docs nor the function signatures. If you manage to figure out a way to do that, and want another Rust comparison, I don't mind adding it.

Git history is a bit messy with fixes, squashing might be best.