tafia / calamine

A pure Rust Excel/OpenDocument SpreadSheets file reader: rust on metal sheets
MIT License
1.6k stars 155 forks source link

Feat: check for memory errors on reading xlsx files and proper formatting of csv sample code #432

Closed SgtPepper47 closed 1 month ago

SgtPepper47 commented 1 month ago

Updates to the Excel range creation to check the machine memory against the memory required for the range. Updates to the sample Excel to csv writer to identify the memory error and to write out a properly formatted csv file with commas instead of semicolons. Resolves #433

tafia commented 1 month ago

Could you add a test showing how this can fail?

SgtPepper47 commented 1 month ago

tafia, Here is an example of a file with a sheet that fails to load into memory. Essentially, it will depend on your machine, but an Excel sheet has a finite range of about 1 million rows to 16k columns. I just run the example script to convert 'Sheet1' to a csv file. Large_Sheet.xlsx Thanks.

tafia commented 1 month ago

tests are failing.

dimastbk commented 1 month ago

May be we should check cells.len() == isize::MAX in loop, when add cell to cells? And this should be backported to other formats too.

SgtPepper47 commented 1 month ago

First, I should have opened this by saying I think calamine is a great piece of work, and I'm impressed with its performance. I see in the build that I mistakenly missed a closing parentheses. I can do another pull request to fix that if you like. To dimastbk's comment. I think it would be cleaner to check in the cells loop to see when it has exceeded the limit for the machine. However, I think cell.len() would return a usize where it wouldn't be able to check against isize. Of course, all of this is just checking for an error. I haven't come up with a solution to be able to load the Excel sheet without an error.

dimastbk commented 1 month ago

@SgtPepper47, sorry, cells.len() == usize::MAX

tafia commented 1 month ago

I think this PR makes things worse as it allocates a potentially very large Vec for nothing (this is not the right Vec to reserve capacity).

tafia commented 1 month ago

Closing it for now, feel free to reopen and come up with an actual improvement. Thanks