jmcnamara / rust_xlsxwriter

A Rust library for creating Excel XLSX files.
https://crates.io/crates/rust_xlsxwriter
Apache License 2.0
348 stars 25 forks source link

question: Enabling large file option #112

Closed stanleysie closed 1 month ago

stanleysie commented 2 months ago

Question

Hi @jmcnamara , I'm encountering the following error when trying to save a large file

IoError(Custom { kind: Other, error: "Large file option has not been set" })

Upon browsing through the codebase, I saw that when saving the workbook using workbook.save, we're calling the save_internal function https://github.com/jmcnamara/rust_xlsxwriter/blob/34e54d5107adeca66bc6377ee9293b17edcaab5e/src/workbook.rs#L976 which accepts a writer argument. In this case, the value for the writer argument passed is a new File object created based on the passed save path.

Then inside save_interval function, we're creating a packager using the passed writer argument, where it sets the large_file in zip_options to be false by default. https://github.com/jmcnamara/rust_xlsxwriter/blob/34e54d5107adeca66bc6377ee9293b17edcaab5e/src/packager.rs#L92

My question, is it possible to set the large_file option during the creation of the workbook or when saving the workbook? If not, how can we set the large_file option to true?

Please let me know if there's any misunderstanding from my statements above. Thank you!

jmcnamara commented 2 months ago

Thanks for the question.

As you spotted I have put the internal code in place to pass in a large_file option via the API. This would be similar to this API in the Python version: https://xlsxwriter.readthedocs.io/workbook.html#workbook-use-zip64

However, in my testing I found that enabling this option created a file that Excel couldn't read. This was a zip.rs issue and since that crate was transitioning to a different owner I didn't raise it as an issue at the time.

Maybe you can create a forked version of rust_xlsxwriter and add it to your Cargo.toml and try enabling it.

Also, you should ensure you are on the latest version of rust_xlsxwriter.

stanleysie commented 2 months ago

Thank you for the clarification! I'll fork and try enabling it from my end!

jmcnamara commented 1 month ago

@stanleysie I will try to look at this in the next few days.

Did you find anything in your testing?

jmcnamara commented 1 month ago

I tested with the current zip.rs crate and there doesn't seem to be any issue between Excel and the zip64 that it produces.

So I've added a Workbook::use_zip_large_file() method to main. It can be used as follows:

    let mut workbook = Workbook::new();

    workbook.use_zip_large_file(true);

Give it a try when you get a chance.

stanleysie commented 1 month ago

Hi @jmcnamara , I've tried enabling it by default by setting it to true on the following line, but I'm still encountering the similar error (Excel can't read the file properly), but this happens only when the overall size of the worksheet exceeds 4GB

Thinking that maybe there's something that I'm missing from my forked version. But seeing your changes in here, it seems like that is really the only thing to change. I'll try to use the updated code from main and get back to you once I have tested it

Thanks for the update!

jmcnamara commented 1 month ago

@stanleysie Thanks for the information.

I only tried it with a small file (much less than 4GB). However, that had also failed the last time I tested so that is an improvement.

If possible could you also try with the zlib feature enabled to see if that make a difference:

cargo add --git https://github.com/jmcnamara/rust_xlsxwriter.git rust_xlsxwriter -F zlib
jmcnamara commented 1 month ago

I was able to reproduce this myself and I can confirm that it doesn't generate the error with the zlib feature enabled. However, Excel gives a warning when reading the output file.

So it looks like it is a zip.rs issue (or issues). I'll create a test case and raise an issue with the zip.rs/zip2.rs project.

jmcnamara commented 1 month ago

It looks like the issue is fixed in v2.2.0 of zip.rs. I've updated the rust_xlsxwriter dependency version and that allows the creation of a 4GB file without a warning.

I've force pushed the change to main. Try it when you get a chance.

jmcnamara commented 1 month ago

@stanleysie I will probably publish this change in a new release on Friday. So let me know if you encounter any issues before that.

jmcnamara commented 1 month ago

This is now upstream in v0.79.0. Thanks for the report and best of luck with Pyaccelsx.

stanleysie commented 1 month ago

Thank you @jmcnamara , sorry for the slow reply. Thank you for this update! I'll be working on it this week, and will let you know how it goes 🙂

stanleysie commented 1 month ago

Hi @jmcnamara, I was able to test it to write large Excel file (533 MB) with Pyaccelsx. This is around 159.6 million cells in total, and the overall sheet size was around 5.3GB upon extraction. I still get a warning, with the following message

We found a problem with some content in 'Filename.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

but seems like it's harmless. I clicked Yes and let Excel do its magic. Took quite some time to finish and saw this pop up when the file was finally opened. image

Upon checking the content of the file, seems like there's nothing concerning and everything seems to be exported properly (including the formatting). I would say this is a similar behavior to how it is in XlsxWriter when use_zip64 is enabled. But regardless, this is definitely a better result than before! 💯

Once again, thanks for this update! 💯

jmcnamara commented 1 month ago

@stanleysie Thanks for the feedback. I think that warning is mainly harmless. I saw it some times but not always in my testing and, as you say, the file is fine. There is a similar warning every time from the Python version of the library but never from the C or Perl versions.