salvois / LargeXlsx

A .NET library to write large Excel files in XLSX format with low memory consumption using streamed write.

RTL worksheet? #2

Closed ncaridi closed 3 years ago

ncaridi commented 3 years ago

Hi, thank you for sharing this library. Is it possible to change the worksheet reading order to RTL? Also, an off-topic question: I think the files generated with this library are considerably bigger than they would be if they were generated with OpenXML. Right now I have about 400K records at about 10 MB, while the OpenXML output is around 2.5 MB.

Thank you.

salvois commented 3 years ago

Hello, sorry for the late reply: I did not receive a notification from GitHub.

I have not looked into RTL reading order, but will try and let you know. Guess I can count on you for some testing? ;)

For the size issue, it may well be the case! LargeXlsx uses the SharpCompress library with its default settings to zip-compress the workbook data, whereas Office Open XML may use more aggressive compression options by default. 4x is a heck of a difference, though! I will check this too. Of course, I assume the content is exactly the same in your test. Would it be possible to have a sample workbook?

Thanks, Salvo

salvois commented 3 years ago

Hi @ncaridi, I just published release 1.2.0 with support for right-to-left worksheets. Please let me know if it solves your problem.

For the file size problem: from my understanding, the library already used stronger compression than Excel by default, and my same example content saved by Excel was a bit larger. I even lowered the compression level a bit in this release to gain some speed. If you have a lot of repeating text, I guess what you may be observing is the effect of the lack of a global string table, which is by design because it is not friendly to streamed write. An example Excel file may help clarify whether this is the case. Please feel free to open a separate issue for this.

Thanks, Salvo
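
To make the string-table effect described above more concrete, here is a minimal Python sketch. It is not LargeXlsx code: the function names (`inline_cells`, `shared_cells`, `sizes`) and the sample labels are hypothetical, the cell markup is simplified (no row elements or cell references), and plain `zlib` stands in for whatever compressor a real writer uses. It renders the same repeating values once as inline strings and once as indexes into a string table, then reports raw and deflated sizes.

```python
import zlib
from xml.sax.saxutils import escape


def inline_cells(values):
    """Render every value as an inline-string cell (no shared state needed)."""
    return "".join(
        f'<c t="inlineStr"><is><t>{escape(v)}</t></is></c>' for v in values
    )


def shared_cells(values):
    """Render cells as indexes into a string table that is built in memory."""
    index = {}  # string -> position in the table; grows with distinct strings
    cells = []
    for v in values:
        i = index.setdefault(v, len(index))
        cells.append(f'<c t="s"><v>{i}</v></c>')
    # dicts keep insertion order, so table order matches the assigned indexes
    table = "".join(f"<si><t>{escape(s)}</t></si>" for s in index)
    return "".join(cells), table


def sizes(values, level=6):
    """Return raw and deflated byte counts for both representations."""
    inline = inline_cells(values).encode("utf-8")
    cells, table = shared_cells(values)
    shared = (cells + table).encode("utf-8")
    return {
        "inline_raw": len(inline),
        "inline_deflated": len(zlib.compress(inline, level)),
        "shared_raw": len(shared),
        "shared_deflated": len(zlib.compress(shared, level)),
    }


if __name__ == "__main__":
    # Hypothetical data: 400,000 cells drawn from a handful of repeating labels.
    labels = ["Pending approval", "Shipped to customer", "Returned", "Cancelled by user"]
    data = [labels[i % len(labels)] for i in range(400_000)]
    for name, size in sizes(data).items():
        print(f"{name}: {size} bytes")
```

The numbers this prints depend entirely on the data and the compression level, so they are not a measurement of LargeXlsx or OpenXML output. Running the same comparison on a sample of the real workbook content would be a better check of whether repeated text explains the size gap.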

ncaridi commented 3 years ago

Thank you for taking the time to look into this. I've been very busy with another project and couldn't get back to this. I do have a lot of repeating text; I guess that is the give and take between working with a full in-memory DOM and a streamed write?

salvois commented 3 years ago

Yes, I considered that deduplicating strings (which are potentially unbounded) in memory may nullify the advantages of a constant, low-memory streamed write, so all strings are streamed directly as cell contents in my implementation. I recognize it may depend on your use case, though. Please feel free to make proposals if you come up with an idea!

Thanks, Salvo
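
The memory side of this trade-off can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's implementation: `StreamingWriter` and `DedupWriter` are hypothetical classes, and the cell markup is simplified. The streaming writer keeps no per-string state, while the deduplicating writer must remember every distinct string it has seen in order to reuse its index.

```python
import io
from xml.sax.saxutils import escape


class StreamingWriter:
    """Writes each string straight to the output and remembers nothing."""

    def __init__(self, out):
        self.out = out

    def write(self, text):
        self.out.write(f'<c t="inlineStr"><is><t>{escape(text)}</t></is></c>')

    def tracked_strings(self):
        return 0  # constant, regardless of how many cells were written


class DedupWriter:
    """Writes an index per cell and keeps every distinct string in memory."""

    def __init__(self, out):
        self.out = out
        self.index = {}  # grows with the number of distinct strings

    def write(self, text):
        i = self.index.setdefault(text, len(self.index))
        self.out.write(f'<c t="s"><v>{i}</v></c>')

    def tracked_strings(self):
        return len(self.index)


if __name__ == "__main__":
    # Worst case for deduplication: every value is unique (hypothetical order IDs).
    streaming = StreamingWriter(io.StringIO())
    dedup = DedupWriter(io.StringIO())
    for n in range(100_000):
        value = f"ORDER-{n:08d}"
        streaming.write(value)
        dedup.write(value)
    print("streaming tracks:", streaming.tracked_strings())
    print("dedup tracks:", dedup.tracked_strings())
```

With mostly unique values the dictionary grows with the row count, which is the unbounded case mentioned above. With a small set of repeating labels it stays small, which is why the right choice depends on the use case.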