misterspeedy / FsExcel

An F# Excel spreadsheet generator
MIT License
138 stars 17 forks source link

Pushstream support #8

Open misterspeedy opened 2 years ago

misterspeedy commented 2 years ago

Query from Natalie Perret:

is there a support for PushStream or Seq? (i.e. low memory footprint)

https://twitter.com/natalie_perret/status/1500264966024224768

natalie-o-perret commented 2 years ago

About the pushtream thinking about that: https://gist.github.com/mrange/fbefd946dba6725a0b727b7d3fd81d6f

Actually could be about any kind of "streamable" sequences:

natalie-o-perret commented 2 years ago

Considering what the function AsWorkBook does: https://github.com/misterspeedy/FsExcel/blob/main/src/FsExcel/FsExcel.fs#L143 i.e. the big bulk is about iterating for ... in ... do over a list of items and match ... with. So not sure that if there is a huge benefit to iterate over a F# list, specifically.

That being said, ain't too sure 'bout the memory footprint of XLWorkbook when there are thousands of lines or sizable number of tabs.

I'm trying to see if we could use your lib at work, and if it can fit our (business) needs.

misterspeedy commented 2 years ago

Thanks for the feedback @natalie-o-perret! The DSL obviously uses an F# list for fluency at the worksheet definition stage (Fable.React style) so I guess iterating over it seemed natural. Maybe this could be done in a more sequency way, but the fundamental question is, can ClosedXML (or some alternative) accommodate this so the workbook is produced incrementally. My guess is 'no' as certain metadata, eg. the timestamp, need to be enclosed in the "stream" before it is completed. But I haven't dug into it.

Anyway, please could you let me know what volumes and execution environment you are targeting, and maybe I can do some benchmarking? An actual example of the dataset in question would be ideal though I realize that might be proprietary.

While I await that I'll do some basic volume testing and see what micro-optimizations I can make.

Thanks again for the input!

misterspeedy commented 2 years ago

A very crude volume test suggest the current code is fine for rendering around a million cells, but struggles with 10 million. In the latter case it takes a few minutes but still completes fine. Next step will be to add some proper BenchmarkDotNet tests.

natalie-o-perret commented 2 years ago

A very crude volume test suggest the current code is fine for rendering around a million cells, but struggles with 10 million. In the latter case it takes a few minutes but still completes fine. Next step will be to add some proper BenchmarkDotNet tests.

YMMV, depends on the hardware (CPU + Memory).

The OOM issues I've bumped into (with a different .NET Excel library (infamous NPOI)) happend with a few hundreds of thousands of lines in a dockerized app (so no crazy amount of resources).

Btw, I 💯 acknowledge that the way the business (accountants) is handling Excel stuff on our end makes very little sense to me (i.e. why bother with huge Excel files, but there are times there is just so much you can do to convince people who already have their habits don't really wanna die anytime soon, i.e. other priorities).

We will have to run some (integration) tests on next quarter with your lib. I will have a resource rewriting the service we've taken over beginning of this year (also try to re-re-re-convince again the bizfolks to adjust the way they are handling their stuff).

Anyhoo-way, will keep you posted, probably mid-next month, we have others things in the pipeline atm.

natalie-o-perret commented 2 years ago

es fine. Next step will be to add some proper BenchmarkDotNet tests.

Oh and btw, thanks for your lib and thank you for being careful about the issues have been filed so far. Don't wanna pass for someone who is ungrateful for what you have been doing 🙇‍♀️

misterspeedy commented 2 years ago

@natalie-o-perret Thanks for the update and context. Yours sounds like a really great use case, provided at least once constraint is that you are building non-trivial structures which are fiddly to code in mutable style. If the structure is really simple it might be faster (and dare-I-say-it simpler) to use the underlying library directly. Hope FsExcel is a good fit - if it is I'll do my best to make it a great experience for you and your colleagues.

natalie-o-perret commented 1 year ago

Relates to my comment here ~~

https://github.com/misterspeedy/FsExcel/issues/34#issuecomment-1445365114