mjul / docjure

Read and write Office documents from Clojure
MIT License

How can I change a sheet to a stream and save it into a workbook? #46

Open mut0u opened 8 years ago

mut0u commented 8 years ago

I have a large dataset to save.

So I call add-rows! in a loop to add the data to a sheet, and then save the sheet to a workbook.

I guess the data is too big, because Clojure throws OutOfMemoryError: GC overhead limit exceeded.

So I need to turn the sheet into a stream and save the workbook through an OutputStream.

What should I do? Thanks.

bagl commented 8 years ago

I would also appreciate a tip on how to handle big data.

mjul commented 8 years ago

Thanks for the feedback. It sounds like an interesting use case.

The current stream story of Docjure is just to perform stream IO: the document is still built up in-memory.

I have not run into the memory problem myself, so I can offer no better advice than throwing more memory at it, or rolling up your sleeves and adding streaming to Docjure.

The underlying Apache POI library supports a limited streaming model for big datasets so please have a look at that and see if you can find a way to let Docjure leverage it.

You will find the documentation here: POI documentation - in particular the streaming API, SXSSF
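As a rough illustration of the SXSSF approach mentioned above, here is a minimal sketch that bypasses Docjure and calls POI directly through Java interop. It assumes Apache POI (poi-ooxml) is on the classpath. The function name write-rows-streaming!, the namespace, and the window size of 100 rows are made up for this example and are not part of Docjure.

```clojure
(ns example.sxssf-sketch
  (:import (org.apache.poi.xssf.streaming SXSSFWorkbook)
           (java.io FileOutputStream)))

;; Hypothetical helper, not part of Docjure.
;; row-seq is a (possibly lazy) sequence of rows; each row is a sequence of values.
(defn write-rows-streaming!
  [path sheet-name row-seq]
  ;; Assumption: a window of 100 rows. SXSSF keeps at most this many rows
  ;; in memory and flushes older rows to a temporary file.
  (let [wb    (SXSSFWorkbook. (int 100))
        sheet (.createSheet wb sheet-name)]
    (try
      (doseq [[i row-data] (map-indexed vector row-seq)]
        (let [row (.createRow sheet (int i))]
          (doseq [[j v] (map-indexed vector row-data)]
            ;; Every value is written as a string to keep the sketch short.
            (.setCellValue (.createCell row (int j)) (str v)))))
      (with-open [out (FileOutputStream. path)]
        (.write wb out))
      (finally
        ;; Delete the temporary files that back the flushed rows.
        (.dispose wb)))))
```

It could be called with a lazy sequence so that the rows are never all realized at once, for example (write-rows-streaming! "big.xlsx" "Data" (map (fn [i] [i (str "row-" i)]) (range 1000000))). Rows that have been flushed out of the window can no longer be read back or modified, which is the main limitation of this model.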

mut0u commented 8 years ago

I have a large dataset from a database. I use SQL LIMIT and OFFSET in a loop to handle a small part of the data each time. But I did not realize that references to the data might stay alive the whole time, so the GC never reclaims the expired data. As a result, I used 30G of memory to create a 20M xlsx file.

I am rewriting my code to find a way to solve the problem.

At the very beginning, I guessed that creating the sheet used a lot of memory, so I wanted it to work like a stream.

Finally, I figured out that it is due to my code.