jmcnamara / libxlsxwriter

A C library for creating Excel XLSX files.
https://libxlsxwriter.github.io
Other
1.49k stars 332 forks source link

remove a worksheet that was previously added via workbook_add_worksheet() (or alternatively, dupe a worksheet) #370

Closed liquidaty closed 2 years ago

liquidaty commented 2 years ago

Hi,

I am processing a stream of data that writes data to a worksheet. Under certain conditions, depending on the data stream content, it may be necessary to create a second worksheet which is a duplicate of the first worksheet, up to that point, but will differ at that point (and possibly later as well). Furthermore, I would like for the input data as well as the xlsx writing be processed as a stream in chunks and for the writer to use the constant memory option.

In order to do this, the two approaches that I'm looking into are:

a. create 2 worksheets from the start, but remove the 2nd one, prior to saving the workbook, if it turns out to be unnecessary (which we'll only know after all the input has been processed). Question: is there any way to remove a worksheet (that was previously added by workbook_add_worksheet) before saving the workbook?

b. create one worksheet until the point at which the 2nd one is know to be needed, and at that point create a duplicate of the first. This approach would be ideal since it doesn't use extra resources unless needed, but either would be acceptable solutions. But, the question here is: if I'm writing to worksheet X using streaming i.e. constant memory, is there any way to either dupe that sheet, or go back and iterate through all of its values?


EXAMPLES

For simplicity sake, imagine that the input is JSON rows and cells, and that if any cell value is an array with 2 elements (instead of a scalar), it means I want to create 2 worksheets, one with the first value and one with the second. But, if no such cells are arrays, we will only output a single worksheet.

Example 1

input: [["A1","B1","C1"],["A2","B2","C2"]] output: A workbook with 1 worksheet with values A1, B1, C1 on row 1, and A2, B2, C2 on row 2

Example 2

input: [["A1","B1","C1"],["A2",["B2-worksheet1","B2-worksheet2"],"C2"]] output: A workbook with 2 worksheets:

Does anyone know the answer to the above questions, and if they are not currently possible, any suggestions?

liquidaty commented 2 years ago

Looking at #271 it seems that option (b) is out (or must be manually performed outside of libxlsxwriter. So the remaining question is (a): can I remove a worksheet, previously added with workbook_add_worksheet, before I save the workbook?

jmcnamara commented 2 years ago

In general libxlsxwriter doesn't support any data manipulation. The libxlsxwriter architectural model is similar to a data black box and once the data crosses the API it is no longer intended to be accessible to the user.

It would probably be best to come up with some scheme where the data is cached in the application and written once the decision point is reached.

If there is no feasible way to do that in the application then it may be possible to remove a second worksheet object (option a) but it would be a hacky safety guard removed type of solution. I could try create a small example of how to do that if there is no other alternative.