jmcnamara / libxlsxwriter

A C library for creating Excel XLSX files.
https://libxlsxwriter.github.io
Other
1.48k stars 330 forks source link

Issue when using constant memory with merge range #435

Closed tritueviet closed 4 months ago

tritueviet commented 5 months ago

I am using libxlsxwriter to do export excel has a lot of merge cell range with constant memory support, but it not working because it not supported. Here is some code that demonstrates the problem:

`#include "xlsxwriter.h"

int main() { lxw_workbook_options options = {.constant_memory = LXW_TRUE, .tmpdir = NULL, .use_zip64 = LXW_FALSE, .output_buffer = NULL, .output_buffer_size = NULL}; lxw_workbook workbook = workbook_new_opt("merge_range.xlsx", &options); lxw_worksheet worksheet = workbook_add_worksheet(workbook, NULL); lxw_format *merge_format = workbook_add_format(workbook);

/* Configure a format for the merged range. */
format_set_align(merge_format, LXW_ALIGN_CENTER);
format_set_align(merge_format, LXW_ALIGN_VERTICAL_CENTER);
format_set_bold(merge_format);
format_set_bg_color(merge_format, LXW_COLOR_YELLOW);
format_set_border(merge_format, LXW_BORDER_THIN);

/* Increase the cell size of the merged cells to highlight the formatting. */
worksheet_set_column(worksheet, 1, 3, 12, NULL);
worksheet_set_row(worksheet, 3, 30, NULL);
worksheet_set_row(worksheet, 6, 30, NULL);
worksheet_set_row(worksheet, 7, 30, NULL);

/* Merge 3 cells. */
worksheet_merge_range(worksheet, 3, 1, 3, 3, "Merged Range", merge_format);

/* Merge 3 cells over two rows. */
worksheet_merge_range(worksheet, 6, 1, 7, 3, "Merged Range", merge_format);

workbook_close(workbook);

return 0;

} `

Can you support this function in constant memory?

tritueviet commented 4 months ago

any update?

jmcnamara commented 4 months ago

I don't plan to implement this feature. Constant memory mode writes data row by row for efficiency and merge_range() needs to write data on (usually) several rows so they aren't compatible.

I'll explain why in case anyone want to try fix it in their own version but I won't upstream it.

Consider a simple worksheet with a merged area like this:

screenshot

This is stored in an Excel xlsx file with the following xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet 
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <dimension ref="B2:D4"/>
  <sheetViews>
    <sheetView tabSelected="1" workbookViewId="0"/>
  </sheetViews>
  <sheetFormatPr defaultRowHeight="15"/>
  <sheetData>
    <row r="2" spans="2:4">
      <c r="B2" s="1">
        <v>123456</v>
      </c>
      <c r="C2" s="1"/>
      <c r="D2" s="1"/>
    </row>
    <row r="3" spans="2:4">
      <c r="B3" s="1"/>
      <c r="C3" s="1"/>
      <c r="D3" s="1"/>
    </row>
    <row r="4" spans="2:4">
      <c r="B4" s="1"/>
      <c r="C4" s="1"/>
      <c r="D4" s="1"/>
    </row>
  </sheetData>
  <mergeCells count="1">
    <mergeCell ref="B2:D4"/>
  </mergeCells>
  <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
</worksheet>

The main thing to understand from this is that the merged range is stored in 2 places/ways:

  1. In the <mergeCells> element.
  2. As a collection of cell<c> elements which are mainly blank and which have the same cell style property s="1" as the first cell in the merged range which contains the string/number for the merged cells.

When writing a file in constant_memory mode it is possible to write the elements in the first but the elements in the second require the row number/cursor to advance which means that data cannot be written in the previous rows.

So that breaks merged ranges or else breaks writing any other data in a section of the worksheet with a merged range.

It would be possible to workaround this by tracking the merged ranges and then as the row/cursor advances to write out the formatted cells. However, I don't intend to add this to the library since it is reasonable amount of work to implement in a non error prone way. Also, and this is as important as the technical/effort reason, the constant memory mode is mainly intended for cases where the user wants to dump a large amount of data in a memory efficient way and in that mode they need to compromise on features such as merged ranges and tables.

So I am going to close this as won't fix.