troldal / OpenXLSX

A C++ library for reading, writing, creating and modifying Microsoft Excel® (.xlsx) files.
BSD 3-Clause "New" or "Revised" License
1.37k stars 331 forks source link

Adding mergeCell() functionality removes all formatting from resulting .xlsx file #288

Open hopkinsa17 opened 1 week ago

hopkinsa17 commented 1 week ago

I'm sorry to keep bringing up issues but I'm using the most recent version of OpenXLSX from the development-aral branch in MS Visual Studio 2022 on Windows 11 x64 and writing a program that produces four .xlsx files; three of those files work just fine when opening in my default .xlsx file reader, Office 365 Excel. The fourth one produces an error on opening and is the only one which has the "mergeCell" functionality. The error is:

Removed Part: /xl/styles.xml part with XML error. (Styles) Load error. Line 6, column 0. Repaired Records: Cell information from /xl/worksheets/sheet1.xml

I am able to build and open the Demo10.xlsx file without issue, so I compared the underlying <xl/worksheets/sheet1.xml> files and found the following difference. The file my program creates contains the following (lines 4-8 post-formatting):

<sheetViews>
       <sheetView tabSelected="1" workbookViewId="0"/>
           <selection activeCell="B1" sqref="B1:K2" />
       </sheetView>
   </sheetViews>

while Demo1.xlsx : sheet1.xml contains (lines 4-6 post-formatting):

<sheetViews>
       <sheetView tabSelected="1" workbookViewId="0"/>
   </sheetViews>

After removing

           <selection activeCell="B1" sqref="B1:K2" />

from my file's sheet1.xml, modifying that section’s header and footer accordingly, rezipping the xml files into a .zip file, and changing the extension to .xlsx, I am able to open it in Excel without issue. Does anyone know how the problematic line is being added to the sheet1.xml file and what I might be able to do differently to avoid adding it in the first place? Thank you!

Edit: fwiw I’ve tried to emulate the format of Demo10.cpp to the best of my ability, and can attach the code that pertains to writing the excel file if it would help. Thank you!

aral-matrix commented 1 week ago

No need to apologize, but I'll have to ask you to have some patience, my vacation ends on 8 November, but until then I don't want to spend too much time on coding :)

From what I can see, it appears as though you are opening an existing Excel file that has this <selection> tag - because OpenXLSX neither uses that tag, nor does the template for an empty XLSX workbook contain this tag for worksheets.

I will have to read up on what it does and - if it turns out to be redundant information, the easiest fix I can think of is to auto-remove the <selection> tag when encountered (upon modifying a worksheet).

Regarding the use of styles, what I wanted to add to yesterday's talk: The styles functionality was so much that I provided a relatively "raw" interface to the underlying styles XML, without being able to implement validation functions for every setting. So it is possible to "break" a workbook by setting invalid combinations of styles (the disabled test section of Demo10 creates some invalid settings, thus I disabled it).

A prime example would be to set a style reference for an index for which no such style exists.

So please, when looking at Demo10, disregard the code that I disabled with the testBasics boolean in line 231, I only used that to manually check how the XML looks, without opening the file in Office.

hopkinsa17 commented 1 week ago

No worries, please enjoy your vacation! I’m in no rush to get this project finished at all

hopkinsa17 commented 3 days ago

Just a progress update, but I’m trying to use a slightly modified version of the OpenXLSX library to access the “sheet1.xml” file and remove the offending child node as a temporary fix. I’ll let you know how it goes, thank you!

aral-matrix commented 3 days ago

orr... you could unzip the resulting file, remove the node from the xml with an editor and re-zip it, I understand you have a working zip solution for that since the previous investigation :)

hopkinsa17 commented 2 days ago

I'm not sure how, but the node I thought was causing the issue to begin with has disappeared from the sheet1.xml file in the unzipped directory. However, I still get an error on opening the .xlsx file in MS Excel saying that there's an issue with /xl/styles.xml: line 6 column 0 and with cell info from /xl/worksheets/sheet1.xml; I looked through both files and could not find any obvious issues that would cause these errors. Worksheet.xlsx is the original, unmodified spreadsheet my program currently creates.

aral-matrix commented 1 day ago

On a first look, I don't see anything wrong with sheet1.xml (so yea, eat **** Microsoft ;) - I'll investigate when I get home.

For the styles neither, especially not in column 6. I wonder if MS Office has a problem with the empty number formats node.

There are only two things I can suggest you try in the meantime: 1) (preferably this step first) reduce your document down to the absolute minimum that produces an error - that makes comparing the before/after LibreOffice save easier 2) create at least one single number format entry to have a non-empty node for that

Beyond that, I can only suspect a problem arising from a mix of the empty workbook template that OpenXLSX has hardcoded in binary and the styles node changes that my XLStyles implementation performs. It appears I did not sufficiently test this - I was mostly testing with an existing workbook where I removed / modified the styles.xml to have / not have certain nodes and where I verified that my code would be able to deal with it and produce valid XML.

Nevertheless, since you said you can open Demo10.xlsx in Office without errors (and that uses the template) I would not expect an issue there.