Open janhec opened 1 week ago
workbook22.xml contains:
I would say that this not a bug in OpenXLSX, but a bug in whatever produced the workbook you are trying to open. In ECMA-376 (the official standard for the Office Open XML formats) part 1, section 12.2 stipulates that the workbook data must be physically located in the /workbook.xml file.
So a workbook file named workbook22.xml is clearly non-standard
Thank you for the info. I admit not reading the standard. Too lazy and/or busy with other things.
There remains a nag on my mind that the standard may not exclude older standards, so we should adhere to /xl/workbook.xml when creating new (and preferably, saving old) workbooks, but still be able to open this case.
Reasons: appName="SpreadsheetLight" in /xl/workbook22.xml and the fact that there is a library for this (https://github.com/ARLM-Keller/SpreadsheetLight, presumably relevant, did not check further). Then (again) both excel and libreoffice will open these files; after an excel save, the (new) standard is applied. From a worksheet the problematic xmlns (x:): xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main" (I could not access this and similar through a browser.)
Looking at modern standard spec (in workbook.xml), I see e.g.: xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" mc:Ignorable="x15 xr xr6 xr10 xr2" xmlns:x15="http://schemas.microsoft.com/office/spreadsheetml/2010/11/main" xmlns:xr="http://schemas.microsoft.com/office/spreadsheetml/2014/revision" xmlns:xr6="http://schemas.microsoft.com/office/spreadsheetml/2016/revision6" xmlns:xr10="http://schemas.microsoft.com/office/spreadsheetml/2016/revision10" xmlns:xr2="http://schemas.microsoft.com/office/spreadsheetml/2015/revision2"> which seems to include a lot of versions/revisions.
I have solved my problem in a rather hacky way by changing XL_Document and XLXmlData (cpp and hpp), took me a few hours. After making workbook.xml (and workbook.xml.rels) variable (namewise) and getting the name from .rels, putting it in m_data: When loading (and finding nonstandard workbook*.xml), I remove the problematic xmlns (x:) and record the distinct names by xml source; when saving, I put them back (hoping for no conflicts of any kind).
I have not investigated whether this is more generally adequate and useful. I doubt you want to incorporate such mods, but if you wish to look at them, let me know.
No offense if you close this.
@janhec : Out of curiosity: can you summarize the exact relation of all XML files and fields that are different with respect to your workbook22.xml?
I am not entirely sure how your _rels/.rels file is looking - is the workbook22.xml referenced like so?
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook22.xml"/>
The important bit I am interested in here is whether or not the exact type string is reflected here, so that the document format is at least logically consistent and the workbook file name could be looked up by checking the _rels/.rels file for the relationship entry with that type.
The .rels file looks like this:
`<?xml version="1.0" encoding="utf-8"?>
`
workbook22.xml goes like this:
`<?xml version="1.0" encoding="utf-8"?>
Hmm... I am a bit worried about the worksheets r:id="R4b78b06d10804515"
in workbook22.xml - OpenXLSX has a bunch of code that makes assumptions about how the r:id values are of the format "rId###" where ### is strictly numerical.
Could you provide a full example of one of your xlsx files with a workbook22.xml so that I can have a look at whether it could be supported without major changes?
A file that does load without problems in libreoffice would be great :)
Please treat the appended xlsx file as confidential in the sense of not sharing it widely, e.g. through putting it on a site. Keep private if possible. I tried shortening it without losing the workbook22 thing, but failed to produce a readily loadable result, time is up for a bit.
Btw, I had no apparent difficulty with the r:id's. Didn't notice that a format was enforced, anyhow I may not have gone deep enough to notice. But I could load and save it after my mods.
Thanks for looking into this! Jan
PS shouldn't mention it here, perhaps, but I had trouble appending rows with new rows having to be added to the worksheet. This worked fine with the previous (2024-05-04) lib, but not in the 2024-05-24 version.
XLWorksheet::Row(unint32_t) constructor -> getRowNode(XMLNode, uint32_t), rowNumber > result.attribute("r").as_ullong(), return -> XLQuery -> execQuery (SharedStrings) result empty XLRow(const XMLNode&, const XLSharedStrings&). Subsequent .cells() gave me m_rowNode empty, m_firstCol 1, m_lastCol 0. .cells().begin() then crashes when doing a unique_ptr: Exception thrown: read access violation. std::forward<pugi::xml_node & __ptr64>(...) returned nullptr.
I'll still have to replay this without my mods, but it does not obviously seem connected to the mods. Advice welcome.
On Thu, Jul 11, 2024 at 2:29 AM aral-matrix @.***> wrote:
Hmm... I am a bit worried about the worksheets r:id="R4b78b06d10804515" in workbook22.xml - OpenXLSX has a bunch of code that makes assumptions about how the r:id values are of the format "rId###" where ### is strictly numerical.
Could you provide a full example of one of your xlsx files with a workbook22.xml so that I can have a look at whether it could be supported without major changes?
A file that does load without problems in libreoffice would be great :)
— Reply to this email directly, view it on GitHub https://github.com/troldal/OpenXLSX/issues/254#issuecomment-2221764142, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTBHL7JAPMMXXJ4VBACJO3ZLXGX7AVCNFSM6AAAAABKNE2BZCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRRG43DIMJUGI . You are receiving this because you were mentioned.Message ID: @.***>
Please treat the appended xlsx file as confidential in the sense of not sharing it widely
I am not sure if I am missing something w.r.t. how github is used, but I do not see an attachment. You can reach me via my codeberg profile https://codeberg.org/lars_uffmann - email is listed there.
Sorry if I appear blind, but I don't see an email address on your codeberg profile. Perhaps I should register? Can you send me an email through the email address intended, so I can reply and you get the attachment? Thanks, Jan
On Thu, Jul 11, 2024 at 11:03 AM aral-matrix @.***> wrote:
Please treat the appended xlsx file as confidential in the sense of not sharing it widely
I am not sure if I am missing something w.r.t. how github is used, but I do not see an attachment. You can reach me via my codeberg profile https://codeberg.org/lars_uffmann - email is listed there.
— Reply to this email directly, view it on GitHub https://github.com/troldal/OpenXLSX/issues/254#issuecomment-2222404318, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTBHL3GRRCM5URO6HIFNB3ZLZC4NAVCNFSM6AAAAABKNE2BZCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMRSGQYDIMZRHA . You are receiving this because you were mentioned.Message ID: @.***>
my apologies, it indeed doesn't show up to non-registered users - I don't see your email either though :D
However, my project files contain my email, e.g. this one: https://codeberg.org/lars_uffmann/fwmqueue/src/branch/master/demo.cpp (line #5) - I just want to avoid linking it directly here with this github account, I hate microsoft with a passion :D
workbook.xml is hardcoded in xldocument.cpp, but is actually variable. I do not encounter this problem when (re)saving a file in excel (or libreoffice calc), the workbooks in question get outputted by a program that I need in my (customers) workflow. Opening and saving in excel or libreoffice will cure this, but this is not a welcome addition to the (automated) workflow. In these cases, the name will be /xl/workbook22.xml, which is not repeated in [Content_Types].xml (in contrast to workbook.xml in the ordinary case). Some (copilot) prodding got me: Variability in Workbook Paths: You mentioned that sometimes the workbook path is workbook22.xml. The specific name (e.g., workbook22.xml) can vary based on the workbook’s history, edits, and other factors. Excel assigns unique names to different versions of the workbook, especially when you make changes or save multiple copies. Why Different Names?: The variation in workbook names allows Excel to manage different versions, track changes, and handle concurrent editing (e.g., when collaborating with others). Each time you save the workbook, Excel may increment the number or use a different identifier to avoid overwriting existing files. The last line seems a dumb addition, but otherwise it sounds appropriate to me, even if I did not find a trigger for this behavior. The initial bug is that workbook.xml will (obviously) not be found in m_data so an exception is thrown. I tried feeding m_data with data (including xmlid) from _rels.rels, which will make the open succeed apparently, but then the sheets will not be found (older version of openxlsx). Another gotcha in this case is that shared strings has x:si instead of si which gets flagged in the may 25 revision. So I am not confident to find a solution myself and hope for any comments.