Closed adam-nielsen closed 8 years ago
Also getting this error, if I remove the codeName attribute above:
terminate called after throwing an instance of 'xml::parsing'
what(): xl/sharedStrings.xml:2:4036: error: end element expected
I can't see what the problem is in this case; sharedStrings.xml looks like this (approximately, can't post the actual data):
<si><t>Alpha Bravo Charlie Delta</t></si>
Column 4036 is equivalent to the C in Charlie, so there shouldn't be an end element there, and the end element is correct and slightly later on (at 2:4050, actually).
Is it possible the line is too long, and it's getting truncated when read into a buffer that's too short? The actual line in the shared strings file is 7860 characters long, and there's a third line that's 16099 characters long. It looks like MS Excel only puts line breaks in the file when they exist as part of the shared string content itself.
This could explain why this particular error goes away when I reduce the data in the file - once it goes below xlnt's maximum line length then it works again?
The first problem should be fixed. Basically any unexpected attributes in the XML cause the parser to throw an exception. I just need to take the time one of these days to use or ignore every attribute in the ECMA-376 standard, but it's a big job. For now, users such as yourself reporting these parsing problems will incrementally improve what the library is able to handle.
Great, many thanks! That does sound like it would be a big job...
That's some good detective work regarding sharedStrings.xml. All of my test files have been very small so this hasn't come up before. I rewrote the parsing and serialization recently to use streams instead of storing the full XML in memory so I might very well have a problem with the stream buffer. Let me see if I can reproduce this problem by creating a workbook with many strings.
Here's another data point from a different file:
terminate called after throwing an instance of 'xml::parsing'
what(): worksheets/sheet1.xml:2:24519: error: end element expected
This time it breaks at column 24519, so I'm less confident about the line length now. In this file, the affected area in sheet1.xml looks like this:
<c r="A156" s="1"><v>1234567890</v></c>
2:24519 is on the 5 in 1234567890.
Thanks for looking into this.
You were on the right track. It turns out that there's a quirk with the XML parser that causes it to parse character data as two separate character events if the end of the read stream buffer falls in the region between the tags.
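For anyone hitting the same thing, here is a minimal sketch of that failure mode and the usual workaround. It is not xlnt's actual code or the API of the XML library it uses: the event model and the function names (read_text_naive, read_text_coalesced, split_events) are made up for illustration. The idea is that a reader which expects exactly one characters event before the end tag fails with "end element expected", while a reader that concatenates consecutive characters events does not care where the buffer boundary fell.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical event model, not the API of any real XML library.
enum class event_type { start_element, characters, end_element };

struct event {
    event_type type;
    std::string value; // element name, or text for characters events
};

// Naive reader: assumes the element text arrives as at most ONE characters event.
// `pos` must point just past the start_element event.
std::string read_text_naive(const std::vector<event> &events, std::size_t &pos) {
    std::string text;
    if (pos < events.size() && events[pos].type == event_type::characters) {
        text = events[pos++].value;
    }
    if (pos >= events.size() || events[pos].type != event_type::end_element) {
        throw std::runtime_error("end element expected");
    }
    ++pos; // consume the end element
    return text;
}

// Tolerant reader: concatenates however many characters events arrive in a row.
std::string read_text_coalesced(const std::vector<event> &events, std::size_t &pos) {
    std::string text;
    while (pos < events.size() && events[pos].type == event_type::characters) {
        text += events[pos++].value;
    }
    if (pos >= events.size() || events[pos].type != event_type::end_element) {
        throw std::runtime_error("end element expected");
    }
    ++pos; // consume the end element
    return text;
}

// Events as they might be delivered for <v>1234567890</v> when the read
// buffer ends just before the 5 (an assumed split, matching the report above).
std::vector<event> split_events() {
    return {
        {event_type::start_element, "v"},
        {event_type::characters, "1234"},
        {event_type::characters, "567890"},
        {event_type::end_element, "v"},
    };
}
```

Calling read_text_naive on split_events() with pos = 1 throws, because the second characters event sits where the end element was expected; read_text_coalesced returns the full string. Expat, for one, documents that a single block of contiguous text may be delivered through several calls to its character data handler, so coalescing on the consumer side is the safer assumption with streaming parsers in general.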
Ah excellent, many thanks! I wonder if you can set the buffer to some low number like 1 or 2 chars (instead of 4096) when running the tests? Just thinking that should make it much easier to pick up any related issues. I'll try out the commit and let you know how I go!
That's a good point. It's a third-party XML library so I don't have direct control over it. I'll see if it allows the buffer size to be adjusted somehow.
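If the library's buffer size turns out not to be adjustable, the same idea can still be tested one level up. The sketch below is only a toy under stated assumptions: toy_tokenizer is a made-up push tokenizer (no attributes, entities or error handling) that hands over pending text at the end of every chunk, which mimics the behaviour described above. The helper stable_under_chunking (also hypothetical) tokenizes a document with every chunk size from 1 up to the document length and checks that the coalesced events never change.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class kind { start, text, end };

struct event {
    kind k;
    std::string value; // tag name, or text content
};

// Toy push tokenizer; hypothetical and far simpler than a real XML parser.
class toy_tokenizer {
public:
    // Consume one chunk of input and append the resulting events to `out`.
    void feed(const std::string &chunk, std::vector<event> &out) {
        for (char c : chunk) {
            if (in_tag_) {
                if (c == '>') {
                    if (!tag_.empty() && tag_[0] == '/')
                        out.push_back({kind::end, tag_.substr(1)});
                    else
                        out.push_back({kind::start, tag_});
                    tag_.clear();
                    in_tag_ = false;
                } else {
                    tag_ += c; // tag names may span chunks, so keep buffering
                }
            } else if (c == '<') {
                flush_text(out);
                in_tag_ = true;
            } else {
                text_ += c;
            }
        }
        // Hand over pending text at the end of every chunk instead of
        // waiting for the closing tag. This is what splits text events.
        flush_text(out);
    }

private:
    void flush_text(std::vector<event> &out) {
        if (!text_.empty()) {
            out.push_back({kind::text, text_});
            text_.clear();
        }
    }

    bool in_tag_ = false;
    std::string tag_;
    std::string text_;
};

// Tokenize `doc` in pieces of `chunk_size` bytes.
std::vector<event> tokenize(const std::string &doc, std::size_t chunk_size) {
    toy_tokenizer t;
    std::vector<event> events;
    for (std::size_t i = 0; i < doc.size(); i += chunk_size)
        t.feed(doc.substr(i, chunk_size), events);
    return events;
}

// Merge adjacent text events so the result no longer depends on chunking.
std::vector<event> coalesce(const std::vector<event> &in) {
    std::vector<event> out;
    for (const event &e : in) {
        if (e.k == kind::text && !out.empty() && out.back().k == kind::text)
            out.back().value += e.value;
        else
            out.push_back(e);
    }
    return out;
}

// True if every chunk size from 1 to doc.size() yields the same coalesced
// events as reading the whole document in one piece.
bool stable_under_chunking(const std::string &doc) {
    const std::vector<event> expected = coalesce(tokenize(doc, doc.size()));
    for (std::size_t n = 1; n <= doc.size(); ++n) {
        const std::vector<event> got = coalesce(tokenize(doc, n));
        if (got.size() != expected.size()) return false;
        for (std::size_t i = 0; i < got.size(); ++i)
            if (got[i].k != expected[i].k || got[i].value != expected[i].value)
                return false;
    }
    return true;
}
```

Sweeping every chunk size puts a buffer boundary at every offset in the document, so a test like this on even a tiny file would exercise the case that otherwise only shows up once a line exceeds the real buffer size.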
It works! No errors at all now, thanks again for such a quick fix! Much appreciated.
Hi again,
I'm getting a few XML parser errors but having trouble producing a sample spreadsheet (seems once I reduce the amount of data in the sheet the problems go away.) I will keep working on it, but in the meantime, I don't suppose this is enough to figure out what the problem may be?
workbook.xml:2:461 is in the workbookPr element:
Thanks!