tfussell / xlnt

:bar_chart: Cross-platform user-friendly xlsx library for C++11+

xl/workbook.xml:2:461: error: unexpected attribute 'codeName' #83

Closed adam-nielsen closed 8 years ago

adam-nielsen commented 8 years ago

Hi again,

I'm getting a few XML parser errors but having trouble producing a sample spreadsheet (seems once I reduce the amount of data in the sheet the problems go away.) I will keep working on it, but in the meantime, I don't suppose this is enough to figure out what the problem may be?

terminate called after throwing an instance of 'xml::parsing'
what(): xl/workbook.xml:2:461: error: unexpected attribute 'codeName'

workbook.xml:2:461 is in the workbookPr element:

<fileVersion appName="xl" lastEdited="6" lowestEdited="4" rupBuild="14420"/>
<workbookPr codeName="ThisWorkbook" defaultThemeVersion="124226"/>
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">

Thanks!

adam-nielsen commented 8 years ago

Also getting this error, if I remove the codeName attribute above:

terminate called after throwing an instance of 'xml::parsing'
what(): xl/sharedStrings.xml:2:4036: error: end element expected

I can't see what the problem is in this case, sharedStrings.xml looks like this (approximately, can't post the actual data):

<si><t>Alpha Bravo Charlie Delta</t></si>

Column 4036 corresponds to the C in Charlie, so there shouldn't be an end element there. The real end element is correct and comes slightly later (at 2:4050, actually).

Is it possible the line is too long, and it's getting truncated when read into a buffer that's too short? The actual line in the shared strings file is 7860 characters long, and there's a third line that's 16099 characters long. It looks like MS Excel only puts line breaks in the file when they exist as part of the shared string content itself.

This could explain why this particular error goes away when I reduce the data in the file - once it goes below xlnt's maximum line length then it works again?

tfussell commented 8 years ago

The first problem should be fixed. Basically any unexpected attributes in the XML cause the parser to throw an exception. I just need to take the time one of these days to use or ignore every attribute in the ECMA-376 standard, but it's a big job. For now, users such as yourself reporting these parsing problems will incrementally improve what the library is able to handle.
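
To illustrate the difference between throwing on any unexpected attribute and skipping it, here is a minimal, self-contained sketch. It is not xlnt's code and does not use the real XML library; `attribute_map`, `unknown_policy`, and `check_attributes` are hypothetical names invented for this example, and the actual fix may look quite different.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one element's parsed attributes (name -> value).
using attribute_map = std::map<std::string, std::string>;

// What to do with an attribute the reader does not know about.
enum class unknown_policy { fail, skip };

// Checks every attribute against the set of known names.
// With unknown_policy::fail the first unknown attribute throws (the strict
// behaviour described above); with unknown_policy::skip it is recorded and
// otherwise ignored. Returns the names that were skipped.
std::vector<std::string> check_attributes(const attribute_map &attrs,
    const std::set<std::string> &known, unknown_policy policy)
{
    std::vector<std::string> skipped;

    for (const auto &attr : attrs)
    {
        if (known.count(attr.first) != 0)
        {
            continue; // recognised attribute, nothing to do
        }

        if (policy == unknown_policy::fail)
        {
            throw std::runtime_error("unexpected attribute '" + attr.first + "'");
        }

        skipped.push_back(attr.first);
    }

    return skipped;
}
```

Calling `check_attributes` with the `workbookPr` attributes from the report and a known set that lacks `codeName` throws under `unknown_policy::fail` and returns `codeName` as skipped under `unknown_policy::skip`.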

adam-nielsen commented 8 years ago

Great, many thanks! That does sound like it would be a big job...

tfussell commented 8 years ago

That's some good detective work regarding sharedStrings.xml. All of my test files have been very small so this hasn't come up before. I rewrote the parsing and serialization recently to use streams instead of storing the full XML in memory so I might very well have a problem with the stream buffer. Let me see if I can reproduce this problem by creating a workbook with many strings.

adam-nielsen commented 8 years ago

Here's another data point from a different file:

terminate called after throwing an instance of 'xml::parsing'
what(): worksheets/sheet1.xml:2:24519: error: end element expected

This time it breaks at column 24519 so I'm less confident about the line length now. In this file, the affected area in sheet1.xml looks like this:

<c r="A156" s="1"><v>1234567890</v></c>

2:24519 is on the 5 in 1234567890.

Thanks for looking into this.

tfussell commented 8 years ago

You were on the right track. It turns out there's a quirk in the XML parser: if the end of the read buffer falls inside an element's text, that text is reported as two separate character events instead of one. For example, element text "a||bc", where || represents a read boundary, is parsed as: open tag, character data "a", character data "bc", end tag. (The XML parser's internal buffer is 4096 characters, by the way.) The reading code expected an end tag right after the first character event, so this showed up as an invalid XML exception ("end element expected") when really a second character event should have been handled. I'll need to find and convert the other places that read character data to handle this possibility, but it's fixed for worksheets and shared strings in the commit I will make in a few minutes. This was a tricky one...
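
The handling described above, treating consecutive character events as one piece of text, can be sketched roughly as follows. This is a simplified illustration with a hand-rolled event list, not the third-party parser's API or the actual commit; `event_type`, `event`, and `read_element_text` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified parser events.
enum class event_type { start_element, characters, end_element };

struct event
{
    event_type type;
    std::string text; // element name for tags, content for characters
};

// Reads one element's text starting at events[pos] and advances pos past its
// end tag. Every consecutive characters event is appended, so text that was
// split at a read boundary ("a" + "bc") comes back whole ("abc").
std::string read_element_text(const std::vector<event> &events, std::size_t &pos)
{
    if (pos >= events.size() || events[pos].type != event_type::start_element)
    {
        throw std::runtime_error("start element expected");
    }
    ++pos;

    std::string text;
    while (pos < events.size() && events[pos].type == event_type::characters)
    {
        text += events[pos].text; // keep going: there may be more than one
        ++pos;
    }

    if (pos >= events.size() || events[pos].type != event_type::end_element)
    {
        throw std::runtime_error("end element expected");
    }
    ++pos;

    return text;
}
```

A reader that consumed only a single characters event and then demanded an end tag would fail on the sequence open tag, "a", "bc", end tag with the same "end element expected" message seen in the reports.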

adam-nielsen commented 8 years ago

Ah excellent, many thanks! I wonder if you can set the buffer to some low number like 1 or 2 chars (instead of 4096) when running the tests? Just thinking that should make it much easier to pick up any related issues. I'll try out the commit and let you know how I go!
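
As a rough sketch of that testing idea, the toy scanner below reads its input in chunks of a configurable size and emits one character event per chunk that contains text. It is only a stand-in written for this example (the real parser's buffer may not be configurable, and `character_events` and `joined_text` are hypothetical names), but it shows how looping over tiny chunk sizes exposes any code that assumes a single character event.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a streaming parser. It scans xml in reads of chunk_size
// characters and emits the text found in each read as a separate event, so
// text that spans a read boundary arrives in several pieces.
std::vector<std::string> character_events(const std::string &xml, std::size_t chunk_size)
{
    assert(chunk_size > 0);

    std::vector<std::string> events;
    bool in_tag = false;

    for (std::size_t start = 0; start < xml.size(); start += chunk_size)
    {
        std::string piece;

        for (char c : xml.substr(start, chunk_size))
        {
            if (c == '<')
            {
                // Text ends where a tag begins: flush what we have so far.
                if (!piece.empty())
                {
                    events.push_back(piece);
                    piece.clear();
                }
                in_tag = true;
            }
            else if (c == '>')
            {
                in_tag = false;
            }
            else if (!in_tag)
            {
                piece += c;
            }
        }

        // End of this read: emit any text seen so far as its own event.
        if (!piece.empty())
        {
            events.push_back(piece);
        }
    }

    return events;
}

// Robust consumer: concatenates every character event it receives.
std::string joined_text(const std::string &xml, std::size_t chunk_size)
{
    std::string text;
    for (const auto &piece : character_events(xml, chunk_size))
    {
        text += piece;
    }
    return text;
}
```

A test can then call `joined_text` on something like `<t>Alpha Bravo Charlie Delta</t>` for every chunk size from 1 up to the input length and require the same text each time. With a 4096-character chunk the short input yields a single event, which is why small test files never hit the problem.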

tfussell commented 8 years ago

That's a good point. It's a third-party XML library so I don't have direct control over it. I'll see if it allows the buffer size to be adjusted somehow.

adam-nielsen commented 8 years ago

It works! No errors at all now, thanks again for such a quick fix! Much appreciated.