xuri / xgen

XSD (XML Schema Definition) parser and Go/C/Java/Rust/TypeScript code generator
BSD 3-Clause "New" or "Revised" License
335 stars 78 forks source link

opt.Element stack can get corrupted, leading to false output #5

Open senekor opened 4 years ago

senekor commented 4 years ago

Description I stumbled upon this bug because some simple types were missing from my output. The opt.Element stack sometimes contains elements which have already been processed by the parser. This leads to at least two problems when a simple type is being processed after this happened at least once. Assume there is a stand alone simple type with some restrictions, let's say maxLength. In xmlMaxLength.go line 17, the EndMaxLength handler falsely thinks that the simple type being processed is part of an element, pops the simple type off the stack and overwrites the type of the element on the stack. The first problem caused by this is that the simple type will not appear in the output file any more. It is simply omitted. The second problem is that the type of the element may be overwritten with a wrong type.

Steps to reproduce the issue:

  1. As an example, use this Swiss eGov xsd file: http://www.ech.ch/xmlns/eCH-0058/5/eCH-0058-5-0.xsd
  2. Run it through xgen and notice that among several others, the simple type actionType (line 59 in xsd) is missing from the output. (I used xgen to output a Go file, but the language shouldn't matter.)
  3. You may insert log statements in the parser logging the opt.Element stack to see what it looks like. You will notice that both elements "metaDataName" and "metaDataValue" remain in the stack after they have been processed by the parser.

Root of the problem: Let's focus on where an element get's removed from the stack first. In my case, the elements did not contain any additional complex types, so we can ignore xmlComplexType.go. There is only one other place where elements are removed from the stack, namely in xmlElement.go line 74. This line is never reached if the element is part of a complex type. (The if condition one line earlier will fail because opt.ComplexType.Len() is greater than zero.) In line 51, elements are added to the stack if they have no "type" attribute. My conclusion is that any element which is part of a complex type, does not contain complex types itself and has no "type" attribute is added but never removed from the stack. This corrupts the state of the opt.Element stack and causes the described problems.

Proposed solution: I am no expert in xsd, so I cannot say what a perfect handling of the opt.Element stack would look like. I certainly asked myself why not every element is added and removed from the stack. I was able to fix the bug for my own case, but I believe what I did could potentially lead to other bugs. I added an else-if condition to the EndElement handler which removes an element from the stack if there are exactly one element and one complex type on their respective stacks:

func (opt *Options) EndElement(ele xml.EndElement, protoTree []interface{}) (err error) {
    if opt.Element.Len() > 0 && opt.ComplexType.Len() == 0 {
        opt.ProtoTree = append(opt.ProtoTree, opt.Element.Pop())
    } else if opt.Element.Len() == 1 && opt.ComplexType.Len() == 1 {
        opt.Element.Pop()
    }
    return
}

Here's where I believe this could cause other problems: Assume that in part of an xsd file there is an element in a complex type in an element, something like this:

<xs:element name="first">
    <xs:complexType>
        <xs:element name="second">
        </xs:element>
    </xs:complexType>
</xs:element>

Also assume the first element is added to the stack, while the second is not. When the end of the second element is being parsed, the else-if condition I added will be true and the first element is removed from the stack, even though it is not done being processed yet.

Since I was able to fix the bug for my specific case and I'm no expert on xsd files, it doesn't make sense for me to work on a general solution. That is all the information I gathered, I hope it is useful.