willemdj / erlsom

XML parser for Erlang
GNU Lesser General Public License v3.0

erlsom:scan doesn't properly handle subtypes when the element in question is within a choice with elements of the same basic type #76

ElectronicRU closed this issue 3 years ago

ElectronicRU commented 3 years ago

Given the following XSD:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
        <xsd:complexType name="BaseType">
                <xsd:attribute name="id" type="xsd:ID" />
        </xsd:complexType>
        <xsd:complexType name="DerivedType">
                <xsd:complexContent>
                        <xsd:extension base="BaseType">
                                <xsd:attribute name="name" type="xsd:string" />
                        </xsd:extension>
                </xsd:complexContent>
        </xsd:complexType>
        <xsd:complexType name="BagType">
                <xsd:choice>
                        <xsd:element name="column-a" type="BaseType" />
                        <xsd:element name="column-b" type="BaseType" />
                </xsd:choice>
        </xsd:complexType>
        <xsd:element name="bag" type="BagType" />
</xsd:schema>

The following XML:

<?xml version="1.0" encoding="UTF-8"?>
<bag xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <column-b xsi:type="DerivedType" id="blep" name="blop" />
</bag>

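fails to parse, even though it should: DerivedType extends BaseType, so xsi:type="DerivedType" is a legitimate substitution for an element declared as BaseType.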

I've tracked the problem down to the erlsom_pass2:pass3Alternative function, which adds types to disambiguate between alternatives in a choice that share the same type. Unfortunately, it fails to account for types derived from that shared type, such as DerivedType above.
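
The failure mode described above can be illustrated with a toy model. This is a sketch under assumptions, not erlsom's actual code: plain dicts stand in for the compiled model, the functions `disambiguate` and `is_subtype` are hypothetical, and the clone naming ("column-b-BaseType") is a guess. It only shows why a per-alternative copy of the declared type ends up with no known subtypes.

```python
# Toy model only: plain dicts stand in for erlsom's compiled model.
# Maps each type name to the name of the type it extends (None for a root type).
BASES = {"BaseType": None, "DerivedType": "BaseType"}


def is_subtype(name, ancestor, bases):
    """Walk the extension chain from `name` upward looking for `ancestor`."""
    while name is not None:
        if name == ancestor:
            return True
        name = bases.get(name)
    return False


def disambiguate(alternatives, bases):
    """Give each alternative its own copy of a type that is shared with another.

    Only the declared type is copied; types derived from it are not.
    """
    declared = [tp for _, tp in alternatives]
    bases = dict(bases)
    result = []
    for tag, tp in alternatives:
        if declared.count(tp) > 1:
            clone = f"{tag}-{tp}"     # hypothetical naming, e.g. "column-b-BaseType"
            bases[clone] = bases[tp]  # the clone inherits the original's base
            tp = clone
        result.append((tag, tp))
    return result, bases


alternatives, bases = disambiguate(
    [("column-a", "BaseType"), ("column-b", "BaseType")], BASES
)
declared_for_b = dict(alternatives)["column-b"]

# xsi:type="DerivedType" is checked against the clone, which has no subtypes.
accepted = is_subtype("DerivedType", declared_for_b, bases)
```

In this model `accepted` is false, because DerivedType still extends the original BaseType rather than the per-element copy.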

I see no easy fix for this. I've come up with two possible solutions so far:

  1. Quick'n'dirty way: modify pass3Alternative to clone the whole type hierarchy, and modify subtype checking to also search for subtypes among the clones of the specified type (in this case, I guess they'd be named something like "column-a-BaseType", "column-a-DerivedType", and so on).
  2. More proper way: make the function produce a "wrapper type" of some sort that has a single element with a single alternative, plus some sort of flag that tells the scanner to descend into that element immediately. In HRL files, such types would be translated to 1-field records, i.e. 2-tuples: a bit finicky, but very workable.
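
The first option might look roughly like the following toy model. It is a sketch under assumptions and does not reflect erlsom's internals: the dict-based type table and the functions `clone_hierarchy` and `resolve_xsi_type` are hypothetical, and the clone names follow the "column-a-BaseType" pattern guessed at above.

```python
# Toy model only: maps each type name to the type it extends (None for a root type).
BASES = {"BaseType": None, "DerivedType": "BaseType"}


def descends_from(name, ancestor, bases):
    """True if `name` is `ancestor` or extends it, directly or indirectly."""
    while name is not None:
        if name == ancestor:
            return True
        name = bases[name]
    return False


def clone_hierarchy(tag, root, bases):
    """Clone `root` and every type derived from it, prefixing names with `tag`."""
    family = [name for name in bases if descends_from(name, root, bases)]
    clones = {}
    for name in family:
        base = bases[name]
        # Bases inside the family point at their clones; outside bases are kept.
        clones[f"{tag}-{name}"] = f"{tag}-{base}" if base in family else base
    return clones


def resolve_xsi_type(tag, xsi_type, declared_clone, bases):
    """Map an xsi:type value to its per-element clone, or None if not allowed."""
    candidate = f"{tag}-{xsi_type}"
    if candidate in bases and descends_from(candidate, declared_clone, bases):
        return candidate
    return None


bases = dict(BASES)
for tag in ("column-a", "column-b"):
    bases.update(clone_hierarchy(tag, "BaseType", BASES))

resolved = resolve_xsi_type("column-b", "DerivedType", "column-b-BaseType", bases)
```

Here `resolved` is the per-element clone "column-b-DerivedType", so the scanner could still tell which alternative was chosen. The cost is one copy of the whole hierarchy per ambiguous alternative, which is part of why this option is the quick one rather than the proper one.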

I'd love to hear your opinions on this rather convoluted corner case.