pabigot / pyxb

Python XML Schema Bindings
Apache License 2.0

_SetValidationConfig causes seemingly unrelated intermittent failures #76

Open evanunderscore opened 7 years ago

evanunderscore commented 7 years ago

Background: I have an element that can contain multiple other elements, repeated and in any order, and I need the ordering of these elements to be maintained. I saw a comment from you elsewhere saying that pyxb.bundles.common.xhtml1 contains an example of how to enable this. I adapted it for my own bindings, which solved the ordering problem but also produced intermittent failures when generating unrelated documents.

The failure seems to be related to the hash seed, which I believe affects things like the iteration order of dicts. It occurs by default in Python 3 but only with the -R interpreter option in Python 2. Fixing the hash seed via the PYTHONHASHSEED environment variable produces consistent results.
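
For anyone who wants to see the hash seed effect in isolation, here is a minimal sketch that does not involve PyXB at all. It starts a child interpreter with a given PYTHONHASHSEED and reports the order in which a set of strings is iterated. The helper name iteration_order is made up for this illustration. A set is used because dicts keep insertion order on newer Python versions.

```python
import os
import subprocess
import sys

# Code run in the child interpreter: print the elements of a set of
# one-character strings in whatever order iteration yields them.
SNIPPET = "print(' '.join(set('abcde')))"

def iteration_order(seed):
    """Return the set iteration order seen by a child run with this hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output([sys.executable, "-c", SNIPPET], env=env)
    return out.decode().split()

# The same seed always reproduces the same order; different seeds may differ.
print(iteration_order(0))
print(iteration_order(1))
```

The same seed gives the same order on every run, which matches the consistent results seen when PYTHONHASHSEED is fixed.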

Sample schema (foo.xsd):

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:foo="foo" targetNamespace="foo">
    <xs:element name="foo">
        <xs:complexType>
            <xs:sequence>
                <xs:choice maxOccurs="unbounded">
                    <xs:element name="bar" type="foo:bar"/>
                    <xs:element name="baz" type="foo:baz"/>
                </xs:choice>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:complexType name="bar"/>
    <xs:complexType name="baz"/>
    <xs:element name="minOccurs_1">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="a" type="xs:string"/>
                <xs:element name="b" type="xs:string"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    <xs:element name="minOccurs_0">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="a" type="xs:string" minOccurs="0"/>
                <xs:element name="b" type="xs:string" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

The first element foo can contain any number of bar or baz elements, and their ordering is what I'm attempting to preserve. The other two elements minOccurs_1 and minOccurs_0 are unrelated elements that will sometimes fail to generate documents.

Compile bindings:

pyxbgen foo.xsd

Code to reproduce (foo.py):

"""
Usage::

    pyxbgen foo.xsd
    # For Python 3:
    python3 foo.py
    # For Python 2:
    python2 -R foo.py
"""
setconfig = True
setinvalid = True
optional = False

# ---
# Code adapted slightly from pyxb.bundles.common.xhtml1

import pyxb
import binding

DefaultValidationConfig = pyxb.GlobalValidationConfig.copy()

DefaultValidationConfig._setContentInfluencesGeneration(DefaultValidationConfig.ALWAYS)
DefaultValidationConfig._setOrphanElementInContent(DefaultValidationConfig.RAISE_EXCEPTION)
if setinvalid:
    DefaultValidationConfig._setInvalidElementInContent(DefaultValidationConfig.RAISE_EXCEPTION)

def _setValidationConfig ():
    import inspect
    import sys
    import pyxb.binding.basis

    for (n, v) in inspect.getmembers(binding):
        if inspect.isclass(v) and issubclass(v, pyxb.binding.basis._TypeBinding_mixin):
            v._SetValidationConfig(DefaultValidationConfig)

if setconfig:
    _setValidationConfig()

# ---

# Demonstrate correct ordering is maintained (bar -> baz -> bar)
f = binding.foo()
f.append(binding.bar())
f.append(binding.baz())
f.append(binding.bar())
print(f.toxml())

# Attempt to generate a different element.
# This fails approximately half the time.
element = binding.minOccurs_0 if optional else binding.minOccurs_1
m = element(a='a', b='b')
print(m.toxml())

The variables at the top of the script are intended to be adjusted to demonstrate different results.

With the config options set as in xhtml1, the example produces InvalidPreferredElementContentError about half the time, regardless of whether optional is True or False.

# setconfig = True
# setinvalid = True
# optional = False

$ PYTHONHASHSEED=0 python foo.py
<?xml version="1.0" ?><ns1:foo xmlns:ns1="foo"><bar/><baz/><bar/></ns1:foo>
<?xml version="1.0" ?><ns1:minOccurs_1 xmlns:ns1="foo"><a>a</a><b>b</b></ns1:minOccurs_1>

$ PYTHONHASHSEED=1 python foo.py
<?xml version="1.0" ?><ns1:foo xmlns:ns1="foo"><bar/><baz/><bar/></ns1:foo>
Traceback (most recent call last):
  File "foo.py", line 52, in <module>
    print(m.toxml())
  File "C:\Python27\lib\site-packages\pyxb\binding\basis.py", line 555, in toxml
    dom = self.toDOM(bds, element_name=element_name)
  File "C:\Python27\lib\site-packages\pyxb\binding\basis.py", line 527, in toDOM
    self._toDOM_csc(bds, element)
  File "C:\Python27\lib\site-packages\pyxb\binding\basis.py", line 2661, in _toDOM_csc
    order = self._validatedChildren()
  File "C:\Python27\lib\site-packages\pyxb\binding\basis.py", line 2188, in _validatedChildren
    return self.__automatonConfiguration.sequencedChildren()
  File "C:\Python27\lib\site-packages\pyxb\binding\content.py", line 635, in sequencedChildren
    raise pyxb.InvalidPreferredElementContentError(self.__instance, cfg, symbols, symbol_set, psym)
pyxb.exceptions_.InvalidPreferredElementContentError: (<binding.CTD_ANON_ object at 0x02DD9A50>, <pyxb.utils.fac.Configuration object at 0x02DD9BD0>, [], {<pyxb.binding.content.ElementDeclaration object at 0x02D9DF90>: [u'a'], <pyxb.binding.content.ElementDeclaration object at 0x02D9DFD0>: [u'b']}, (u'b', <pyxb.binding.content.ElementDeclaration object at 0x02D9DFD0>))
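
To put a number on "about half the time", a small harness along these lines could run the script once per seed and count the non-zero exits. This is only a sketch: failure_rate is a hypothetical helper, and the commented usage assumes foo.py and the generated binding.py are in the current directory.

```python
import os
import subprocess
import sys

def failure_rate(command, seeds):
    """Run command once per hash seed and return the fraction of failed runs."""
    seeds = list(seeds)
    failures = 0
    for seed in seeds:
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        # Output is discarded; only the exit status matters here.
        with open(os.devnull, "w") as devnull:
            status = subprocess.call(command, env=env,
                                     stdout=devnull, stderr=devnull)
        if status != 0:
            failures += 1
    return failures / float(len(seeds))

# Assumed usage (requires foo.py and binding.py next to this script):
# print(failure_rate([sys.executable, "foo.py"], range(20)))
```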

If not using _setInvalidElementInContent, the example works consistently for the non-optional element.

# setconfig = True
# setinvalid = False
# optional = False

$ PYTHONHASHSEED=0 python foo.py
<?xml version="1.0" ?><ns1:foo xmlns:ns1="foo"><bar/><baz/><bar/></ns1:foo>
<?xml version="1.0" ?><ns1:minOccurs_1 xmlns:ns1="foo"><a>a</a><b>b</b></ns1:minOccurs_1>

$ PYTHONHASHSEED=1 python foo.py
<?xml version="1.0" ?><ns1:foo xmlns:ns1="foo"><bar/><baz/><bar/></ns1:foo>
<?xml version="1.0" ?><ns1:minOccurs_1 xmlns:ns1="foo"><a>a</a><b>b</b></ns1:minOccurs_1>

However, with the optional element, it produces UnprocessedElementContentError about half the time.

# setconfig = True
# setinvalid = False
# optional = True

$ PYTHONHASHSEED=0 python foo.py
<?xml version="1.0" ?><ns1:foo xmlns:ns1="foo"><bar/><baz/><bar/></ns1:foo>
<?xml version="1.0" ?><ns1:minOccurs_0 xmlns:ns1="foo"><a>a</a><b>b</b></ns1:minOccurs_0>

$ PYTHONHASHSEED=1 python foo.py
<?xml version="1.0" ?><ns1:foo xmlns:ns1="foo"><bar/><baz/><bar/></ns1:foo>
Traceback (most recent call last):
  File "foo.py", line 52, in <module>
    print(m.toxml())
  File "C:\Python27\lib\site-packages\pyxb\binding\basis.py", line 555, in toxml
    dom = self.toDOM(bds, element_name=element_name)
  File "C:\Python27\lib\site-packages\pyxb\binding\basis.py", line 527, in toDOM
    self._toDOM_csc(bds, element)
  File "C:\Python27\lib\site-packages\pyxb\binding\basis.py", line 2661, in _toDOM_csc
    order = self._validatedChildren()
  File "C:\Python27\lib\site-packages\pyxb\binding\basis.py", line 2188, in _validatedChildren
    return self.__automatonConfiguration.sequencedChildren()
  File "C:\Python27\lib\site-packages\pyxb\binding\content.py", line 640, in sequencedChildren
    raise pyxb.UnprocessedElementContentError(self.__instance, cfg, symbols, symbol_set)
pyxb.exceptions_.UnprocessedElementContentError: (<binding.CTD_ANON_2 object at 0x02DA9A50>, <pyxb.utils.fac.Configuration object at 0x02DA9BD0>, [<pyxb.binding.basis.ElementContent object at 0x02DA9D30>], {<pyxb.binding.content.ElementDeclaration object at 0x02D7F070>: [u'a']})

Any insight you have would be greatly appreciated.

pabigot commented 7 years ago

Nothing is obvious to me. I'll take a look when I'm next working on PyXB.

evanunderscore commented 7 years ago

The problem appears to depend on the order of the items in orderedContent(). For example, if you replace m = element(a='a', b='b') in my example script with m = element(); m.b = 'b'; m.a = 'a', the intermittently failing examples fail every time.
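
A toy model of why the order would matter, assuming the recorded content order is used as the preferred generation order and the schema is an a-then-b sequence. This is not PyXB's actual automaton, and validate_sequence is a hypothetical helper written only for this illustration.

```python
def validate_sequence(expected, content):
    """Return True if the names in content respect the order given by expected.

    expected: element names in schema (xs:sequence) order.
    content:  element names in the order they were recorded on the instance.
    """
    positions = [expected.index(name) for name in content]
    return positions == sorted(positions)

schema_order = ['a', 'b']

# Content recorded as a, b can be emitted in that order.
print(validate_sequence(schema_order, ['a', 'b']))
# Content recorded as b, a (as with m.b = 'b'; m.a = 'a') cannot.
print(validate_sequence(schema_order, ['b', 'a']))
```

In this model the second call is rejected, which is the same shape as the failure that becomes deterministic when b is assigned before a.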

I've also noticed that with Python 3.6, all intermittent failures disappear. This almost guarantees the problem is related to dict iteration order, since dicts in CPython 3.6 retain insertion order. Interestingly, both m = element(a='a', b='b') and m = element(b='b', a='a') give the same orderedContent() with a before b, which suggests that something puts the elements in the correct order and that, for Python < 3.6, this ordering is lost somewhere in the internals.
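
As a side check, keyword arguments on Python 3.6+ reach the callee in call order (PEP 468), so the a-before-b result is not simply the order in which the keywords were written. A minimal sketch, where keyword_order is a hypothetical helper and not part of PyXB:

```python
def keyword_order(**kw):
    """Return keyword names in the order the callee receives them."""
    return list(kw)

# On Python 3.6+ the call order is preserved.
print(keyword_order(a='a', b='b'))
print(keyword_order(b='b', a='a'))
```

Since the second call sees b first, whatever puts a before b in orderedContent() must happen inside the binding rather than at the call site.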

I've now come to realize that what I'm really asking for is probably a lot more complicated than I previously thought, so even if this problem were fixed I'd need to rethink my approach.