sissaschool / xmlschema

XML Schema validator and data conversion library for Python
MIT License
406 stars 76 forks source link

Bug: optional elements are not appearing as optional when children of another element #410

Open ethindp opened 1 month ago

ethindp commented 1 month ago

I have this (RelaxNG compact):

muclient = element muclient {
    (include_* & 
     (plugin? & world? & triggers? & aliases? & timers? & macros? & variables? & colours? & keypad? & printing?)
    )
}

As an XSD it renders as (via trang):

  <xs:element name="muclient">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="include"/>
        <xs:choice>
          <xs:element ref="plugin"/>
          <xs:element ref="world"/>
          <xs:element ref="triggers"/>
          <xs:element ref="aliases"/>
          <xs:element ref="timers"/>
          <xs:element ref="macros"/>
          <xs:element ref="variables"/>
          <xs:element ref="colours"/>
          <xs:element ref="keypad"/>
          <xs:element ref="printing"/>
        </xs:choice>
      </xs:choice>
    </xs:complexType>
  </xs:element>

My hope was that it would allow elements to be specified in any order, but either this schema is broken or it doesn't look like that's how this works. (This is my first time ever writing an XML schema, so...) Is this a bug in xmlschema, or a bug in my XSD?

ethindp commented 1 month ago

Okay, so I think this might be a bug with xmlschema after further tinkering. If I change it to:

  <xs:element name="muclient">
    <xs:complexType>
      <xs:sequence>
        <xs:element minOccurs="0" maxOccurs="unbounded" ref="include"/>
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="plugin"/>
          <xs:element ref="world"/>
          <xs:element ref="triggers"/>
          <xs:element ref="aliases"/>
          <xs:element ref="timers"/>
          <xs:element ref="macros"/>
          <xs:element ref="variables"/>
          <xs:element ref="colours"/>
          <xs:element ref="keypad"/>
          <xs:element ref="printing"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

It still shows as having 1 minimum and maximum occurrence when evaluated:

>>> schema=xmlschema.XMLSchema11("..\\..\\mushclient-schemas\\mushclient.xsd")
>>> list(schema.elements["muclient"].iterchildren())
[Xsd11Element(ref='include', occurs=[0, None]), Xsd11Element(ref='plugin', occurs=[1, 1]), Xsd11Element(ref='world', occurs=[1, 1]), Xsd11Element(ref='triggers', occurs=[1, 1]), Xsd11Element(ref='aliases', occurs=[1, 1]), Xsd11Element(ref='timers', occurs=[1, 1]), Xsd11Element(ref='macros', occurs=[1, 1]), Xsd11Element(ref='variables', occurs=[1, 1]), Xsd11Element(ref='colours', occurs=[1, 1]), Xsd11Element(ref='keypad', occurs=[1, 1]), Xsd11Element(ref='printing', occurs=[1, 1])]

I could be wrong but this definitely looks like a bug to me.

brunato commented 1 month ago

Hi,

XsdElement.iterchildren() is not intended for validating the model (an instance of ModelVisitor is used for this task).

I will check the problem using that sample schema as a base:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="muclient">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="include"/>
        <xs:choice>
          <xs:element name="plugin"/>
          <xs:element name="world"/>
          <xs:element name="triggers"/>
          <xs:element name="aliases"/>
          <xs:element name="timers"/>
          <xs:element name="macros"/>
          <xs:element name="variables"/>
          <xs:element name="colours"/>
          <xs:element name="keypad"/>
          <xs:element name="printing"/>
        </xs:choice>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
ethindp commented 1 month ago

I didn't know that, good to know. I raised this as a bug because I need to be able to tell what elements are optional or not for code generation so I don't go assuming all elements are optional and then someone uses my tool and they run an XML document that is missing a (required) element and it works anyway. I thought that is_emptiable/is_single/is_multiple would do the trick here, but something is being mis-parsed. I have attached the full schema (what I posted was a segment). The .txt extension is just to make GH happy. mushclient.xsd.txt

ethindp commented 1 month ago

Any update on this?

brunato commented 1 month ago

Hi, nothing wrong with validation here, with your schema all these instances are valid:

<?xml version="1.0" encoding="utf-8" ?>
<muclient xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="mushclient.xsd">
</muclient>

<?xml version="1.0" encoding="utf-8" ?>
<muclient xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="mushclient.xsd">
  <include name="foo"/>
</muclient>

<?xml version="1.0" encoding="utf-8" ?>
<muclient xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="mushclient.xsd">
  <include name="foo"/>
  <world port="80" id="1" name="http" site="example.test"/>
</muclient>

<?xml version="1.0" encoding="utf-8" ?>
<muclient xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="mushclient.xsd">
  <world port="80" id="1" name="http" site="example.test"/>
</muclient>

The only general mode for checking if an element is optional or not in a model group is to use a ModelVisitor instance and run on a children elements. This is done for XsdGroup validation thanks to a contribution that added a reordering method based on ModelVisitor. So probably an helper based on ModelVisitor can be built for checking what you ask for, but it's an enhancement, not a bug (not yet ... :smile: ).

As a starting point for doing this is to create a ModelVisitor instance from XSD group and then use its property expected to check which is the possible content that can be pushed, remembering that each choice can involve the following one. This content generation might be very customized on the needs of the user. This is the main reason that in the past drove me to refuse to add this feature to the package.

brunato commented 1 month ago

Anyway a generic method (e.g. XMLSchema.create_xml_data()) could be an option for the future taking count at least of two things:

brunato commented 1 month ago

An example to show how the model visitor can be used for that purpose:

>>> import xmlschema
>>> from xmlschema.validators import ModelVisitor
>>> from random import randint
>>> 
>>> schema = xmlschema.XMLSchema("mushclient.xsd")
>>> model = ModelVisitor(schema.elements['muclient'].type.content)
>>> 
>>> model.element
XsdElement(ref='include', occurs=[0, None])
>>>
>>> model.expected
[]
>>> while model.element is not None:
...     expected = model.expected
...     if not expected:
...         children.append(model.element)
...     else:
...         k = randint(1, len(expected)) - 1
...         children.append(expected[k])
...     if len(children) > 10:
...         break
...     assert not list(model.advance(True))
... 
>>> children
[XsdElement(ref='include', occurs=[0, None]), XsdElement(ref='colours', occurs=[1, 1]), XsdElement(ref='macros', occurs=[1, 1]), XsdElement(ref='colours', occurs=[1, 1]), XsdElement(ref='printing', occurs=[1, 1]), XsdElement(ref='colours', occurs=[1, 1]), XsdElement(ref='macros', occurs=[1, 1]), XsdElement(ref='variables', occurs=[1, 1]), XsdElement(ref='keypad', occurs=[1, 1]), XsdElement(ref='macros', occurs=[1, 1]), XsdElement(ref='world', occurs=[1, 1])]
>>> 

Anyway for letting this usable for all the models the ModelVisitor class needs an enhancement (e.g. look-ahead method that verify if a model can be stopped at a certain point).

ethindp commented 1 month ago

@brunato Ah, okay. Is there a way I could use this from within my Ginja2 template, or would I need to write a custom filter? To be clear: I'm not talking about validation. I'm talking about code generation via xmlschema.extras.codegen. That's where I'm trying to figure out if an element is optional or required, because I need to know when to complain about an element not being present from within the template and when to simply move on if the element is unavailable. I'm sorry if I confused you.

brunato commented 1 month ago

You have to write a custom filter. The current version of ModelVisitor is not fully suitable for this role (it was build for validation of an existing content, not for generating content). I will try to extend it's applicability, adding other methods and a wrapper, but I don't know yet if that will be included in a future release.

If you want to generate a valid content (a sequence of child elements) there is no way out of using a model validator that keeps track of the occurrences of the particles (elements, wildcards and groups) and their status (groups stack). Otherwise the generated content could be invalid.

ethindp commented 1 month ago

@brunato Yeah, it might be worth either extending the library with this functionality in the model visitor or adding this filter. I'm uncertain how to do this other than in the generator __init__ itself, where it builds a list of optional and required elements and stuff and then the filter operates based on that. It's doable but... A bit nasty, I'd say.

ethindp commented 1 month ago

Also, it looks as though the model visitor still doesn't really detect optional elements. It sees the include element as being optional, but not any of the others, according to both the output you wrote and if I manually enumerate the model. The world one is required, which I need to affirm in the RelaxNG document, but the rest are optional and should, in theory, have occurrences of (0, 1) (0 to 1 occurrence). I definitely can't tell if this is just a limitation of the library or something else.

brunato commented 4 weeks ago

For checking if a particle of a model group is optional I will add a new method to XsdGroup objects:

    def is_optional(self, particle: ModelParticleType) -> bool:
        """
        Returns `True` if a particle can be optional in the model. This a raw check,
        because the optionality can depend on the presence and the position of other
        elements, that can be checked only using a `ModelVisitor` instance. Raises an
        `XMLSchemaValueError` if the particle is not part of the model group.
        """
        if self.max_occurs == 0:
            raise XMLSchemaValueError("the model group is empty")

        groups = [self]
        iterators: List[Iterator[ModelParticleType]] = []
        particles = iter(self)

        while True:
            for item in particles:
                if item is particle:
                    if item.min_occurs == 0:
                        return True

                    for group in reversed(groups):
                        if group.min_occurs == 0:
                            return True
                        elif group.model == 'choice' and len(group) > 1:
                            return True
                    else:
                        return False

                if isinstance(item, XsdGroup):
                    if item.max_occurs == 0:
                        continue

                    groups.append(item)
                    iterators.append(particles)
                    particles = iter(item.content)
                    if len(iterators) > limits.MAX_MODEL_DEPTH:
                        raise XMLSchemaModelDepthError(self)
                    break
            else:
                try:
                    groups.pop()
                    particles = iterators.pop()
                except IndexError:
                    msg = "The provided particle is not part of the model group"
                    raise XMLSchemaValueError(msg)