sissaschool / xmlschema

XML Schema validator and data conversion library for Python
MIT License

`import` ignored, when importing second XSD into the same namespace #419

Open jakubklimek opened 2 months ago

jakubklimek commented 2 months ago

I have this document and an XSD (linked from it via schemaLocation).

Now, the XSD imports 2 other XSDs with the same targetNamespace:

  <xs:import namespace="http://dia.gov.cz/ns/spolecne-casti-elektronickych-dokumentu/osoby" schemaLocation="https://ofn.gov.cz/společné-části-elektronických-dokumentů/2024-10-04/schémata/osoby/osoba-ve-vztahu-k-dokumentu.xsd"/>
  <xs:import namespace="http://dia.gov.cz/ns/spolecne-casti-elektronickych-dokumentu/osoby" schemaLocation="https://ofn.gov.cz/společné-části-elektronických-dokumentů/2024-10-04/schémata/osoby/fyzická-osoba-ve-vztahu-k-dokumentu.xsd"/>

Both XSDs have the same targetNamespace and contain no conflicting definitions. Nevertheless, the second import is ignored, resulting in an error: unknown type 'osoby:fyzická_osoba_ve_vztahu_k_dokumentu'

because that type is defined in the second imported XSD. When I comment out the first import, the second starts working.

This should be legal in XML Schema, and other validators like Altova work fine with it.
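As a rough way to check whether a schema document is affected, one can group its `xs:import` elements by namespace and look for namespaces that are imported more than once. The sketch below uses only the standard library; the function names and the sample namespaces and file names are made up for illustration and are not part of xmlschema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def imports_by_namespace(xsd_text):
    """Map each imported namespace to its schemaLocation hints, in document order."""
    root = ET.fromstring(xsd_text)
    grouped = defaultdict(list)
    # xs:import elements are direct children of xs:schema
    for imp in root.findall(f"{{{XSD_NS}}}import"):
        grouped[imp.get("namespace")].append(imp.get("schemaLocation"))
    return dict(grouped)


def repeated_imports(xsd_text):
    """Keep only the namespaces that are imported more than once."""
    grouped = imports_by_namespace(xsd_text)
    return {ns: locs for ns, locs in grouped.items() if len(locs) > 1}


# Hypothetical schema, shortened from the situation described above
SAMPLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="urn:example:osoby" schemaLocation="osoba.xsd"/>
  <xs:import namespace="urn:example:osoby" schemaLocation="fyzicka-osoba.xsd"/>
  <xs:import namespace="urn:example:other" schemaLocation="other.xsd"/>
</xs:schema>"""

print(repeated_imports(SAMPLE))
```

Any namespace reported here would lose all but its first location under a "first import per namespace" strategy.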

To be fair, this behavior is not explicitly forbidden by the specification (see the Note at the end of 4.2.6.2), but the Note warns that applications implemented this way might miss information, which is exactly the case here.

Given that the schemaLocation [attribute] is only a hint, it is open to applications to ignore all but the first <import> for a given namespace, regardless of the ·actual value· of schemaLocation, but such a strategy risks missing useful information when new schemaLocations are offered.

brunato commented 1 month ago

Hi,

I don't know how Altova manages multiple imports of the same namespace. The note that you cited includes another part before it:

Note: The above is carefully worded so that multiple <import>ing of the same schema document will not constitute a violation of clause 2 of Schema Properties Correct (§3.17.6.1), but applications are allowed, indeed encouraged, to avoid <import>ing the same schema document more than once to forestall the necessity of establishing identity component by component. Given that the schemaLocation [attribute] is only a hint, it is open to applications to ignore all but the first <import> for a given namespace, regardless of the ·actual value· of schemaLocation, but such a strategy risks missing useful information when new schemaLocations are offered.

So I have to check whether permitting multiple imports of the same namespace causes problems in deciding which components to use when building the complete schema in case of name collisions. For include/redefine/override an error is raised in these cases, but for imports I have to check whether this can be a problem. Keep in mind that imported schemas are often edited by others.

Anyway, the recommendation in the note is clear, so multiple imports of the same namespace will eventually be supported through an option that is disabled by default.

Thank you

jakubklimek commented 1 month ago

Hi, thanks for the consideration.

I think name collisions across multiple imports should be handled the same way as name collisions in includes or within a single schema. It is up to the schema authors to avoid them and to know what they are doing.
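To make "name collision" concrete, here is a minimal sketch that collects the named global components of two schema documents and reports the overlap. It assumes both documents share the same targetNamespace (it does not check this), it treats `simpleType` and `complexType` as one symbol space, and all names in it are hypothetical, not xmlschema internals.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Global components live in separate symbol spaces; simple and complex
# types share one, so a simpleType can clash with a complexType.
SYMBOL_SPACE = {
    "simpleType": "type",
    "complexType": "type",
    "element": "element",
    "attribute": "attribute",
    "group": "group",
    "attributeGroup": "attributeGroup",
    "notation": "notation",
}


def global_components(xsd_text):
    """Return {(symbol_space, name)} for the top-level named components."""
    found = set()
    for child in ET.fromstring(xsd_text):
        if not child.tag.startswith(XSD):
            continue
        kind = SYMBOL_SPACE.get(child.tag[len(XSD):])
        name = child.get("name")
        if kind and name:
            found.add((kind, name))
    return found


def collisions(xsd_a, xsd_b):
    """Components defined by both documents (same targetNamespace assumed)."""
    return global_components(xsd_a) & global_components(xsd_b)


HEAD = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:example:osoby">'
DOC_A = HEAD + '<xs:complexType name="osoba"/></xs:schema>'
DOC_B = HEAD + '<xs:complexType name="fyzicka_osoba"/><xs:element name="osoba"/></xs:schema>'
DOC_C = HEAD + '<xs:simpleType name="osoba"><xs:restriction base="xs:string"/></xs:simpleType></xs:schema>'

print(collisions(DOC_A, DOC_B))  # different names or symbol spaces
print(collisions(DOC_A, DOC_C))  # both define a type named "osoba"
```

In this sketch an element and a type with the same name do not collide, because they belong to different symbol spaces.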

brunato commented 1 month ago

Locations in <xs:import/> statements are only hints that validators may ignore; this is a big difference from inclusions (and from redefine and override). So collisions caused by includes are in fact errors. Collisions between imports can arise when assembling schemas from various namespaces, and sometimes the same schema can end up being reloaded.

The enhancement will be included in the next major release. It will probably be handled with one or more loader classes. The default strategy will remain the current one (process only the first import for each namespace); the others will be based on the absolute URL of schemas. I will also include a URL-based loader that rejects imports that generate collisions (producing only a warning).

With a class-based loader, a developer or user can change the behavior by using a derived class.
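The three strategies described above could look roughly like the following sketch. It is only an illustration of the idea and not the xmlschema API: the class names, the `accept`/`load` methods, and the `names` argument (the component names a document would define) are all assumptions, and URLs are taken to be already absolute.

```python
import warnings


class ImportLoader:
    """Hypothetical base class: subclasses decide which imports to accept."""

    def __init__(self):
        self.urls = {}   # namespace -> URLs accepted so far, in order
        self.names = {}  # namespace -> component names seen so far

    def accept(self, namespace, url, names):
        raise NotImplementedError

    def load(self, namespace, url, names=()):
        names = set(names)
        if not self.accept(namespace, url, names):
            return False
        self.urls.setdefault(namespace, []).append(url)
        self.names.setdefault(namespace, set()).update(names)
        return True


class FirstPerNamespaceLoader(ImportLoader):
    """Process only the first import for each namespace."""

    def accept(self, namespace, url, names):
        return namespace not in self.urls


class UrlBasedLoader(ImportLoader):
    """Process every distinct URL, skipping only exact reloads."""

    def accept(self, namespace, url, names):
        return url not in self.urls.get(namespace, [])


class CollisionRejectingLoader(UrlBasedLoader):
    """URL-based, but skip (with a warning) documents that redefine a name."""

    def accept(self, namespace, url, names):
        if not super().accept(namespace, url, names):
            return False
        clash = names & self.names.get(namespace, set())
        if clash:
            warnings.warn(f"skipping {url}: already defined {sorted(clash)}")
            return False
        return True


NS = "urn:example:osoby"  # made-up namespace and URLs
IMPORTS = [
    (NS, "https://example.org/osoba.xsd", {"osoba"}),
    (NS, "https://example.org/fyzicka-osoba.xsd", {"fyzicka_osoba"}),
    (NS, "https://example.org/osoba-copy.xsd", {"osoba"}),
]


def run(loader):
    """Feed the sample imports to a loader and return the accepted URLs."""
    return [url for ns, url, names in IMPORTS if loader.load(ns, url, names)]


print(run(FirstPerNamespaceLoader()))
print(run(UrlBasedLoader()))
```

A derived class only needs to override `accept`, which is the kind of customization a class-based loader would allow.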