tefra / xsdata

Naive XML & JSON Bindings for python
https://xsdata.readthedocs.io
MIT License

Use case ISO19155 #956

Closed: fleimgruber closed this issue 9 months ago

fleimgruber commented 9 months ago

For my use case, and possibly as a useful test case for xsdata, I tried generating bindings for the schemas in https://github.com/ISO-TC211/XML. You can find the current status and tests in my fork: https://github.com/fleimgruber/xsdata/commit/1d6fc4dd1542ba75c23f8628629fd5d46ad7fa28. I included two tests (and the generated bindings) that illustrate the problems.

Specifically, this was the command I used to generate the bindings:

xsdata generate ..\XML\schemas.isotc211.org\19115\-3\mds\2.0\mds.xsd --package xsdata.iso.models -ss single-package

These are the errors I get when running test_metadata1 and test_metadata2 (see the commit referenced above):

  1. test_metadata1: xsdata.exceptions.SerializerError: CiResponsibleParty is not derived from CiResponsiblePartyPropertyType
  2. test_metadata2: xsdata.exceptions.SerializerError: CiResponsibility2 is not derived from AbstractResponsibilityPropertyType
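Both messages describe the same kind of mismatch: the value found in a field is an element class (CiResponsibleParty, CiResponsibility2), while the field is declared with a *_PropertyType wrapper around such an element. The sketch below uses hypothetical, heavily simplified stand-ins for the generated classes (Metadata, individual_name and mismatches are invented for illustration) and a helper that only imitates the wording of the serializer's check; it is not xsdata's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CiResponsibleParty:
    """Hypothetical stand-in for a generated element class."""
    individual_name: Optional[str] = None


@dataclass
class CiResponsiblePartyPropertyType:
    """Hypothetical stand-in for the *_PropertyType wrapper that holds the element."""
    ci_responsible_party: Optional[CiResponsibleParty] = None


@dataclass
class Metadata:
    """Hypothetical container: the field is typed with the wrapper, not the element."""
    contact: List[CiResponsiblePartyPropertyType] = field(default_factory=list)


def mismatches(values: list, declared: type) -> List[str]:
    """Collect a message for every item that is not an instance of `declared`.

    This only loosely mirrors the wording of xsdata's SerializerError.
    """
    return [
        f"{type(value).__name__} is not derived from {declared.__name__}"
        for value in values
        if not isinstance(value, declared)
    ]


party = CiResponsibleParty(individual_name="Jane Doe")

# Wrong: the element object goes straight into the list.
wrong = Metadata(contact=[party])
# Right: the element is wrapped in its property type first.
right = Metadata(contact=[CiResponsiblePartyPropertyType(ci_responsible_party=party)])

print(mismatches(wrong.contact, CiResponsiblePartyPropertyType))
# ['CiResponsibleParty is not derived from CiResponsiblePartyPropertyType']
print(mismatches(right.contact, CiResponsiblePartyPropertyType))  # []
```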

So my question is where the problem lies: in the upstream .xsd, in the code generation, or in how I use the bindings. If we assume the upstream .xsd is correct, it comes down to code generation or bindings usage, so I would kindly ask for a review, as we might catch a bug in code generation. I will also try pydantic (v1) code generation; maybe the issue becomes more obvious that way.

tefra commented 9 months ago

From a quick glance, you are trying to assign the wrong type to the contact field. To reproduce, I downloaded the schemas and generated the bindings:

xsdata download https://raw.githubusercontent.com/ISO-TC211/XML/master/schemas.isotc211.org/19115/-3/mds/2.0/mds.xsd --output issues/schemas/

xsdata generate issues/schemas/schemas.isotc211.org/19115/-3/mds/2.0/mds.xsd --package issues.models -ss single-package

It helps to parse an existing sample first and let PycodeSerializer print the matching Python code:

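import urllib.request

from xsdata.formats.dataclass.context import XmlContext
from xsdata.formats.dataclass.parsers import XmlParser
from xsdata.formats.dataclass.serializers import PycodeSerializer

# Importing the generated models makes their classes known to the context,
# so the parser can look up the class for the root element.
from issues.models import *

context = XmlContext()

# Sample document that accompanies the mds 2.0 schema.
url = "https://raw.githubusercontent.com/ISO-TC211/XML/master/schemas.isotc211.org/19115/-3/mds/2.0/mds.xml"
with urllib.request.urlopen(url) as fp:
    # Parse the sample into the generated dataclasses...
    obj = XmlParser(context=context).parse(fp)
    # ...and print the equivalent Python constructor code.
    print(PycodeSerializer(context=context).render(obj))

Output:

# a lot of imports here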

obj = MdMetadata2(
    metadata_identifier=MdIdentifierPropertyType1(
        md_identifier=MdIdentifier1(
            code=CharacterStringPropertyType1(

            )
        )
    ),
    default_locale=PtLocalePropertyType1(
        pt_locale=PtLocale1(
            language=LanguageCodePropertyType1(

            ),
            character_encoding=MdCharacterSetCodePropertyType1(

            )
        )
    ),
    parent_metadata=AbstractCitationPropertyType(
        standards_iso_org_iso_19115_3_cit_2_0_ci_citation=CiCitation1(
            title=CharacterStringPropertyType1(

            )
        )
    ),
    metadata_scope=[
        MdMetadataScopePropertyType(
            md_metadata_scope=MdMetadataScope(
                resource_scope=MdScopeCodePropertyType1(

                )
            )
        ),
    ],
    contact=[
        AbstractResponsibilityPropertyType(
            standards_iso_org_iso_19115_3_cit_2_0_ci_responsibility=CiResponsibility1(
                role=CiRoleCodePropertyType1(
                    ci_role_code=CiRoleCode1(
                        value='pointOfContact',
                        code_list='codeListLocation#CI_RoleCode',
                        code_list_value='pointOfContact'
                    )
                ),
                party=[
                    AbstractCiPartyPropertyType1(
                        ci_organisation=CiOrganisation1(
                            name=CharacterStringPropertyType1(
                                character_string='Organisation Name'
                            )
                        )
                    ),
                ]
            )
        ),
    ],
    date_info=[
        AbstractTypedDatePropertyType(
            standards_iso_org_iso_19115_3_cit_2_0_ci_date=CiDate1(
                date=DatePropertyType1(
                    date_time=XmlDateTime(2014, 4, 3, 16, 0, 0)
                ),
                date_type=CiDateTypeCodePropertyType1(
                    ci_date_type_code=CiDateTypeCode1(
                        value='creation',
                        code_list='codeListLocation#CI_DateTypeCode',
                        code_list_value='creation'
                    )
                )
            )
        ),
    ],
    metadata_standard=[
        AbstractCitationPropertyType(
            standards_iso_org_iso_19115_3_cit_2_0_ci_citation=CiCitation1(
                title=CharacterStringPropertyType1(

                )
            )
        ),
    ],
    metadata_profile=[
        AbstractCitationPropertyType(
            standards_iso_org_iso_19115_3_cit_2_0_ci_citation=CiCitation1(
                title=CharacterStringPropertyType1(

                )
            )
        ),
    ],
    alternative_metadata_reference=[
        AbstractCitationPropertyType(
            standards_iso_org_iso_19115_3_cit_2_0_ci_citation=CiCitation1(
                title=CharacterStringPropertyType1(

                )
            )
        ),
    ],
    other_locale=[
        PtLocalePropertyType1(
            pt_locale=PtLocale1(
                language=LanguageCodePropertyType1(

                ),
                character_encoding=MdCharacterSetCodePropertyType1(

                )
            )
        ),
    ],
    metadata_linkage=[
        AbstractOnlineResourcePropertyType(
            standards_iso_org_iso_19115_3_cit_2_0_ci_online_resource=CiOnlineResource1(
                linkage=CharacterStringPropertyType1(

                )
            )
        ),
    ],
    spatial_representation_info=[
        AbstractSpatialRepresentationPropertyType(
            md_grid_spatial_representation=MdGridSpatialRepresentation2(
                number_of_dimensions=IntegerPropertyType1(

                ),
                cell_geometry=MdCellGeometryCodePropertyType2(

                ),
                transformation_parameter_availability=BooleanPropertyType2(

                )
            )
        ),
    ],
    reference_system_info=[
        AbstractReferenceSystemPropertyType(
            md_reference_system=MdReferenceSystem2(

            )
        ),
    ],
    identification_info=[
        AbstractResourceDescriptionPropertyType(
            md_data_identification=MdDataIdentification2(
                citation=AbstractCitationPropertyType(
                    standards_iso_org_iso_19115_3_cit_2_0_ci_citation=CiCitation1(
                        title=CharacterStringPropertyType1(
                            character_string='Sample Metadata for Minimal Conformance Class'
                        )
                    )
                ),
                abstract=CharacterStringPropertyType1(
                    character_string='This sample record has all required elements for minimal ISO 19115:2013 metadata'
                )
            )
        ),
    ],
    content_info=[
        AbstractContentInformationPropertyType(
            md_coverage_description=MdCoverageDescription2(
                attribute_description=RecordTypePropertyType1(
                    record_type=RecordType1(

                    )
                )
            )
        ),
    ],
    distribution_info=[
        AbstractDistributionPropertyType(
            md_distribution=MdDistribution2(

            )
        ),
    ],
    data_quality_info=[
        AbstractDataQualityPropertyType(
            dq_data_quality=DqDataQuality2(
                scope=MdScopePropertyType(

                ),
                report=[
                    AbstractDqElementPropertyType(

                    ),
                ]
            )
        ),
    ],
    resource_lineage=[
        AbstractLineageInformationPropertyType(
            li_lineage=LiLineage2(

            )
        ),
    ],
    portrayal_catalogue_info=[
        AbstractPortrayalCatalogueInformationPropertyType(
            md_portrayal_catalogue_reference=MdPortrayalCatalogueReference2(
                portrayal_catalogue_citation=[
                    AbstractCitationPropertyType(

                    ),
                ]
            )
        ),
    ],
    metadata_constraints=[
        AbstractConstraintsPropertyType(
            md_constraints=MdConstraints2(

            )
        ),
    ],
    application_schema_info=[
        AbstractApplicationSchemaInformationPropertyType(
            md_application_schema_information=MdApplicationSchemaInformation2(
                name=AbstractCitationPropertyType(

                ),
                schema_language=CharacterStringPropertyType1(

                ),
                constraint_language=CharacterStringPropertyType1(

                )
            )
        ),
    ],
    metadata_maintenance=AbstractMaintenanceInformationPropertyType(
        md_maintenance_information=MdMaintenanceInformation2(

        )
    ),
    standards_iso_org_iso_19115_3_mdb_2_0_acquisition_information=[
        AbstractAcquisitionInformationPropertyType(
            mi_acquisition_information=MiAcquisitionInformation(
                scope=MdScopePropertyType(
                    md_scope=MdScope(
                        level=MdScopeCodePropertyType1(
                            md_scope_code=MdScopeCode1(
                                value='mcc:MD_ScopeCode',
                                code_list='https://schemas.isotc211.org/19115/resources/Codelist/cat/codelists.xml#',
                                code_list_value='mcc:MD_ScopeCode'
                            )
                        ),
                        extent=[
                            AbstractExtentPropertyType(
                                ex_extent=ExExtent2(
                                    description=CharacterStringPropertyType1(
                                        character_string='gex:description/gco:CharacterString'
                                    )
                                )
                            ),
                        ],
                        level_description=[
                            MdScopeDescriptionPropertyType1(
                                md_scope_description=MdScopeDescription1(
                                    other=CharacterStringPropertyType1(
                                        character_string='mcc:other/gco:CharacterString'
                                    )
                                )
                            ),
                        ]
                    )
                )
            )
        ),
    ]
)
tefra commented 9 months ago

Let me know if you find something else that could point to a bug in the generator.

fleimgruber commented 9 months ago

Thanks for the quick and comprehensive answer. That "XML -> XmlParser -> PycodeSerializer" approach is very neat and handy to have. Nice job on the parser and serializer APIs in general!

fleimgruber commented 9 months ago

Thanks again for your help, I can confirm it works using the "cheat sheet" from PycodeSerializer.

fleimgruber commented 9 months ago

> Let me know if you [find] something else that could point to a bug in the generator

@tefra I am reopening this because I might have found a bug. Please see https://github.com/fleimgruber/xsdata/commit/3ba2e9e922317e0dfed85e0beb3c91e60b89a1eb, in particular the failing test_metadata_date_info(). For schema validation I added xmlschema as a dependency. Validation fails with this error:

E           xmlschema.validators.exceptions.XMLSchemaChildrenValidationError: failed validating <Element '{http://standards.iso.org/iso/19115/-3/cit/2.0}date' at 0x0000017003B8E9D0> with XsdGroup(model='choice', occurs=[0, 1]):
E           
E           Reason: Unexpected child with tag 'gco:DateTime' at position 2.
E           
E           Schema component:
E           
E             <xs:complexType xmlns:xs="http://www.w3.org/2001/XMLSchema" name="Date_PropertyType">
E                 <xs:choice minOccurs="0">
E                     <xs:element ref="gco:Date" />
E                     <xs:element ref="gco:DateTime" />
E                 </xs:choice>
E                 <xs:attribute ref="gco:nilReason" />
E             </xs:complexType>
E           
E           Instance type: <class 'xml.etree.ElementTree.Element'>
E           
E           Instance:
E           
E             <ns0:date xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0" xmlns:ns0="http://standards.iso.org/iso/19115/-3/cit/2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
E               <gco:Date xsi:nil="true" />
E               <gco:DateTime>2014-04-03T16:00:00</gco:DateTime>
E             </ns0:date>
E           
E           Path: /ns0:MD_Metadata/ns0:dateInfo/ns2:CI_Date/ns2:date

As far as I can see, this happens because the schema defines gco:Date and gco:DateTime as an <xs:choice ...>, so xmlschema accepts one or the other but not both, while XmlSerializer writes out both gco:Date and gco:DateTime. Please let me know if my implementation (Metadata2) is wrong; I modeled it on the PycodeSerializer output for the example mds.xml.
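The <xs:choice minOccurs="0"> in Date_PropertyType, with the default maxOccurs of 1, allows zero or one of gco:Date and gco:DateTime. A minimal stdlib-only check reproduces the verdict on the instance from the error above. The helper names are hypothetical, and the occurrence count is a simplification (one child element per chosen branch); this is not how xmlschema validates.

```python
import xml.etree.ElementTree as ET

GCO = "http://standards.iso.org/iso/19115/-3/gco/1.0"
# Branches of the <xs:choice> in Date_PropertyType, as Clark-notation tags.
DATE_CHOICE = {f"{{{GCO}}}Date", f"{{{GCO}}}DateTime"}


def choice_branches(xml_text: str, allowed: set) -> list:
    """Return the local names of the root's children that belong to the choice."""
    root = ET.fromstring(xml_text)
    return [child.tag.split("}", 1)[1] for child in root if child.tag in allowed]


def satisfies_choice(xml_text: str, allowed: set, min_occurs: int = 0, max_occurs: int = 1) -> bool:
    """True if the number of chosen branches lies within [min_occurs, max_occurs]."""
    return min_occurs <= len(choice_branches(xml_text, allowed)) <= max_occurs


# The instance reported in the validation error: both branches are present.
SERIALIZED = """<ns0:date xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0"
    xmlns:ns0="http://standards.iso.org/iso/19115/-3/cit/2.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <gco:Date xsi:nil="true" />
  <gco:DateTime>2014-04-03T16:00:00</gco:DateTime>
</ns0:date>"""

# The same element with a single branch, as the choice requires.
EXPECTED = """<ns0:date xmlns:gco="http://standards.iso.org/iso/19115/-3/gco/1.0"
    xmlns:ns0="http://standards.iso.org/iso/19115/-3/cit/2.0">
  <gco:DateTime>2014-04-03T16:00:00</gco:DateTime>
</ns0:date>"""

print(choice_branches(SERIALIZED, DATE_CHOICE), satisfies_choice(SERIALIZED, DATE_CHOICE))
# ['Date', 'DateTime'] False
print(choice_branches(EXPECTED, DATE_CHOICE), satisfies_choice(EXPECTED, DATE_CHOICE))
# ['DateTime'] True
```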

tefra commented 9 months ago

Take a look at the compound-fields configuration option (--compound-fields on the command line).

It groups repeatable choices together into a single field. For example, this schema:

  <xsd:element name="root">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element ref="alpha" />
        <xsd:element ref="bravo" />
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
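With compound fields enabled, this becomes:

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Alpha:  # placeholder for the class generated from the "alpha" element
    pass


@dataclass
class Bravo:  # placeholder for the class generated from the "bravo" element
    pass


@dataclass
class Root:
    class Meta:
        name = "root"

    # A single compound field holds the alpha/bravo children in document order.
    alpha_or_bravo: List[Union[Alpha, Bravo]] = field(
        default_factory=list,
        metadata={
            "type": "Elements",
            "choices": (
                {
                    "name": "alpha",
                    "type": Alpha,
                },
                {
                    "name": "bravo",
                    "type": Bravo,
                },
            ),
        },
    )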
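A rough, stdlib-only illustration of why grouping a repeatable choice into one field matters. The classes and helpers below (SplitRoot, CompoundRoot, tags_split, tags_compound, and the value field) are hypothetical stand-ins, not xsdata's generated code or serializer: with one list per branch the interleaving of alpha and bravo children is lost, while a single compound list keeps them in document order.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Alpha:
    """Hypothetical stand-in for a generated element class."""
    value: str = ""


@dataclass
class Bravo:
    """Hypothetical stand-in for a generated element class."""
    value: str = ""


@dataclass
class SplitRoot:
    """One list per branch: each list keeps its own order, but not the interleaving."""
    alpha: List[Alpha] = field(default_factory=list)
    bravo: List[Bravo] = field(default_factory=list)


@dataclass
class CompoundRoot:
    """One compound list: children stay in document order."""
    alpha_or_bravo: List[Union[Alpha, Bravo]] = field(default_factory=list)


def tags_split(root: SplitRoot) -> List[str]:
    # Separate lists carry no interleaving information, so this writer
    # emits all alphas first, then all bravos.
    return ["alpha"] * len(root.alpha) + ["bravo"] * len(root.bravo)


def tags_compound(root: CompoundRoot) -> List[str]:
    # Each item's type decides which choice branch (element name) it belongs to.
    return ["alpha" if isinstance(item, Alpha) else "bravo" for item in root.alpha_or_bravo]


# Document order in the source XML: alpha, bravo, alpha.
children = [Alpha("a1"), Bravo("b1"), Alpha("a2")]
compound = CompoundRoot(alpha_or_bravo=children)
split = SplitRoot(
    alpha=[c for c in children if isinstance(c, Alpha)],
    bravo=[c for c in children if isinstance(c, Bravo)],
)

print(tags_compound(compound))  # ['alpha', 'bravo', 'alpha']
print(tags_split(split))        # ['alpha', 'alpha', 'bravo']
```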
fleimgruber commented 9 months ago

@tefra spot on, thanks again!