khaeru / sdmx

SDMX information model and client in Python
https://sdmx1.readthedocs.io
Apache License 2.0
23 stars 17 forks source link

Information loss during round trip of structure message #149

Closed goatsweater closed 6 months ago

goatsweater commented 7 months ago

I've noted a few dropped attributes when calling to_xml() on a structure message. Specifically:

This was found through manual inspect of a diff after round tripping a structure message file (sdmx.read_sdmx() -> sdmx.to_xml(obj)).

Code to reproduce:

from pathlib import Path
import sdmx

infile = Path("sdmx_struct_sample.xml")
outfile = Path("sdmx_struct_out.xml")

msg = sdmx.read_sdmx(infile)
with outfile.open("w") as out:
    print(sdmx.to_xml(msg, pretty_print=True, xml_declaration=True, encoding="utf-8").decode(), file=out)

Diff the input vs the output. Attempting to read the output back in and run compare() crashes on me (XMLSyntaxError). I believe it is due to my use of a second language in the sample, but I haven't dug into that.

Input file:

<?xml version="1.0" encoding="utf-8"?>
<message:Structure xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message" xmlns:structure="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure" xmlns:common="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common">
  <message:Header>
    <message:ID>IDREF1</message:ID>
    <message:Test>false</message:Test>
    <message:Prepared>2023-10-31T19:31:13.4708597+00:00</message:Prepared>
    <message:Sender id="Unknown" />
    <message:Receiver id="Unknown" />
  </message:Header>
  <message:Structures>
    <structure:Dataflows>
      <structure:Dataflow id="DF_SAMPLE_CURRENT_REV" agencyID="AGENCYID" version="2.0" isFinal="true">
        <common:Annotations>
          <common:Annotation>
            <common:AnnotationTitle>TIME PERIOD</common:AnnotationTitle>
            <common:AnnotationType>LAYOUT_COLUMN</common:AnnotationType>
          </common:Annotation>
          <common:Annotation>
            <common:AnnotationTitle>PRODUCT</common:AnnotationTitle>
            <common:AnnotationType>LAYOUT_ROW</common:AnnotationType>
          </common:Annotation>
          <common:Annotation>
            <common:AnnotationTitle>WHERE_TO=0</common:AnnotationTitle>
            <common:AnnotationType>DEFAULT</common:AnnotationType>
          </common:Annotation>
        </common:Annotations>
        <common:Name xml:lang="en">Sample (Valid from 2023-05-01 to 2023-09-01)</common:Name>
        <common:Name xml:lang="fr">Échantillon (Valide du 2023-05-01 au 2023-09-01)</common:Name>
        <common:Description xml:lang="en">A long description</common:Description>
        <common:Description xml:lang="fr">Une longue description</common:Description>
        <structure:Structure>
          <Ref id="DSD_SAMPLE_CURRENT_REV" version="2.0" agencyID="AGENCYID" package="datastructure" class="DataStructure" />
        </structure:Structure>
      </structure:Dataflow>
    </structure:Dataflows>
    <structure:DataStructures>
      <structure:DataStructure id="DSD_SAMPLE_CURRENT_REV" agencyID="AGENCYID" version="2.0" isFinal="true">
        <common:Name xml:lang="en">AGENCYID SAMPLE</common:Name>
        <common:Name xml:lang="fr">AGENCYID Échantillon</common:Name>
        <common:Description xml:lang="en">DSD for sample.</common:Description>
        <structure:DataStructureComponents>
          <structure:DimensionList id="DimensionDescriptor">
            <structure:Dimension id="WHAT" position="1">
              <structure:ConceptIdentity>
                <Ref id="WHAT" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:Enumeration>
                  <Ref id="CL_COM22" version="1.0" agencyID="AGENCYID" package="codelist" class="Codelist" />
                </structure:Enumeration>
              </structure:LocalRepresentation>
            </structure:Dimension>
            <structure:Dimension id="WHERE_TO" position="2">
              <structure:ConceptIdentity>
                <Ref id="WHERE_TO" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:Enumeration>
                  <Ref id="CL_GEO11_SUBSET" version="1.0" agencyID="AGENCYID" package="codelist" class="Codelist" />
                </structure:Enumeration>
              </structure:LocalRepresentation>
            </structure:Dimension>
            <structure:Dimension id="AUX1" position="3">
              <structure:ConceptIdentity>
                <Ref id="AUX1" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:Enumeration>
                  <Ref id="CL_MA09" version="1.0" agencyID="AGENCYID" package="codelist" class="Codelist" />
                </structure:Enumeration>
              </structure:LocalRepresentation>
            </structure:Dimension>
            <structure:Dimension id="AUX2" position="4">
              <structure:ConceptIdentity>
                <Ref id="AUX2" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:Enumeration>
                  <Ref id="CL_TR10_SUBSET" version="1.0" agencyID="AGENCYID" package="codelist" class="Codelist" />
                </structure:Enumeration>
              </structure:LocalRepresentation>
            </structure:Dimension>
            <structure:Dimension id="UNIT_MEASURE" position="5">
              <structure:ConceptIdentity>
                <Ref id="UNIT_MEASURE" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:Enumeration>
                  <Ref id="CL_UNIT_MEASURE" version="1.0" agencyID="AGENCYID" package="codelist" class="Codelist" />
                </structure:Enumeration>
              </structure:LocalRepresentation>
            </structure:Dimension>
            <structure:TimeDimension id="TIME_PERIOD" position="6">
              <structure:ConceptIdentity>
                <Ref id="TIME_PERIOD" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:TextFormat textType="ObservationalTimePeriod" />
              </structure:LocalRepresentation>
            </structure:TimeDimension>
            <structure:Dimension id="FREQ" position="7">
              <structure:ConceptIdentity>
                <Ref id="FREQ" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:Enumeration>
                  <Ref id="CL_FREQ" version="2.1" agencyID="SDMX" package="codelist" class="Codelist" />
                </structure:Enumeration>
              </structure:LocalRepresentation>
            </structure:Dimension>
          </structure:DimensionList>
          <structure:AttributeList id="AttributeDescriptor">
            <structure:Attribute id="OBS_STATUS" assignmentStatus="Conditional">
              <structure:ConceptIdentity>
                <Ref id="OBS_STATUS" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:Enumeration>
                  <Ref id="CL_OBS_STATUS" version="2.0" agencyID="STC" package="codelist" class="Codelist" />
                </structure:Enumeration>
              </structure:LocalRepresentation>
              <structure:AttributeRelationship>
                <structure:PrimaryMeasure>
                  <Ref id="OBS_VALUE" />
                </structure:PrimaryMeasure>
              </structure:AttributeRelationship>
            </structure:Attribute>
            <structure:Attribute id="BASE_PER" assignmentStatus="Mandatory">
              <structure:ConceptIdentity>
                <Ref id="BASE_PER" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:AttributeRelationship>
                <structure:None />
              </structure:AttributeRelationship>
            </structure:Attribute>
            <structure:Attribute id="CALC_METHOD" assignmentStatus="Conditional">
              <structure:ConceptIdentity>
                <Ref id="CALC_METHOD" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:AttributeRelationship>
                <structure:PrimaryMeasure>
                  <Ref id="OBS_VALUE" />
                </structure:PrimaryMeasure>
              </structure:AttributeRelationship>
            </structure:Attribute>
            <structure:Attribute id="CURRENT_PROCESS_PER" assignmentStatus="Conditional">
              <structure:ConceptIdentity>
                <Ref id="CURRENT_PROCESS_PER" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:AttributeRelationship>
                <structure:PrimaryMeasure>
                  <Ref id="OBS_VALUE" />
                </structure:PrimaryMeasure>
              </structure:AttributeRelationship>
            </structure:Attribute>
            <structure:Attribute id="INDEX_DATE" assignmentStatus="Mandatory">
              <structure:ConceptIdentity>
                <Ref id="INDEX_DATE" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:AttributeRelationship>
                <structure:PrimaryMeasure>
                  <Ref id="OBS_VALUE" />
                </structure:PrimaryMeasure>
              </structure:AttributeRelationship>
            </structure:Attribute>
            <structure:Attribute id="RUN_NUM" assignmentStatus="Conditional">
              <structure:ConceptIdentity>
                <Ref id="RUN_NUM" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:AttributeRelationship>
                <structure:PrimaryMeasure>
                  <Ref id="OBS_VALUE" />
                </structure:PrimaryMeasure>
              </structure:AttributeRelationship>
            </structure:Attribute>
            <structure:Attribute id="REBASE_FACTOR" assignmentStatus="Conditional">
              <structure:ConceptIdentity>
                <Ref id="REBASE_FACTOR" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:AttributeRelationship>
                <structure:PrimaryMeasure>
                  <Ref id="OBS_VALUE" />
                </structure:PrimaryMeasure>
              </structure:AttributeRelationship>
            </structure:Attribute>
            <structure:Attribute id="REVISION_STATUS" assignmentStatus="Mandatory">
              <structure:ConceptIdentity>
                <Ref id="REVISION_STATUS" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
              <structure:LocalRepresentation>
                <structure:Enumeration>
                  <Ref id="CL_REVISION_STATUS" version="1.0" agencyID="AGENCYID" package="codelist" class="Codelist" />
                </structure:Enumeration>
              </structure:LocalRepresentation>
              <structure:AttributeRelationship>
                <structure:PrimaryMeasure>
                  <Ref id="OBS_VALUE" />
                </structure:PrimaryMeasure>
              </structure:AttributeRelationship>
            </structure:Attribute>
          </structure:AttributeList>
          <structure:MeasureList id="MeasureDescriptor">
            <structure:PrimaryMeasure id="OBS_VALUE">
              <structure:ConceptIdentity>
                <Ref id="OBS_VALUE" maintainableParentID="CS_SAMPLE_BETA" maintainableParentVersion="1.0" agencyID="AGENCYID" package="conceptscheme" class="Concept" />
              </structure:ConceptIdentity>
            </structure:PrimaryMeasure>
          </structure:MeasureList>
        </structure:DataStructureComponents>
      </structure:DataStructure>
    </structure:DataStructures>
    <structure:Constraints>
      <structure:ContentConstraint id="CC_DF_SAMPLE_CURRENT_REV" agencyID="AGENCYID" version="2.0" isFinal="true" type="Actual">
        <common:Name xml:lang="en">Content Constraint on Revision Status = Current</common:Name>
        <structure:ConstraintAttachment>
          <structure:Dataflow>
            <Ref id="DF_SAMPLE_CURRENT_REV" version="2.0" agencyID="AGENCYID" package="datastructure" class="Dataflow" />
          </structure:Dataflow>
        </structure:ConstraintAttachment>
        <structure:CubeRegion include="true">
          <common:Attribute id="REVISION_STATUS">
            <common:Value>C</common:Value>
          </common:Attribute>
        </structure:CubeRegion>
      </structure:ContentConstraint>
    </structure:Constraints>
  </message:Structures>
</message:Structure>
khaeru commented 7 months ago

Great—thanks again for clear and explicit descriptions and a MWE. I will open one or more PRs, either next week or the week after.

khaeru commented 6 months ago

Working on #150, I notice another mismatch: the output SDMX-ML has the dimension FREQ in position 6, and TIME_PERIOD in position 7, which is reversed vis-à-vis the input. I'll also fix this.