sdmx-twg / sdmx-ml

This repository is used for maintaining the SDMX-ML format specification

SDMX 3.0: implement "feature 002 Support reference metadata in the Restful APIs" for SDMX-ML messages #17

Closed dosse closed 2 years ago

dosse commented 3 years ago

https://metadatatechnology.com/sdmx3/designs/002/draft/002%20Reference%20Metadata%20API%20V2.1.0%20no%20markup.docx

dosse commented 3 years ago

In order to support the new requirement, it is proposed to:

structure message changes:

--> Implemented in structure schemas by J

data message changes:

Note:

jgager commented 3 years ago

@dosse @stratosn I need some clarifications on this, particularly as it relates to #6.

  1. The latest version of the Information Model I have seen (Draft 0_7) shows MetadataSet inheriting from MaintainableArtefact (as suggested in the comment on this issue). With this inheritance, a MetadataSet can now have a URN and/or agency, id, version. But I am not clear how the value of agency reconciles with the data provider reference (either explicit or through a metadata provision agreement) that was added as part of #6. Can you please clarify the intention here? I assume what we ultimately want is for the metadata set to be identifiable by a provider, id, and version.
  2. For the new linking mechanism, I understand the intention to be that you can attach reference metadata (or any other documents) to various artifacts. I do not see this in the latest Information Model, so I assume it just hasn't been incorporated yet. But I assume that this would include the ability to add links to data sets and components of data sets as well. Is that assumption correct? And if it is, does it make more sense to add this to the AnnotatableArtefact model as opposed to Identifiable, as nothing in DataSet inherits from Identifiable?
  3. Regarding removal of the Structure element from the header, I assume this is because in #6 the information that would be contained there was moved directly to the MetadataSet. Is that correct? I will point out that the purpose of having this information in the header was that it allowed a processor to retrieve any structural information it needed to process the body (e.g. DataSet, MetadataSet) before it reached that element (at which point it may be too late). It also allowed the reference to be reused, but that is not overly important. Since there is no structure specific metadata message anymore, one would have the information needed to retrieve the MetadataStructureDefinition before the reported attributes are reached. But it does now deviate from the DataSet, which I don't like. I would also add that in the scenario where a stub is returned, there is no way to specify the structure/flow/provision agreement any given stub conforms to. My recommendation would be to keep the current mechanism.

agent96 commented 3 years ago

Hi J, I can probably help explain some of these.

  1. All Reference Metadata which relates to data (observations, series, partial keys) is modelled in the DSD and transmitted like a dataset. Like a Group Attribute, bits can be submitted as and when the information is available, and are conceptually merged into the dataset. All Reference Metadata against structures uses the MSD, but the Metadata Report is now treated more like a maintainable structure, as it has ownership (Metadata Provider), identity, and version, and other things that come with a maintainable, like startPeriod/endPeriod. The only thing is, it is not an agencyId that owns the report, it is a providerId, so this is where it does not quite fit the inheritance. But generally it can be thought of as maintainable if the definition of maintainable was loosened slightly to say 'owned by an organisation'.

  2. The link from the metadata report is only to one or more identifiable structures. There is no provision to link to anything data related, as data related reference metadata is now all transmitted via the Metadata Attribute in the dataset (now modelled in the DSD). The purpose of linking to more than one identifiable is to reuse metadata text if it is related to more than one structure, for example the same report linked to multiple dataflows.

  3. The link from the report to the provision is now in the report, like a Dataflow would link to a DSD. The model for reference metadata is based on the structure model, where links are embedded, not abstracted to the header. This deviates from dataset, but the design was to treat all data related metadata like data (and transport it using a dataset) and structure related metadata like structures.

jgager commented 3 years ago

Thanks for the reply Matt.

So putting 1 and 3 together, what I am understanding is that the MetadataSet is now only used to report metadata against structure artifacts, and therefore should be treated more like structure metadata as opposed to data. Is that correct?

If that is correct, maybe it does make sense to keep the inheritance of Maintainable from MetadataSet in place, and simply document that the agency is the metadata provider in this case. And for the reference to the provision agreement/flow/structure in the MetadataSet, we remove the MetadataProvider element (since it is already stated as the agency):

   <xs:choice>
      <xs:element name="MetadataProvisionAgreement" type="common:MetadataProvisionAgreementReferenceType"/>
      <xs:sequence>
         <xs:element name="MetadataProvider" type="common:DataProviderReferenceType"/>
         <xs:choice>
            <xs:element name="Metadataflow" type="common:MetadataflowReferenceType"/>
            <xs:element name="MetadataStructure" type="common:MetadataStructureReferenceType"/>
         </xs:choice>
      </xs:sequence>
   </xs:choice>

Does that sound correct to you?
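
For illustration only, here is a sketch of what the choice above might reduce to if the MetadataProvider element were simply dropped and the inner choice flattened. This is an assumption about the resulting shape, not the released schema; the element names and types are copied unchanged from the fragment above.

```xml
<!-- Sketch only: assumes MetadataProvider is removed because the provider
     is already carried as the agency of the MetadataSet. -->
<xs:choice>
   <xs:element name="MetadataProvisionAgreement" type="common:MetadataProvisionAgreementReferenceType"/>
   <xs:element name="Metadataflow" type="common:MetadataflowReferenceType"/>
   <xs:element name="MetadataStructure" type="common:MetadataStructureReferenceType"/>
</xs:choice>
```

With the provider no longer a sibling element, the wrapping sequence has nothing left to group, so the three references can sit in a single choice.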

For number 2, I think I understand. Reference metadata is no longer attached to data. Instead it is defined in the DSD as attributes with the assignment status of "Metadata". And the idea of the new Link on Identifiable structure artifacts is that it allows the artifacts to link to associated reference metadata directly. So in a DSD I might have something like this on a dimension:

<Dimension id="INDICATOR">
  <Link urn="ref_metadata_set"/>
</Dimension>

Is this correct?

If it is correct, what is the URN of a MetadataSet (in particular the package)? I do not see that addressed anywhere, but it would seem that the Link mechanism is looking for a urn, correct?

agent96 commented 3 years ago

Hi J-

So putting 1 and 3 together, what I am understanding is that the MetadataSet is now only used to report metadata against structure artifacts, and therefore should be treated more like structure metadata as opposed to data. Is that correct?

Yes

If that is correct, maybe it does make sense to keep the inheritance of Maintainable from MetadataSet in place, and simply document that the agency is the metadata provider in this case. And for the reference to the provision agreement/flow/structure in the MetadataSet, we remove the MetadataProvider element (since it is already stated as the agency):

Yes that sounds correct

the idea of the new Link on Identifiable structure artifacts is that it allows the artifacts to link to associated reference metadata directly. So in a DSD I might have something like this on a dimension:

The idea is that the metadata set references the identifiable, or more than one if the report is relevant to multiple identifiables (not in combination but individually, i.e. this report is relevant to 3 dataflows). The fact that someone linked a metadata set to an identifiable means that when the user queries for the identifiable, the link back to the reference metadata is added (dynamically). So your XML is valid, but the link is only there because a metadata set linked to it; it was not added to the identifiable directly (it is derived at runtime).

As for the question of what the URN is: I'm not sure this was discussed. It needs a package and class, so I guess something like metadata and MetadataSet?
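
As a rough sketch of how that could look, assuming the usual SDMX URN pattern (urn:sdmx:org.sdmx.infomodel.package.Class=owner:id(version)) with the guessed package "metadata" and class "MetadataSet": one report linked to two dataflows would surface as a derived Link on each of them when queried. The wrapper element, dataflow ids, provider id, and report id below are all hypothetical, and the Link element shape follows the earlier example in this thread rather than any agreed schema.

```xml
<!-- Hypothetical query response: neither Link was stored on the dataflows.
     Each one is derived at runtime because the metadata set targets that dataflow. -->
<Dataflows>
  <Dataflow id="DF_EXAMPLE_A" agencyID="EXAMPLE_AGENCY" version="1.0">
    <Link urn="urn:sdmx:org.sdmx.infomodel.metadata.MetadataSet=EXAMPLE_PROVIDER:EXAMPLE_REPORT(1.0)"/>
  </Dataflow>
  <Dataflow id="DF_EXAMPLE_B" agencyID="EXAMPLE_AGENCY" version="1.0">
    <!-- Same report reused: the metadata set lists both dataflows as targets -->
    <Link urn="urn:sdmx:org.sdmx.infomodel.metadata.MetadataSet=EXAMPLE_PROVIDER:EXAMPLE_REPORT(1.0)"/>
  </Dataflow>
</Dataflows>
```

Note that the owner part of the URN is a provider id rather than an agency id here, which is the one place where the report does not quite fit the maintainable pattern.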

stratosn commented 3 years ago

Dear @jgager and @agent96, As regards the last point, i.e., the link to an identifiable, although that (dynamic) link seems useful, it might be a bit odd compared to how referencing works for any other Artefact. I mean that, for any Artefact, querying for the latter's parents would simply return all those Artefacts that refer to it. Similarly, for the example above, querying for the DSD with its parents should return (apart from any other Artefact, like Dataflow) any Metadataset that refers to that DSD, or any of its identifiable content.

agent96 commented 3 years ago

@stratosn The link is from the identifiable to a reference metadata report. My understanding is that these are two different message types, so I don't think querying for a Dataflow (for example) with parents will bring back reference metadata as well, only the structures.

stratosn commented 3 years ago

@agent96 just to confirm my understanding, you are proposing to have a "link" property added to any identifiable, in order to be able to include any links to related Metadatasets. This, though, would not be part of the model (the relation is already in the IM from the Metadataset point of view), only of the implementation to provide convenience of discovering reference metadata when working with identifiables. Can you confirm?

dosse commented 3 years ago

@stratosn You asked @agent96 for confirmation. I believe that the answer is yes. The linking mechanism would work in sdmx-ml in a similar way as the current link object in sdmx-json structure and data messages. It's a list that allows adding any link, including those provided by users and those auto-generated by the system depending on the current db content. These links behave like content and do not require a change of the artefact version.

agent96 commented 3 years ago

@stratosn yes, sorry, I did not see the question. The response from @dosse is exactly how it was supposed to work.

dosse commented 3 years ago

Hello @agent96, Concerning:

If that is correct, maybe it does make sense to keep the inheritance of Maintainable from MetadataSet in place, and simply document that the agency is the metadata provider in this case. And for the reference to the provision agreement/flow/structure in the MetadataSet, we remove the MetadataProvider element (since it is already stated as the agency):

Could you please clarify the linkages for the MetadataSet (as I understood the feature document):

Thanks for the confirmation.

dosse commented 2 years ago

Released