Open hoehrmann opened 7 months ago
Hi @hoehrmann, the need is not clear:
references URL parameter) together with the dataflow artefact (and potentially other very relevant artefacts such as allowed content constraint, codelists, concept schemes, ...) in one single request. Additional requests should not be necessary. One call would always be necessary.
Summary:
The intent of the current requirement seems to be to ensure the DSD can be identified from data messages. The current requirements are insufficient to ensure this goal. The idiomatic way would be to add the agencyID and version properties, or to add a mandatory new property like datastructure that contains the URN. It would also be possible to add a requirement that the dataStructure link has a urn property. However, for implementations that read data messages, going through all links and all rel values and checking the urn field of each is more complicated than reading direct properties, and adding a JSON Schema validation rule that checks that the DSD URN is present or can be computed is also more complicated.
Details:
Right now the requirement in the specification is satisfied by:
data:
  structures:
    - links:
        - rel: dataStructure
          href: https://example.org/X347.xml
There is no way to infer the URN of the DSD. This is valid because only a link to the DSD is required; specifying its URN is not required, and the DSD does not have to be hosted on an SDMX-REST web service (otherwise you could guess the needed URN parts from the URL). This could be addressed by requiring that the urn property be specified for the dataStructure link.
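To illustrate the lookup a reader currently has to perform, here is a minimal Python sketch. The function name find_dsd_urn is hypothetical, the message is assumed to be already parsed into a dict shaped like the example above, and treating rel as a space-separated list of values is an assumption, not something the specification text quoted here confirms.

```python
from typing import Optional


def find_dsd_urn(message: dict) -> Optional[str]:
    """Return the urn of the first dataStructure link, or None if it has none.

    Hypothetical helper: assumes the parsed message layout shown above.
    """
    for structure in message.get("data", {}).get("structures", []):
        for link in structure.get("links", []):
            # Assumption: rel may hold several space-separated values.
            if "dataStructure" in link.get("rel", "").split():
                return link.get("urn")
    return None


# The minimal message above satisfies the current requirement,
# yet it carries no URN at all.
minimal = {
    "data": {
        "structures": [
            {"links": [{"rel": "dataStructure",
                        "href": "https://example.org/X347.xml"}]}
        ]
    }
}
print(find_dsd_urn(minimal))  # None
```

With a direct property on the structure, the two nested loops and the rel check would not be needed.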
As for validation, take this example: https://sdmx.oecd.org/public/rest/v2/data/dataflow/OECD.SDD.NAD.SEEA/DSD_NAT_RES@DF_NAT_RES/1.0/CAN.A.T.LEAD.*.A?. It claims to be an SDMX-JSON 2.0.0 data message. Ignoring the error in the contentLanguages property, the incorrect use of ~ in dimension index lists, and the wrong time format for TIME_PERIOD, the message is valid according to the JSON Schema for SDMX-JSON data messages, even though it does not have the required link (it puts the link on the dataSet instead of the Structure).
It is probably possible to amend the schema to require that in this specific case there must be one link with rel containing dataStructure, but it would increase the complexity of the schema.
As for lookups, the current requirement is satisfied by referencing a provisioning agreement. Even if you are lucky and it references an SDMX-REST endpoint, you can probably only get the dataflow with a single request. In theory you could use references=descendants or all, but those likely return an unreasonable amount of data and/or might be disabled or throttled on public endpoints as denial-of-service protection.
As for persistence, if I knew the URN I could try to look it up elsewhere (e.g., I may have old data and the web server just changed its address) but without it I would have to guess the URN (based on the IDs of the fields).
Thanks for the clarifications. The intent of the current link requirement is to ensure that the artefact (either DF, DSD or ProvisionAgreement) for which data have been requested can be fully identified from the data message. The choice of the artefact type is not arbitrary but must correspond to the artefact used in the original data request. This was meant with the wording "At least the link to the Data Structure Definition, Dataflow or Data Provision Agreement to which the data relates is required.", but the wording in the field guide can be improved. E.g., if data was requested for a dataflow, then the dataflow identification is required. If data was requested for a dsd, then the dsd identification is required. Also, in order for the full artefact identification to be available immediately, that link requires the usage of 'self' for the relationship and indeed the URN of the artefact, e.g.,
"href": "https://registry.sdmx.org/ws/rest/dataflow/ECB.DISS/BSI_PUB/1.0",
"rel": "self",
"urn": "urn:sdmx:org.sdmx.infomodel.datastructure.dataflow=ECB.DISS:BSI_PUB(1.0)"
This information is sufficient to retrieve all required structure artefacts in one single request. This can be further clarified in the field guide.
If a client requires the structure information at a later time, then the client is free to extract and store the structure information at the same time as the data. If you need to get just the DSD for a DF, then you could use the references=datastructure parameter. Disabling structure retrieval through references as a 'denial of service' protection seems to me an unreasonable approach. Compared to data extractions, structure messages are usually much smaller.
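As a rough Python sketch of how a client might use such a 'self' link: the helper names parse_urn and structure_request_url are hypothetical, the regular expression only covers URNs shaped like the example above, and appending references=datastructure to the link's href is an assumption about how the endpoint behaves.

```python
import re
from urllib.parse import urlencode

# Matches URNs shaped like the example above:
# urn:sdmx:org.sdmx.infomodel.<package>.<class>=<agency>:<id>(<version>)
URN_PATTERN = re.compile(
    r"^urn:sdmx:org\.sdmx\.infomodel\.(?P<package>\w+)\.(?P<cls>\w+)="
    r"(?P<agency>[^:]+):(?P<id>[^(]+)\((?P<version>[^)]+)\)$"
)


def parse_urn(urn: str) -> dict:
    """Split an SDMX URN into its parts (hypothetical helper)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a recognised SDMX URN: {urn!r}")
    return match.groupdict()


def structure_request_url(link: dict, references: str = "datastructure") -> str:
    """Build one request URL from the link's href plus a references parameter."""
    return f"{link['href']}?{urlencode({'references': references})}"


link = {
    "href": "https://registry.sdmx.org/ws/rest/dataflow/ECB.DISS/BSI_PUB/1.0",
    "rel": "self",
    "urn": "urn:sdmx:org.sdmx.infomodel.datastructure.dataflow=ECB.DISS:BSI_PUB(1.0)",
}
parts = parse_urn(link["urn"])
print(parts["agency"], parts["id"], parts["version"])
print(structure_request_url(link))
```

The parsed agency, id and version can also be stored alongside the data, so the artefact can be looked up elsewhere if the href stops resolving.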
I would conclude that this ticket specifically requests that the URN of the underlying artefact can be found in a more straightforward way (without looping through the links array), either by taking it out of the links array and adding it as a separate structure property (similar to the SDMX-ML data messages) or, e.g., by requiring that link to be positioned as the first value of the links array.
For issues you find in the practical SDMX implementation https://sdmx.oecd.org/public/rest/, could you please open tickets in this separate code repository https://gitlab.com/sis-cc/.stat-suite/dotstatsuite-core-sdmxri-nsi-ws/-/issues/ ?
I would like to add the following point: in a structure message external structures are referenced like this
data:
  dataStructures:
    - id: EXAMPLE
      agencyID: EXAMPLE
      version: "1.0.0"
      name: Example
      isExternalReference: true
Using links to reference a DSD in data messages is inconsistent with this pattern.
Introduction:
The current SDMX-JSON data message specification requires that the data.structure object reference the Data Structure Definition (DSD) through a link to the Data Provision Agreement (DPA) or Dataflow (DF). This approach necessitates multiple lookups and parsing URN references to identify the relevant DSD. It also presents challenges when dealing with non-URN references or situations where the referenced web version becomes unavailable.
Proposed Improvement:
To enhance clarity, convenience, and validation capabilities, this proposal recommends including the agencyID and version of the referenced DSD directly within the data.structure object of SDMX-JSON messages. This eliminates the need for intermediary references through DPA or DF and streamlines the process of identifying the relevant DSD.
Benefits:
agencyID and version is very simple.
Implementation:
Modify the data.structure object schema to include two additional properties:
agencyID: reflects the DSD agencyID
version: reflects the DSD version
Remove the requirement to have a links property.
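A minimal Python sketch of what a reader could do with these direct properties. The function name dsd_urn is hypothetical, the structure object is assumed to also carry an id, and the DataStructure URN class name follows the usual SDMX URN pattern but should be checked against the specification.

```python
def dsd_urn(structure: dict) -> str:
    """Compute the DSD URN from direct properties (hypothetical helper).

    Assumes the structure object carries id, agencyID and version,
    and that DSD URNs use the DataStructure class name.
    """
    return (
        "urn:sdmx:org.sdmx.infomodel.datastructure.DataStructure="
        f"{structure['agencyID']}:{structure['id']}({structure['version']})"
    )


# Illustrative values only, reusing the placeholder names from this thread.
structure = {"id": "EXAMPLE", "agencyID": "EXAMPLE", "version": "1.0.0"}
print(dsd_urn(structure))
```

No link traversal is involved, and a schema can enforce this simply by listing the two properties as required.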
Conclusion:
This proposal offers a more efficient and reliable approach to referencing DSDs within SDMX-JSON messages. Direct inclusion of agencyID and version within the data.structure object simplifies data access, enhances validation, and ensures persistent DSD references, fostering a more streamlined and robust data exchange experience.
Alternative:
Having a datastructure URN reference would also be okay, but then id becomes redundant.
(In doubt, please handle this as a public review comment on SDMX 3.1 once the comment period begins.)