sdmx-twg / vtl

This repository is used for maintaining the SDMX-VTL specification

SDMX / VTL - Type / Value domain #402

Open NicoLaval opened 5 months ago

NicoLaval commented 5 months ago

Hi all,

I need your advice regarding the SDMX --> VTL mapping.

Considering the below SDMX DSD fragment (included in a Component):

<str:LocalRepresentation minOccurs="1" maxOccurs="1">
    <str:Enumeration>urn:sdmx:org.sdmx.infomodel.codelist.Codelist=FR1:CL_TEST(1.0)</str:Enumeration>
</str:LocalRepresentation>

The CL_TEST code list is composed of strings (but this string type is not defined anywhere).

My questions are:

Thanks in advance

vpinna80 commented 5 months ago

Hi Nicolas, unfortunately the answer to your questions is "it depends"... You can find all the rules for mapping SDMX artifacts to VTL in the SDMX Specifications, section 6, chapter 12.

antonio-olleros commented 5 months ago

Hi Nico,

Very interesting (and hot) question... In my view (but it is just a view...) all enumerated types should be strings, even if the codes can be represented as integers. The reason is that you will never operate on those codes, and in the VTL world types matter for the operations you want to perform on them.

Regarding the value domain, well, it certainly depends... and mainly on the practice of the SDMX modeller (which I find often not to be the best...) It is quite likely that CL_TEST contains a lot of values that are not used in the dataset, and modellers may have chosen to use region constraints to limit those values. If so, I think those constraints would provide the relevant value domain.
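To make the region-constraint idea concrete, here is a minimal Python sketch. The code values and the constraint contents are made up for illustration (they are not taken from CL_TEST), and the function name `effective_value_domain` is hypothetical. It only shows the set logic under the assumption of an inclusive constraint; it does not use any SDMX or VTL API.

```python
def effective_value_domain(codelist_codes, constraint_included=None):
    """Return the set of codes actually usable in a dataset.

    codelist_codes: all codes declared in the code list.
    constraint_included: codes allowed by an inclusive constraint, or None
    when no constraint applies (the whole code list is then the domain).
    """
    codes = set(codelist_codes)
    if constraint_included is None:
        return codes
    # Keep only the constrained values that really exist in the code list.
    return codes & set(constraint_included)


# Hypothetical code list and constraint, for illustration only.
cl_test = ["A", "B", "C", "D"]
allowed = ["A", "C"]

print(sorted(effective_value_domain(cl_test, allowed)))
print(sorted(effective_value_domain(cl_test)))
```

With a constraint in place the domain shrinks to the listed codes; without one it falls back to the full code list.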

And for your third point, if I understand your question correctly, I agree with you, and that was my main point in the meeting in Paris (and in the Word document we shared after that regarding the data model). We should define why we have value domains and other objects in VTL. Compatibility with GSIM is not a good reason, for me, because dropping things does not make VTL incompatible: it is already a subset of GSIM, so we can drop artifacts while keeping compatibility. In my view:

vpinna80 commented 5 months ago

Actually, there are quite a lot of examples of integer enumerated sets, most of them coming from the statistical domain, where you have surveys with numbered responses plus special values (e.g. -99: invalid, -101: no response, and so on).
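As a rough illustration of such a set, the Python sketch below models numbered survey responses together with the two special values mentioned above. The 1 to 5 response scale, the names, and the idea of averaging the substantive answers are all assumptions made for the example, not something taken from a real survey or DSD.

```python
# Special values from the example above; the labels are the ones given there.
SPECIAL_VALUES = {-99: "invalid", -101: "no response"}
# Assumed response scale 1..5 (hypothetical).
VALID_RESPONSES = set(range(1, 6))
ENUMERATED_SET = VALID_RESPONSES | set(SPECIAL_VALUES)


def mean_of_valid(responses):
    """Average the substantive answers, skipping the special values."""
    for r in responses:
        if r not in ENUMERATED_SET:
            raise ValueError(f"{r} is not in the enumerated set")
    valid = [r for r in responses if r in VALID_RESPONSES]
    return sum(valid) / len(valid) if valid else None


print(mean_of_valid([4, 5, -99, 3, -101]))
```

Here the codes are integers that are both enumerated (membership is checked) and, for the substantive answers, usable in arithmetic.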

Creating and manipulating hierarchies is also a use case for codes. For example, you can have a system that automatically imports hierarchical rulesets from the DSD (I used this while working with Pacific Community Hub).