sdmx-twg / vtl

This repository is used for maintaining the SDMX-VTL specification

Allow specifying an identifier component in the membership operator #263

Closed: capacma closed this issue 11 months ago

capacma commented 7 years ago
reporter: MC
issue: #263
reference document (UM/RM/EBNF): RM
page: 41
line: 1373

Issue Description

In the current version of VTL it is not possible to write a simple check like check ( ds.ref_area = "IT" ), because the membership operator does not allow specifying an identifier component.

Proposed Solution

Allow specifying an identifier component in the membership operator. The returned dataset has the identifier components of the input dataset and a measure whose name is "computed" and whose value is the value of the identifier component. Note: we cannot simply promote the specified identifier component to a measure, because this could create duplicates that cannot be eliminated without losing information on the data points.
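A minimal Python sketch of the proposed rule, under assumptions that are not part of the issue: a dataset is modelled as a list of dicts plus a list of identifier names, and the helpers membership_on_identifier, promote_to_measure and has_duplicate_keys are hypothetical. The FR rows are made-up values for illustration only. It shows why the note holds: keeping every identifier leaves each data point uniquely identified, while promoting REF_AREA to a measure leaves SEX alone as the key, and SEX repeats.

```python
# Hypothetical model: a dataset is a list of data points (dicts) plus the
# names of its identifier components. FR rows are illustrative values.
IDS = ["SEX", "REF_AREA"]
ds = [
    {"SEX": "M", "REF_AREA": "IT", "OBS_VALUE": 146},
    {"SEX": "F", "REF_AREA": "IT", "OBS_VALUE": 160},
    {"SEX": "M", "REF_AREA": "FR", "OBS_VALUE": 139},
    {"SEX": "F", "REF_AREA": "FR", "OBS_VALUE": 151},
]

def membership_on_identifier(ds, ids, comp):
    """Proposed rule: keep every identifier and add one measure named
    "computed" holding the value of the selected identifier."""
    if comp not in ids:
        raise ValueError(f"{comp} is not an identifier")
    return [{**{i: row[i] for i in ids}, "computed": row[comp]} for row in ds]

def promote_to_measure(ds, ids, comp):
    """Rejected alternative: drop comp from the identifiers, keep it as a measure."""
    new_ids = [i for i in ids if i != comp]
    return new_ids, [{**{i: row[i] for i in new_ids}, comp: row[comp]} for row in ds]

def has_duplicate_keys(ds, ids):
    # True if two data points share the same identifier values.
    keys = [tuple(row[i] for i in ids) for row in ds]
    return len(keys) != len(set(keys))

result = membership_on_identifier(ds, IDS, "REF_AREA")
print(result[0])                        # {'SEX': 'M', 'REF_AREA': 'IT', 'computed': 'IT'}
print(has_duplicate_keys(result, IDS))  # False

new_ids, promoted = promote_to_measure(ds, IDS, "REF_AREA")
print(has_duplicate_keys(promoted, new_ids))  # True: SEX alone repeats
```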

vignola commented 7 years ago

The membership operator, when applied to a measure or attribute component, returns a dataset having the same identifiers as the input dataset and only the measure selected by the membership. Assuming that the ds1 identifiers are SEX and REF_AREA, the result of:

ds2:=ds1.OBS_VALUE_2015

where

ds1

SEX REF_AREA OBS_VALUE_2014 OBS_VALUE_2015
M IT 146 152
F IT 160 180

is:

ds2

SEX REF_AREA OBS_VALUE_2015
M IT 152
F IT 180
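The same rule as a small Python sketch, assuming datasets are modelled as lists of dicts and using a hypothetical helper name; it reproduces ds2 from ds1:

```python
# Identifiers and data points of ds1 from the example above.
IDS = ["SEX", "REF_AREA"]
ds1 = [
    {"SEX": "M", "REF_AREA": "IT", "OBS_VALUE_2014": 146, "OBS_VALUE_2015": 152},
    {"SEX": "F", "REF_AREA": "IT", "OBS_VALUE_2014": 160, "OBS_VALUE_2015": 180},
]

def membership_on_measure(ds, ids, measure):
    """Keep every identifier and only the selected measure;
    all other measures are dropped."""
    return [{**{i: row[i] for i in ids}, measure: row[measure]} for row in ds]

ds2 = membership_on_measure(ds1, IDS, "OBS_VALUE_2015")
for row in ds2:
    print(row)
# {'SEX': 'M', 'REF_AREA': 'IT', 'OBS_VALUE_2015': 152}
# {'SEX': 'F', 'REF_AREA': 'IT', 'OBS_VALUE_2015': 180}
```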

It is not specified, instead, what the resulting dataset is when an identifier is selected: for example, what is the result of ds1.SEX, where ds1 is the dataset used in the example above?

There are two different proposals for it:

Proposal 1: ds1.SEX returns

ds2

SEX REF_AREA STRING_VAR
M IT M
F IT F

Proposal 2: ds1.SEX returns

ds2

SEX REF_AREA CONDITION
M IT true
F IT true
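Both proposals can be sketched in Python to make the difference concrete (hypothetical function names; datasets modelled as lists of dicts). Proposal 1 copies the identifier's value into the new measure, while proposal 2 only records a true value for each existing data point:

```python
IDS = ["SEX", "REF_AREA"]
ds1 = [
    {"SEX": "M", "REF_AREA": "IT", "OBS_VALUE_2014": 146, "OBS_VALUE_2015": 152},
    {"SEX": "F", "REF_AREA": "IT", "OBS_VALUE_2014": 160, "OBS_VALUE_2015": 180},
]

def _keys(row, ids):
    # Copy only the identifier components of a data point.
    return {i: row[i] for i in ids}

def membership_proposal_1(ds, ids, ident):
    """Proposal 1: one string measure holding the selected identifier's value."""
    if ident not in ids:
        raise ValueError(f"{ident} is not an identifier")
    return [{**_keys(row, ids), "STRING_VAR": row[ident]} for row in ds]

def membership_proposal_2(ds, ids, ident):
    """Proposal 2: one boolean measure that is true for every data point."""
    if ident not in ids:
        raise ValueError(f"{ident} is not an identifier")
    return [{**_keys(row, ids), "CONDITION": True} for row in ds]

p1 = membership_proposal_1(ds1, IDS, "SEX")
p2 = membership_proposal_2(ds1, IDS, "SEX")
print(p1)  # [{'SEX': 'M', 'REF_AREA': 'IT', 'STRING_VAR': 'M'}, {'SEX': 'F', 'REF_AREA': 'IT', 'STRING_VAR': 'F'}]
print(p2)  # [{'SEX': 'M', 'REF_AREA': 'IT', 'CONDITION': True}, {'SEX': 'F', 'REF_AREA': 'IT', 'CONDITION': True}]
```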

In both cases, when a VTL operator is applied to the result of the "membership on identifiers", the operator must have the following characteristics:

Under these conditions, the result of

ds2:=ds1.SEX="M"

will be, in both cases:

SEX REF_AREA CONDITION
M IT true
F IT false
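A sketch of how this result arises under proposal 1 only; the operator characteristics that would give proposal 2 the same result are not listed in the thread, so that case is not modelled. The hypothetical helper equals_scalar assumes the input has exactly one measure, whatever its name, and names the boolean result measure CONDITION as in the table:

```python
IDS = ["SEX", "REF_AREA"]
# Result of ds1.SEX under proposal 1: identifiers plus one measure.
sex_membership = [
    {"SEX": "M", "REF_AREA": "IT", "STRING_VAR": "M"},
    {"SEX": "F", "REF_AREA": "IT", "STRING_VAR": "F"},
]

def equals_scalar(ds, ids, scalar):
    """Compare a single-measure dataset with a scalar, data point by data point.
    The name of the input measure does not matter; the output measure is CONDITION."""
    measures = {k for row in ds for k in row if k not in ids}
    if len(measures) != 1:
        raise ValueError("comparison with a scalar needs exactly one measure")
    (m,) = measures
    return [{**{i: row[i] for i in ids}, "CONDITION": row[m] == scalar} for row in ds]

result = equals_scalar(sex_membership, IDS, "M")
for row in result:
    print(row)
# {'SEX': 'M', 'REF_AREA': 'IT', 'CONDITION': True}
# {'SEX': 'F', 'REF_AREA': 'IT', 'CONDITION': False}
```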
capacma commented 7 years ago
  1. Membership selecting an attribute: does the attribute become a measure?
  2. Membership selecting a measure: are the other measures dropped?
  3. Membership selecting a measure: are all attributes dropped (as follows from the general rule for operators)?
  4. User need: how to express check ( ds1.SEX = "M" )?
vignola commented 7 years ago

The first proposal seems more useful to me, because in this case we get a new measure that can be used more clearly (it contains the values of the identifiers). My only concern is that, if the result of ds1.SEX is a new dataset having all the identifiers and a measure named CONDITION (or whatever), then to check whether the new measure is equal to "M" we should write:

ds_r:= [ds1.SEX].CONDITION="M"

or else we should state that, for identifiers only, ds1.SEX is equivalent to [ds1.SEX].CONDITION.

Regarding the other points raised by Maurizio, I would answer as follows:

  1. Attributes should be treated as measures, so I don't see the need to transform them into measures. I think we should eliminate the measures, because otherwise we could get duplications: for example, in ds_r:= ds1.Att1+ds1.Att2 the measures of ds1 would be repeated. From my point of view, the result of ds1.Att1 should be a dataset having the same identifiers and only the attribute Att1. Then, if we want to use the new attribute together with the dataset, we only need to join them again: ds_r:= [ds1, ds1.Att1+ds1.Att2]
  2. We solved this in Rome; if I remember correctly, the answer should be yes.
  3. We solved this in Rome; if I remember correctly, viral attributes are propagated according to a user-defined rule, while the other attributes are removed.
  4. see answer above
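Answer 1 can be sketched in Python as follows, assuming ds1 carries two numeric attributes Att1 and Att2 (hypothetical values) and that a join matches data points on their identifiers. The helpers membership, add and join and the component name ATT_SUM are illustrative only, not VTL operators. Because membership drops OBS_VALUE, the sum carries no copy of it, and joining back yields each measure exactly once:

```python
IDS = ["SEX", "REF_AREA"]
# ds1 extended with two hypothetical numeric attributes, Att1 and Att2.
ds1 = [
    {"SEX": "M", "REF_AREA": "IT", "OBS_VALUE": 146, "Att1": 1, "Att2": 10},
    {"SEX": "F", "REF_AREA": "IT", "OBS_VALUE": 160, "Att1": 2, "Att2": 20},
]

def key(row):
    # Identifier values of a data point, used to match data points.
    return tuple(row[i] for i in IDS)

def membership(ds, comp):
    """ds.comp: identifiers plus the selected component only (measures dropped),
    stored here as {identifier tuple: value}."""
    return {key(row): row[comp] for row in ds}

def add(left, right):
    """Add two single-component datasets, matching data points on identifiers."""
    return {k: left[k] + right[k] for k in left if k in right}

def join(ds, extra, name):
    """[ds, extra]: attach the extra component to the matching data points of ds."""
    return [{**row, name: extra[key(row)]} for row in ds if key(row) in extra]

att_sum = add(membership(ds1, "Att1"), membership(ds1, "Att2"))
ds_r = join(ds1, att_sum, "ATT_SUM")  # ATT_SUM: hypothetical component name
for row in ds_r:
    print(row)
# {'SEX': 'M', 'REF_AREA': 'IT', 'OBS_VALUE': 146, 'Att1': 1, 'Att2': 10, 'ATT_SUM': 11}
# {'SEX': 'F', 'REF_AREA': 'IT', 'OBS_VALUE': 160, 'Att1': 2, 'Att2': 20, 'ATT_SUM': 22}
```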
capacma commented 7 years ago

Regarding point 1 raised by @vignola: if we decide that the membership operator applied to a dimension creates a new measure (containing the value of the identifier), I think the following expression is correct: ds1.SEX="M". The left-hand side ds1.SEX has only one measure, so it can be combined with any dataset having one measure, or with a scalar value such as "M". In addition, the created measure is not visible in the result of ds1.SEX="M", so the name we choose for it matters less here.

linardian commented 11 months ago

Refers to an old version of the documentation.