sdmx-twg / vtl

This repository is used for maintaining the SDMX-VTL specification

join clauses #363

Closed: capacma closed this issue 1 year ago

capacma commented 7 years ago

Issue Description

get, join, clauses

Proposed Solution

Proposal by Luigi on get, join and clauses: clauses.docx, join.docx

capacma commented 7 years ago

Comments by Maurizio: clauses.docx, get.docx, join.docx

capacma commented 7 years ago

Second version by Luigi: join_20170911.docx

capacma commented 7 years ago
  1. An alias is an identifier (not a string): it is not quoted with " " and it is known at compile time (not at run time). In addition, if an alias were quoted when it appears after "as", it would also have to be quoted wherever it is used; therefore I propose never to quote it.
  2. The same reasoning applies to component names, which are identifiers from a syntactic point of view: even when they appear in a list [ ], they are not strings, they are not quoted with " " and they are known at compile time. Otherwise they would have to be quoted whenever they are used in an expression.
  3. It is urgent to clarify the syntactic rules for: names of VTL objects (datasets, rulesets, functions, variables etc.), aliases, component names and function arguments.
  4. "on" cannot be used to terminate the list of datasets, because it is optional.
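To make points 1 and 2 concrete, here is a minimal Python sketch of a lexer under assumed rules (not the VTL grammar): a double-quoted run is a string literal, an unquoted name is an identifier, and whatever follows "as" must be an identifier. The names `TOKEN_RE`, `tokenize` and `aliases` are hypothetical. Under these rules `d1` and `"d1"` are different kinds of token, which is why a quoted alias would then have to be quoted everywhere it is used.

```python
import re

# Hypothetical lexical rules for illustration (not the official VTL grammar):
# text in double quotes is a string literal, an unquoted name is an identifier.
TOKEN_RE = re.compile(
    r'\s*(?:(?P<string>"[^"]*")|(?P<name>[A-Za-z_][A-Za-z0-9_]*)|(?P<op>\S))'
)


def tokenize(text):
    """Split text into (kind, value) pairs; kind is 'string', 'identifier' or 'op'."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m.group("string") is not None:
            tokens.append(("string", m.group("string")[1:-1]))
        elif m.group("name") is not None:
            tokens.append(("identifier", m.group("name")))
        else:
            tokens.append(("op", m.group("op")))
        pos = m.end()
    return tokens


def aliases(tokens):
    """Collect the names introduced after 'as'; a quoted alias is rejected."""
    found = []
    for (kind, value), (next_kind, next_value) in zip(tokens, tokens[1:]):
        if kind == "identifier" and value == "as":
            if next_kind != "identifier":
                raise SyntaxError(f"alias must be an identifier, got {next_kind} {next_value!r}")
            found.append(next_value)
    return found
```

For example, `aliases(tokenize('inner_join ( ds_1 as d1, ds_2 as d2 )'))` returns `["d1", "d2"]`, while `ds_1 as "d1"` is rejected at compile time.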
capacma commented 7 years ago

A comment on the "calc" operator.

  1. the "calc" can be used to add an identifier component to a dataset (not only a measure or attribute)
  2. the user must be able to specify not only the name of the component to be added to the dataset, but also its data type. If I write:

    ds [ calc attribute obs_status := "P" ]

then the data type of obs_status is "string". Now suppose that I want the type to be "obs_status": how do I do this? Maybe with a syntax like the following:

    ds [ calc attribute obs_status obs_status := "P" ]

Note that this feature applies both to the calc in the join and to the ordinary calc clause.
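A minimal Python sketch of the proposed behaviour, assuming a dataset structure is a list of components, each with a name, a role and a data type. `Component`, `infer_type` and `calc` are hypothetical names, and the literal-to-type rule is a simplification: when no type is declared, the type is inferred from the literal ("P" gives "string"); when one is declared, as in `calc attribute obs_status obs_status := "P"`, the declared type wins. The role may also be "identifier", as point 1 suggests.

```python
from dataclasses import dataclass

ROLES = {"identifier", "measure", "attribute"}


@dataclass(frozen=True)
class Component:
    name: str
    role: str       # one of ROLES
    data_type: str  # e.g. "string", or a value domain such as "obs_status"


def infer_type(value):
    # Simplified, hypothetical inference rule for literals.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    return "string"


def calc(structure, role, name, value, declared_type=None):
    """Return a copy of `structure` with component `name` added or replaced."""
    if role not in ROLES:
        raise ValueError(f"unknown role {role!r}")
    # Proposed behaviour: an explicitly declared type wins over the inferred one.
    data_type = declared_type if declared_type is not None else infer_type(value)
    kept = [c for c in structure if c.name != name]
    return kept + [Component(name, role, data_type)]
```

For example, `calc(base, "attribute", "obs_status", "P", declared_type="obs_status")` yields an attribute typed obs_status rather than string.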

capacma commented 7 years ago

Changes discussed during the teleconference on 13.09.2017

join_20170913.docx

vignola commented 7 years ago

Maurizio, I think the one you posted is an old document by Luigi. The new one is dated 19 September. Bye, Laura

----- Original message ----- From: "Maurizio" notifications@github.com To: "vtl-sdmx-task-force/sdmx-vtl" sdmx-vtl@noreply.github.com Cc: "Laura Vignola" vignola@istat.it, "Assign" assign@noreply.github.com Sent: Wednesday, 20 September 2017 8:33:19 Subject: Re: [vtl-sdmx-task-force/sdmx-vtl] get join clauses (#363)

Doc by Luigi revised + comments by Maurizio join_20170913.docx


capacma commented 7 years ago

OK, thanks Laura. The correct document is attached below: document revised by Luigi + comments by Maurizio, join_20170919 LB MC.docx

capacma commented 7 years ago

Question by Laura: if I apply the join to only one Data Set, I don't need to do any join. Would it be possible to omit the join type in the case of a single dataset? Maurizio: in theory the join operators have two or more operand datasets (by definition). In VTL 1.1 the type of join was optional, therefore we could apply the operator to one dataset (the syntax was "[ds] { clauses }"). But in the new syntax it is a bit strange to write "inner_join ( ds )", and I think that even "join(ds)" is not much better.

What we could do is say that the join operators have two or more operand datasets (thus excluding a single operand) and allow "apply" in the external clauses (i.e. clauses used outside a join operator). In this way we keep all the features.

vignola commented 7 years ago

Indeed, as Vincenzo said last time, we need the join operator to be applicable to a single dataset as well, because this is what allows the unary operators of the standard library to be defined. I also found a very old document by Luigi in which all the operators of the standard library were defined using the join expression. An example is reported below:

    CREATE FUNCTION upper (Dataset<?+, MeasureComponent+> D) return Dataset<?+, MeasureComponent+>
    CREATE FUNCTION lower (Dataset<?+, MeasureComponent+> D) return Dataset<?+, MeasureComponent+>
    AS [$D]{
        apply upper to D //or apply lower to D
    }

So I think it is not a good idea to exclude the single-operand join. Another possible idea is to make the join type optional, with the inner join as the default (of course, with only one operand the result is always the input dataset, whatever the join type).
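The default proposed here can be sketched in Python, assuming each operand is a pair of identifier names and rows (dicts) and that all operands share the same identifiers; `join` and its `how` parameter are hypothetical. Omitting the join type selects the inner join, and with a single operand the merge loop never runs, so the input comes back unchanged.

```python
def join(operands, how=None):
    """Join datasets on their identifiers (hypothetical sketch).

    Each operand is a pair (identifier_names, rows), rows being dicts.
    """
    if not operands:
        raise ValueError("join needs at least one operand")
    how = how or "inner"  # proposal: inner join when the join type is omitted
    if how != "inner":
        raise NotImplementedError("only the inner join is sketched here")
    ids, result = operands[0]
    for other_ids, rows in operands[1:]:
        if tuple(other_ids) != tuple(ids):
            raise ValueError("this sketch assumes all operands share the same identifiers")
        by_key = {tuple(row[i] for i in ids): row for row in rows}
        merged = []
        for row in result:
            key = tuple(row[i] for i in ids)
            # Keep only keys present on both sides; homonym measures are NOT
            # handled here (VTL would require aliases or an explicit drop).
            if key in by_key:
                merged.append({**row, **by_key[key]})
        result = merged
    # With a single operand the loop never runs: the result equals the input.
    return ids, list(result)
```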

capacma commented 7 years ago

Clarification about the calc operator (in a join): the component calculated by calc automatically overrides (i.e. drops) all components of the operand datasets that have the same name. Example (d1 and d2 have one numeric measure m1 and no attributes):

    inner_join ( d1, d2 calc m1 := d1#m1 + d2#m1 )

In this example m1 is dropped automatically from both d1 and d2; otherwise the user would have to write the drop explicitly.

capacma commented 7 years ago

Clarification about the apply operator. In the example

    apply d1 + d2 + d3

  1. does VTL require that all measures of d1 are numeric?
  2. does VTL require that all measures of d1 are also in d2 and d3 (i.e. that d1, d2 and d3 have the same measures)?

Note that for the "normal" operators we said that A + B raises an error when A and B have non-numeric measures.
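One possible reading, with both rules enforced, can be sketched in Python. Each operand is modelled as a map from measure name to data type; `check_apply_plus` and the type names in `NUMERIC_TYPES` are hypothetical, and whether VTL actually requires either rule is exactly the open question.

```python
NUMERIC_TYPES = {"integer", "number"}  # assumed names of the numeric types


def check_apply_plus(*operands):
    """Validate the operands of `apply d1 + d2 + ...` (hypothetical sketch).

    Each operand maps measure name -> data type. Both rules in question are
    enforced here as an assumption, not as agreed VTL semantics.
    """
    if len(operands) < 2:
        raise ValueError("+ needs at least two operands")
    first = operands[0]
    for position, measures in enumerate(operands, start=1):
        # Rule 1: every measure must be numeric.
        for name, data_type in measures.items():
            if data_type not in NUMERIC_TYPES:
                raise TypeError(f"measure {name!r} of operand {position} is not numeric")
        # Rule 2: every operand must have the same measures as the first one.
        if set(measures) != set(first):
            raise ValueError(f"operand {position} does not have the same measures as operand 1")
    # The result carries the common measures.
    return sorted(first)
```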

bellomarini commented 7 years ago

@capacma, I agree with the clarification about the calc in a join. Maybe the "automatic alias removal" part should be improved.

In particular, at the moment I wrote that "alias#compname" is automatically renamed to "compname" and that the user must take care to avoid duplicate names.

I suggest that we add: "in the presence of conflicts between homonymous component names after the automatic alias removal, calculated components prevail over the others." This would give the behavior you mention consistently, since it amounts to an implicit drop.
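The combined rule (automatic alias removal, then calculated components prevail over homonyms) can be sketched in Python on component names alone. `resolve_join_components` is a hypothetical helper; it assumes operand components arrive as "alias#compname" (identifiers unqualified) and that `calculated` lists the names produced by calc. It reproduces the inner_join ( d1, d2 calc m1 := d1#m1 + d2#m1 ) example: d1#m1 and d2#m1 are both dropped implicitly, whereas without the calc the duplicate m1 is an error.

```python
def resolve_join_components(operand_components, calculated):
    """Compute the component names of a join result after calc (hypothetical sketch)."""
    result = []
    for qualified in operand_components:
        # Automatic alias removal: "alias#compname" becomes "compname".
        plain = qualified.split("#", 1)[-1]
        if plain in calculated:
            # Implicit drop: the calculated component prevails over homonyms.
            continue
        if plain in result:
            raise ValueError(f"duplicate component {plain!r} after alias removal")
        result.append(plain)
    # Calculated components are appended after the surviving operand components.
    return result + list(calculated)
```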

capacma commented 7 years ago

Comments discussed on 20 September 2017

join_20170920 LB MC.docx

capacma commented 7 years ago

Document on the clauses by Luigi + comments by Maurizio: clauses_20171002.docx

Additional comment: what is the syntax for combining two or more clauses? The same as the one agreed for the join?

capacma commented 7 years ago

On pivot/unpivot: it would be useful to make elem_list optional in the syntax

    [pivot { elem_list } to dim , msr ]

If elem_list is missing, VTL uses the values of dim found in ds.
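A Python sketch of the proposed default, assuming rows are dicts holding the identifiers, dim and msr, and that each value in elem_list becomes a new measure; the function `pivot` is hypothetical, and missing combinations are filled with None as a stand-in for null. When elem_list is omitted, the distinct values of dim in the data are used in order of first appearance.

```python
def pivot(rows, dim, msr, elem_list=None):
    """Sketch of [pivot { elem_list } to dim, msr] with an optional elem_list.

    rows: list of dicts holding the identifiers, the dimension `dim` and the
    measure `msr`. Each value of `dim` in elem_list becomes a new measure.
    """
    if elem_list is None:
        # Proposal: default to the values of dim that occur in the data,
        # in order of first appearance.
        elem_list = list(dict.fromkeys(row[dim] for row in rows))
    grouped = {}
    for row in rows:
        if row[dim] not in elem_list:
            continue
        # The remaining identifiers define one output row.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in (dim, msr)))
        out = grouped.setdefault(key, dict(key))
        out[row[dim]] = row[msr]
    # Elements with no observation for a group are filled with None (null).
    return [{**out, **{e: out.get(e) for e in elem_list}} for out in grouped.values()]
```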

capacma commented 7 years ago

Further comment on the calc:

capacma commented 7 years ago

Further comment on calc/subspace:

capacma commented 7 years ago

Last version by Luigi: clauses_20171011.docx

linardian commented 1 year ago

Notes on an obsolete version of the documentation