Component expressions vs. scalar expressions

stratosn commented 7 years ago

reporter issue reference	document (UM/RM/EBNF)	page	line
BDI-18	UM	49	All
BDI-18	RM	All	All

Issue Description

Component expressions seem to be the same as scalar expressions and the explanations in the core and in the reference are incoherent.

Proposed Solution

Maybe only scalar expressions are needed, but they should be explained better with respect to the clauses.

bellomarini commented 7 years ago

I suggest that we explicitly allow scalar expressions in clauses rather than component expressions, which are obsolete and should be removed from the manual. Component expressions are nothing but a specific way to use scalar expressions, where variables take the values row-by-row from the Dataset.

This is just a matter of definitions and explanations, but it would make the presentation less ambiguous and more compact.
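
To make the "row-by-row" reading concrete, here is a minimal Python sketch (a model of the idea, not VTL and not taken from the specification). The names `Row`, `scalar_expr` and `calc`, and the sample data, are hypothetical: the same scalar expression is evaluated once on plain values and once per data point of a dataset, which is what a component expression inside a clause amounts to under this reading.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]  # hypothetical model of one data point: component name -> value


def scalar_expr(x: int, y: int) -> int:
    """A scalar expression: it only ever sees two scalar values."""
    return x + y


def calc(dataset: List[Row], new_component: str,
         expr: Callable[[Row], int]) -> List[Row]:
    """Model of a calc clause: evaluate expr once per data point (row)."""
    return [{**row, new_component: expr(row)} for row in dataset]


# Hypothetical dataset with one identifier and two measures.
DS_1 = [
    {"Id1": 1, "Me1": 10, "Me2": 7},
    {"Id1": 2, "Me1": 3, "Me2": 8},
]

# The "component expression" Me1 + Me2 is the scalar expression applied per row.
DS_r = calc(DS_1, "Me3", lambda row: scalar_expr(row["Me1"], row["Me2"]))
```

In this model nothing distinguishes the expression used in the clause from the scalar one; only the source of the operand values differs.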

antonio-olleros commented 9 months ago

I believe this comment is quite valid, although it would require a thorough revision of the specification.

Notably, I have always had issues understanding what the type of the parameters should be in user-defined operators. Imagine that you want to define a max user-defined operator, like in the manual example:

define operator max1 (x integer, y integer)
    returns boolean is
        if x > y then x else y
end operator

Isn't this applicable at component level? Would this be equivalent to:

define operator max1 (x component, y component)
    returns component is
        if x > y then x else y
end operator

NicoLaval commented 9 months ago

I think the g4 grammar enables that, with scalar, component and ds as input and output.

vpinna80 commented 9 months ago

Hi @antonio-olleros, the difference is only in the nomenclature: component expressions refer to expressions inside clauses in the [], like aggr, filter and calc. However, in the define operator, a component parameter requires that the argument passed is a component of a dataset (not a scalar). This would've made sense when VTL had DDL statements; perhaps now the distinction is blurred, but it could become relevant again if we decide to add more DDL statements to the language.

However, when you are working with components, you can do something like this:

define operator mov_avg_3(v component<number>, id identifier) returns component is
  avg(v over(order by id) data points between 3 preceding and 3 following)
end operator

This would not be allowed if v was a scalar (note that the current VTL grammar won't allow it - it leaves so much to be desired actually).
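
A rough Python model of why this operator needs the whole component rather than a single scalar value (a sketch only; the function name mirrors the operator above, while the list-of-pairs representation is an assumption, and the window is assumed to include the current data point): each result depends on neighbouring data points, so one value of v is not enough to compute it.

```python
from typing import List, Tuple


def mov_avg_3(points: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """points holds (id, v) pairs; returns (id, average of v over the
    window from 3 preceding to 3 following data points, ordered by id)."""
    ordered = sorted(points, key=lambda p: p[0])
    values = [v for _, v in ordered]
    result = []
    for i, (ident, _) in enumerate(ordered):
        # Window is clipped at the edges of the component.
        window = values[max(0, i - 3): i + 4]
        result.append((ident, sum(window) / len(window)))
    return result


# Hypothetical component values keyed by identifier.
averaged = mov_avg_3([(3, 3.0), (1, 1.0), (2, 2.0), (5, 5.0), (4, 4.0)])
```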

antonio-olleros commented 9 months ago

Well, yes, I think the full specification here should be reviewed, because it is not clear at all what things are expected to look like, and I think we all have different ideas about how this should look.

I would propose to keep it as simple as possible.

In my view (almost only an intuition, I should analyse it further) only datatypes, dataset and scalarset types should be allowed. I think the rest of the things don't add info, because it can be inferred from the body of the UDO or it is not used or required by anything (e.g., why would we need to know that id is an identifier?)

vpinna80 commented 9 months ago

Well, in statically-typed programming languages like C/C++ or Java you don't usually infer the type of the parameters of a function from its body, because then you could have interpretation mistakes or unexpected execution-time errors.

Unless we want VTL to become Python or R...

IMHO VTL should retain its strong type system in order to facilitate writing correct scripts with minimal surprises. Also consider that there are some use-cases where data is not considered, and only the structures are.

antonio-olleros commented 9 months ago

Sorry, I just deleted a comment that was generic, not about the functions... In general I agree: I think we should provide the data types, but I think we should in some cases leave the component types open, and always leave the dataset components and types open. There is no need to have them because we can check correctness at compile time, ensuring that there will be no execution-time errors (we do it currently). In any case, I propose to have a dedicated meeting on UDOs. I have some other topics that I'm planning to add (but I'm not finding the time).

But for me, a simple way to put this would be: is this allowed?

define operator max1 (x integer, y integer)
    returns integer is
        if x > y then x else y
end operator;

A := B[calc max_result := max1(Me1, Me2)];

And I hope it is, because it works, and otherwise we would need to create one function at the level of scalars and another at the level of components doing exactly the same.

vpinna80 commented 9 months ago

Of course it is; even "scalar" would be OK, as I think that > can be applied to any kind of scalar. Perhaps it would've been better if UDOs could be made generic, like define operator max1<T extends scalar> (x T, y T) returns T. Or, again, perhaps this is asking too much. I agree: once we have the grammar and documentation repositories ready for editing, we could discuss this further.
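
For readers less familiar with bounded generics, this is roughly what such a signature expresses, sketched in Python rather than VTL (the generic UDO syntax above is only a suggestion, and the choice of `int`, `float` and `str` as stand-ins for scalar types is an assumption):

```python
from typing import TypeVar

# T may be any one of these scalar-like types, but x, y and the
# result must all share the same one.
T = TypeVar("T", int, float, str)


def max1(x: T, y: T) -> T:
    """Return the larger of two values of the same scalar type."""
    return x if x > y else y
```

A static type checker would accept `max1(3, 7)` and `max1("a", "b")` but flag `max1(3, "b")`, which is the kind of guarantee a declared parameter type gives without duplicating the operator per type.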

antonio-olleros commented 9 months ago

Yes, we should discuss further! I'm very happy that you agree that it is possible. Now, is this then also possible? (Grammar-wise it is, and we are allowing it!)

define operator max1 (x component, y component)
    returns component is
        if x > y then x else y
end operator;

A := B[calc max_result := max1(Me1, Me2)];

linardian commented 8 months ago

My opinion is that UDOs should be treated exactly like the "standard" operators. So, if a UDO can be applied to scalars and/or components and/or datasets, there should be 3 different operators (or syntaxes) in order to distinguish between the different data types. The "extension" mechanism used e.g. in Java IMHO does not fit with VTL, which is NOT a programming language. But I think this topic could require a specific discussion, probably in the next VTL physical meeting, if you agree.

sdmx-twg / vtl

Component expressions vs. scalar expressions #51
