sdmx-twg / vtl

This repository is used for maintaining the SDMX-VTL specification

Division operator error #404

Closed amattioc closed 6 months ago

amattioc commented 7 months ago

Reference Manual

As per the RM, the division should always return a number (3183). Is this correct?

Anyway, in the examples, divisions of two integers seem to return integers (and the result is placed in the same integer column measure: example 1, Me_1 and example 2, Me_1).

antonio-olleros commented 7 months ago

I think that the examples may be misleading because the divisions presented could yield an integer. I think it would be clearer if at least one result had decimals.

In my opinion, the result always has to be a number, and it should be placed in the same column (division is not an operator whose behaviour changes the data type).

amattioc commented 7 months ago

Hi Antonio,

I agree that the division should always return a number and, if so, when it divides two integers the result cannot be placed in the same column (as examples 1 and 2 seem to suggest). Me_1 is an integer in both ds1 and ds2, but the result of ds1/ds2 turns Me_1 into a number, and for this reason it should not be placed in the same column (or the column's data type should be changed). Do you agree?
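The type change described above can be illustrated with a minimal Python sketch. This is not VTL: the `Measure` class, the `divide` helper, the type labels and the sample values are all hypothetical and only mimic the idea that dividing two integer measures yields a number.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Measure:
    """Hypothetical stand-in for a measure column with a declared data type."""
    name: str
    data_type: str            # "integer" or "number" (illustrative labels)
    values: Tuple[float, ...]


def divide(left: Measure, right: Measure) -> Measure:
    """Divide value by value; the result is always typed 'number'."""
    values = tuple(a / b for a, b in zip(left.values, right.values))
    return Measure(left.name, "number", values)


# Me_1 is declared as an integer in both (made-up) datasets.
ds1_me1 = Measure("Me_1", "integer", (10, 7, 9))
ds2_me1 = Measure("Me_1", "integer", (5, 2, 3))

result = divide(ds1_me1, ds2_me1)
print(result.data_type)  # number
print(result.values)     # (2.0, 3.5, 3.0)
```

The result keeps the name Me_1 but no longer has the declared integer type, which is exactly the conflict being discussed.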

antonio-olleros commented 7 months ago

Well, I agree in theory, but I think we don't want that!! Note that this would imply that divisions could only be performed on monomeasure datasets, as it always happens when we have a change in data type (because if we change the data type, we have to change the variable (to num_var, for instance), and we can do that only for one variable).

But it is true that, at the moment, the model says that a variable name can only have one data type. So I think we have two options:

I strongly favour the second one, as I said during the Madrid meeting. I think that this is a very strong constraint, for abstract theoretical reasons, with:

egreising commented 7 months ago

Dear Antonio and All, I think you are right from a pragmatic point of view, but playing the Devil's advocate, what if you change the data type from double to date? That would mean a change in the semantics of the variable, and this is very bad from the data modelling perspective. I think the user should be aware of this fact and, if you are supposed to do an operation that requires a double to store the result, then the data type should be double from the beginning, not integer. If somebody defined this measure as integer, it is not foreseen to store decimals. On the other hand, a number with decimals should not give an error when stored in an integer. It should truncate or round the number to an integer. ROUND and TRUNCATE should be parameters of the operation, with one of them chosen as a default.
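A rough Python sketch of what such a parameter could look like. The function name `divide_to_integer`, the `mode` values and the choice of truncation as the default are assumptions made for illustration only; nothing here is taken from the VTL specification.

```python
import math


def divide_to_integer(a: int, b: int, mode: str = "truncate") -> int:
    """Divide two integers and store the quotient as an integer.

    mode="truncate" drops the decimals (towards zero).
    mode="round" uses Python's built-in round(), which rounds halves
    to the nearest even integer.
    The default chosen here is arbitrary.
    """
    quotient = a / b
    if mode == "truncate":
        return math.trunc(quotient)
    if mode == "round":
        return round(quotient)
    raise ValueError(f"unknown mode: {mode!r}")


print(divide_to_integer(7, 2))            # 3
print(divide_to_integer(7, 2, "round"))   # 4
print(divide_to_integer(-7, 2))           # -3
```

Either mode silently loses the decimal part, so the user would still need to know that the measure was declared as an integer.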

antonio-olleros commented 7 months ago

Hi Edgardo, I think we should not mix the dictionaries/modelling part (to be left to SDMX or others) and the VTL. I completely understand, and agree with, the use case for dictionaries and data modelling, and I agree that it would be bad from data modelling perspective to have something like that.

But the issue here is different, because VTL is not meant to serve for modelling or as a data dictionary, but to validate and transform datasets.

For instance, the VTL has left out of its scope (thank God!) the management of Agencies/Owners. Now, it is perfectly natural to use two datasets coming from two different agencies in one Transformation Scheme (e.g., a BIS dataset in dollars, and ECB's EXR dataset to convert to EUR). How can we ensure that the modellers in BIS and ECB have been consistent with each other and have not used the same name for different things? We can't. And if we take 100% seriously the statement for VTL that one variable can only take one data type, then if both BIS and ECB have used the variable VAR_EXAMPLE, one as integer and the other as string, we would not be able to do anything with VTL for these two datasets together...
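As a small Python sketch of that situation: the dataset names `BIS_DS` and `ECB_EXR`, the `OBS_VALUE` component and the `find_type_conflicts` helper are invented for illustration; only VAR_EXAMPLE and its two types come from the example above.

```python
from typing import Dict, Set


def find_type_conflicts(datasets: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Return every variable name that appears with more than one data type."""
    seen: Dict[str, Set[str]] = {}
    for structure in datasets.values():
        for variable, data_type in structure.items():
            seen.setdefault(variable, set()).add(data_type)
    return {var: types for var, types in seen.items() if len(types) > 1}


# Two hypothetical dataset structures used in the same Transformation Scheme.
datasets = {
    "BIS_DS": {"VAR_EXAMPLE": "integer", "OBS_VALUE": "number"},
    "ECB_EXR": {"VAR_EXAMPLE": "string", "OBS_VALUE": "number"},
}

conflicts = find_type_conflicts(datasets)
print(sorted(conflicts))                  # ['VAR_EXAMPLE']
print(sorted(conflicts["VAR_EXAMPLE"]))   # ['integer', 'string']
```

Under a strict "one name, one data type" rule, a non-empty result would block any transformation that uses both datasets together.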

Another thing: most of the SDMX implementations I have seen in my life do not provide a representation at Concept level, which in practice means that you can have a different representation/data type for the same variable in different Data Flows... So, if we implement that seriously in VTL (please, no!), I think we would even be going against SDMX!

amattioc commented 7 months ago

Hi Antonio, I agree that VTL should not be involved in modelling, but we always have to keep an eye on the scenarios where it will be used. SDMX is not the only one, but it is an important one. The correct balance between robustness and flexibility is key for the success of this technology in the real world.

I was wondering if, for cases like the division, we could imagine having different constraints for persistent and non-persistent assignments. I can imagine that only the persistent ones would have possible conflicts with the "modelling world" (e.g. in SDMX with the fromVTL mappings). Non-persistent assignments are probably related to the internals of a transformation and could have a more relaxed type management (e.g. including supertypes, treating codelists as strings if needed).
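A minimal Python sketch of this idea, under the assumption that declared types are available from some external definition. The `declared_types` mapping and the `assignment_allowed` helper are hypothetical and are not part of VTL or SDMX.

```python
from typing import Dict

# Hypothetical externally declared data types (e.g. from a data structure definition).
declared_types: Dict[str, str] = {"Me_1": "integer"}


def assignment_allowed(variable: str, result_type: str,
                       persistent: bool, declared: Dict[str, str]) -> bool:
    """Enforce the declared data type only for persistent assignments."""
    expected = declared.get(variable)
    if persistent and expected is not None and expected != result_type:
        return False
    # Non-persistent results, and variables with no declaration, are not checked.
    return True


# Me_1 := ds1 / ds2 produces a 'number' for a measure declared as 'integer'.
print(assignment_allowed("Me_1", "number", persistent=False, declared=declared_types))  # True
print(assignment_allowed("Me_1", "number", persistent=True, declared=declared_types))   # False
```

In this sketch only the persistent result is rejected; the intermediate one is allowed to carry the wider type.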

antonio-olleros commented 6 months ago

Hi!

I think that would be difficult to implement and to understand. Also, you may want to create a TS that takes your dataset as input but generates a dataset from another institution, so the same name can be used with different meanings in input and output.

I think that the constraint of the names has no utility outside the modelling part, which I think should be addressed outside VTL. If you agree with that, then I think it is clear that we should drop it, because it only creates problems and does not add any advantage!

egreising commented 6 months ago

Hi All! I have some remarks on the past two or three comments.

linardian commented 6 months ago

Good morning. This is a very interesting discussion, and I think it is worth putting the general topic on the agenda of the next meeting in Salamanca. Very briefly, this is my opinion, shared also with Attilio:

Sorry for the long comment, but I hope it will be of some help.

Division examples.docx

amattioc commented 6 months ago

I opened a new discussion (#409) to track this. In the meantime, for v2.1, I fixed the examples according to the current behaviour. This error seems to be related to most arithmetic operators that modify the type of the result with respect to the input (#406, #407).