sdmx-twg / vtl

This repository is used for maintaining the SDMX-VTL specification
11 stars 7 forks source link

Group by and partition by applying to measures #385

Open antonio-olleros opened 8 months ago

antonio-olleros commented 8 months ago

Issue Description

The VTL Manual states that the components in group by clause need to be identifiers. I guess that this is done because, by definition, those components are becoming identifiers after the aggregate, and it is necessary to make sure that there are no null values. But this leads to writing extra VTL difficult to justify to business users. A real example we did:

aggregated_multimeasure <-
    market_companies
        [calc 
            identifier EconomicSector := EconomicSector ,
            identifier Country := Country ]
        [aggr 
            avg_price := avg(LastRate),
            total_turnover :=  sum(Turnover),
            total_capitalisation  := sum(SecurityMarketCap),
            min_transactions := min(Transactions)
            group by TimePeriod, EconomicSector, Country];

If you see, the calc identifier is required, but does not add value and can be missleading to users.

Proposed Solution

It should be allowed to use measures in the group by clause, but the measures need to be non-nullable

linardian commented 7 months ago

This issue goes back to the "null" treatment in VTL. As said I would keep actual behaviour which gives robustness and unambiguity to the language. I think that combining calc+nvl would probably be a reasonable solution.

antonio-olleros commented 7 months ago

I agree that we need to ensure the integrity. But now that we have the null attribute, which can be propagated (we are already doing it, and I think we should add the specification on how it propagates), we can do a semantic analysis and know whether the measure we want to use is nullable or not. If it is, we should raise semantic (compile-time) error, otherwise it would be allowed