sdmx-twg / vtl

This repository is used for maintaining the SDMX-VTL specification
11 stars 7 forks source link

Denormalising many-to-many relationships #392

Open antonio-olleros opened 8 months ago

antonio-olleros commented 8 months ago

Issue Description

Some times it is necessary to denormalise many to many relationships. This is especially relevant for cases of micro data that are aggregated into multidimensional data. At the moment it is not possible (or way too difficult) to achieve this with VTL.

For instance, consider AnaCredit, that has information on Loans and Legal persons (entities), having a many to many relationship. Therefore, there are three datasets as follows:

Instrument: instrumentId nominalAmount
1 200
2 300
3 100
Entities entityId sector
1 S11
2 S2
3 S121
InstrumentEntities instrumentId entityId role liabilityAmount
1 3 cred
1 1 deb
2 3 cred
2 2 cred
2 1 deb
3 3 cred
3 1 deb 50
3 2 deb 50
If we want to calculate an aggregate by sectors, for instance, we would need to join the data to get a table like the following: instrumentId creditorId debtorId nominalAmountInstrument sectorCreditor sectorDebtor liabilityAmount
1 3 1 200 S121 S11
2 3 1 300 S121 S11
2 2 1 300 S2 S11
3 3 1 100 S121 S11 50
3 3 2 100 S121 S2 50

And, to get there, we need the following steps:

Proposed Solution

We assume that the reason for not allowing a join when the identifiers of one dataset are not a subset of the identifiers of the other is because this could imply null values for identifiers (in the example above, because a loan does not have a debtor). It should be noted that with this type of join we would never get duplicate combinations of identifiers.

Then, a possible solution would be the possibility to add a new option for joins that allows joins when the identifiers of a dataset are not a subset of a dataset of the other, but makes it mandatory to use an nvl clause in the using for the non common identifiers.

The example above would be written:

CRED := EntityInstrument
   [sub role = 'cred']
   [rename entityId to creditorId];
DEB := EntityInstrument
   [sub role = 'deb']
   [rename entityId to debtorId];
result <-
   inner_join(
      CRED,
      DEB
     using InstrumentId, nvl(creditorId, "N/A"), nvl(debtorId, "N/A")
   );