Denormalising many-to-many relationships

Issue Description

Sometimes it is necessary to denormalise many-to-many relationships. This is especially relevant when microdata are aggregated into multidimensional data. At the moment this is not possible (or far too difficult) to achieve with VTL.

For instance, consider AnaCredit, which holds information on loans (instruments) and legal persons (entities) that are in a many-to-many relationship. There are therefore three datasets, as follows:

Instrument:

instrumentId    nominalAmount
1               200
2               300
3               100

Entities:

entityId    sector
1           S11
2           S2
3           S121

InstrumentEntities:

instrumentId    entityId    role
1               3           cred
1               1           deb
2               3           cred
2               2           cred
2               1           deb
3               3           cred
3               1           deb     50
3               2           deb     50

If we want to calculate an aggregate by sector, for instance, we would need to join the data to get a table like the following:

instrumentId    creditorId    debtorId    nominalAmountInstrument    sectorCreditor    sectorDebtor
1               3             1           200                        S121              S11
2               3             1           300                        S121              S11
2               2             1           300                        S2                S11
3               3             1           100                        S121              S11             50
3               3             2           100                        S121              S2              50

And, to get there, we need the following steps:

First, create two datasets, CRED and DEB, by subspacing the relationships table on its role component and renaming the entity identifier. CRED has instrumentId and creditorId as identifiers, while DEB has instrumentId and debtorId.
Finally, join the CRED and DEB datasets. That kind of join is currently not accepted, however, because the identifiers of neither dataset are a subset of the identifiers of the other.
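
To make the intended result concrete, here is a minimal sketch of these steps in plain Python rather than VTL. The data is copied from the tables above; the function names (split_by_role, denormalise) are hypothetical and chosen only for illustration, and the unlabelled fourth value on the last two relationship rows is left out.

```python
# Data copied from the example tables above.
instrument = {1: 200, 2: 300, 3: 100}        # instrumentId -> nominalAmount
entities = {1: "S11", 2: "S2", 3: "S121"}    # entityId -> sector
instrument_entities = [                      # (instrumentId, entityId, role)
    (1, 3, "cred"), (1, 1, "deb"),
    (2, 3, "cred"), (2, 2, "cred"), (2, 1, "deb"),
    (3, 3, "cred"), (3, 1, "deb"), (3, 2, "deb"),
]


def split_by_role(rows, role):
    """Step 1: keep one role and drop the role column (the subspace step)."""
    return [(inst, ent) for inst, ent, r in rows if r == role]


def denormalise(rows, instrument, entities):
    """Step 2: join CRED and DEB on instrumentId only, then attach
    the nominal amount and the sector of each counterparty."""
    cred = split_by_role(rows, "cred")   # (instrumentId, creditorId)
    deb = split_by_role(rows, "deb")     # (instrumentId, debtorId)
    out = []
    for inst_c, creditor in cred:
        for inst_d, debtor in deb:
            if inst_c == inst_d:
                out.append((inst_c, creditor, debtor,
                            instrument[inst_c],
                            entities[creditor], entities[debtor]))
    return sorted(out)


result = denormalise(instrument_entities, instrument, entities)
for row in result:
    print(row)
```

The join condition uses only instrumentId, the single identifier that CRED and DEB share, which is exactly the case the current join rules reject.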

Proposed Solution

We assume that joins are not allowed when the identifiers of one dataset are not a subset of the identifiers of the other because the result could contain null values for identifiers (in the example above, this would happen if a loan had no debtor). It should be noted that this type of join would never produce duplicate combinations of identifiers.
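
Both points can be illustrated with a small Python sketch (again not VTL). It assumes a hypothetical instrument 4 that has a creditor but no debtor, which is not part of the example data, and it assumes an outer-style join that keeps instruments present on one side only; the function name is likewise made up for illustration.

```python
# Hypothetical data: instrument 4 has a creditor but no debtor.
cred = [(1, 3), (2, 3), (2, 2), (4, 3)]   # (instrumentId, creditorId)
deb = [(1, 1), (2, 1)]                    # (instrumentId, debtorId)


def outer_join_on_instrument(cred, deb):
    """Join on instrumentId, keeping instruments found on one side only.
    A missing counterparty is represented by None (a null identifier)."""
    rows = []
    instruments = sorted({i for i, _ in cred} | {i for i, _ in deb})
    for inst in instruments:
        creditors = [c for i, c in cred if i == inst] or [None]
        debtors = [d for i, d in deb if i == inst] or [None]
        rows.extend((inst, c, d) for c in creditors for d in debtors)
    return rows


joined = outer_join_on_instrument(cred, deb)

# A null identifier appears for the instrument without a debtor ...
has_null_identifier = any(None in row for row in joined)
# ... but no (instrumentId, creditorId, debtorId) combination repeats,
# because each input is already unique on its own identifiers.
has_duplicates = len(joined) != len(set(joined))
```

With this data, the row for instrument 4 carries a null debtorId, while every combination of the three identifiers stays unique.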

A possible solution would then be to add a new option for joins that allows a join when the identifiers of one dataset are not a subset of the identifiers of the other, but makes it mandatory to apply an nvl clause in the using clause to the non-common identifiers.

The example above would be written:

CRED := InstrumentEntities
   [sub role = "cred"]
   [rename entityId to creditorId];
DEB := InstrumentEntities
   [sub role = "deb"]
   [rename entityId to debtorId];
result <-
   inner_join(
      CRED,
      DEB
      using instrumentId, nvl(creditorId, "N/A"), nvl(debtorId, "N/A")
   );

sdmx-twg / vtl

Denormalising many-to-many relationships #392
