Status: Open. chlebowa opened this issue 5 months ago.
This is a byproduct of the incomplete specification of the CDISC datasets file and the implicit relationship inference.
The implicit relationship inference currently works as follows:
This creates a problem when the two datasets share additional primary keys, resulting in "unnecessary" duplicate rows when merging.
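The base R sketch below shows how the two rules diverge. It assumes, as this issue describes, that the keys between two child datasets are inferred from their common parent (`ADSL`). The helper `infer_keys_via_parent()` and the key vectors are hypothetical illustrations, not the `teal.data` implementation.

```r
# Hypothetical primary keys, for illustration only.
parent_keys <- c("STUDYID", "USUBJID")                       # ADSL
adlb_keys   <- c("STUDYID", "USUBJID", "PARAMCD", "AVISIT")  # ADLB
adrs_keys   <- c("STUDYID", "USUBJID", "PARAMCD", "AVISIT")  # ADRS

# Hypothetical helper: siblings are related only through the parent,
# so their join keys are the parent keys that both children carry.
infer_keys_via_parent <- function(parent, child_a, child_b) {
  intersect(intersect(child_a, parent), intersect(child_b, parent))
}

# For comparison: every primary key the two children share.
infer_keys_shared <- function(child_a, child_b) {
  intersect(child_a, child_b)
}

via_parent <- infer_keys_via_parent(parent_keys, adlb_keys, adrs_keys)
shared     <- infer_keys_shared(adlb_keys, adrs_keys)

# Shared keys that the parent-based rule drops ("PARAMCD", "AVISIT").
# Merging without them pairs every ADLB row with every ADRS row
# of the same subject.
missing_keys <- setdiff(shared, via_parent)
print(missing_keys)
```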
We could evolve the mechanism to:
This fix/enhancement would change the behavior of joining keys, and we would need to communicate the change to our users. It is not a breaking change, though.
Expand the `cdisc_datasets.yaml` file to include joining keys for additional relationships between datasets, beyond the relationship with the parent.
I would advise against this, as it would greatly increase the size and complexity of that file.
What happened?
Inference of joining keys by parent seems insufficient.
Consider a case where two child datasets are joined. `ADLB` and `ADRS` have the same primary keys: `"STUDYID", "USUBJID", "PARAMCD", "AVISIT"`. Joining by default, i.e. using `intersect(names(x), names(y))`, correctly uses all primary keys as joining keys. Extracting a `join_key_set` from `default_cdisc_join_keys`, however, results in a cartesian product.
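A minimal base R reproduction with toy data. The values are made up, no `teal.data` functions are called, and restricting the join to `STUDYID`/`USUBJID` is an assumption about what the extracted `join_key_set` contains, consistent with the parent-based inference described in this issue.

```r
# Toy data (hypothetical values): one subject, two parameters, one visit.
# Value columns have different names so they are not used as join keys.
adlb <- data.frame(
  STUDYID = "S1", USUBJID = "S1-01",
  PARAMCD = c("P1", "P2"), AVISIT = "WEEK 1",
  AVAL = c(1.5, 2.5)
)
adrs <- data.frame(
  STUDYID = "S1", USUBJID = "S1-01",
  PARAMCD = c("P1", "P2"), AVISIT = "WEEK 1",
  AVALC = c("CR", "PR")
)

# Default base R join: by = intersect(names(x), names(y)),
# which here is all four primary key columns.
by_all <- merge(adlb, adrs)

# Join restricted to the parent-level keys only (assumed join_key_set).
by_parent <- merge(adlb, adrs, by = c("STUDYID", "USUBJID"))

nrow(by_all)     # one row per PARAMCD/AVISIT combination
nrow(by_parent)  # every ADLB row paired with every ADRS row
```

With n_a rows in `ADLB` and n_b rows in `ADRS` for the same subject, the parent-level join returns n_a * n_b rows for that subject, and the unmatched key columns come back duplicated with `.x`/`.y` suffixes.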