nhsbsa-data-analytics / personMatchR

Helper package for matching individuals across two datasets
Apache License 2.0

Include warning and restrictions on cross joins #3

Closed steven-buckley closed 2 years ago

steven-buckley commented 2 years ago

The Cartesian/cross join should be a last resort, as it can produce huge datasets.

One option would be to perform it only if the user explicitly requests it via a parameter. In addition, some logic should warn the user, and allow them to abort, if the join would produce an excessive number of records (testing suggested that 100m+ records caused memory issues).

The Cartesian/cross join would be applied only to records not handled by the exact matching process.
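
A rough sketch of the guard described above, written in Python purely for illustration (it is not the package's code). The names `should_cross_join`, `allow_cross_join`, `confirm` and `MAX_CROSS_JOIN_ROWS` are hypothetical, and the limit simply reuses the 100m figure from testing:

```python
import warnings

# Hypothetical limit, taken from the "100m+" figure mentioned in this issue.
MAX_CROSS_JOIN_ROWS = 100_000_000


def should_cross_join(n_left, n_right, allow_cross_join=False,
                      confirm=None, max_rows=MAX_CROSS_JOIN_ROWS):
    """Decide whether a cross join of n_left x n_right rows should run.

    allow_cross_join: the explicit opt-in parameter (off by default).
    confirm: optional callable taking the estimated row count and returning
             True to continue or False to abort (for example, a user prompt).
    """
    if not allow_cross_join:
        # The user did not ask for a cross join, so never attempt one.
        return False

    # A cross join pairs every left row with every right row.
    estimated_rows = n_left * n_right
    if estimated_rows < max_rows:
        return True

    warnings.warn(
        f"Cross join would produce {estimated_rows:,} rows "
        f"(limit {max_rows:,}); this may exhaust memory."
    )
    # Without an explicit confirmation, abort rather than risk the join.
    return bool(confirm and confirm(estimated_rows))
```

The default of `allow_cross_join=False` makes the expensive path opt-in, and the `confirm` callback is one possible way to offer the abort step without tying the check to a particular prompt mechanism.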

steven-buckley commented 2 years ago

The cross join is only performed if the remaining records to match would produce fewer than 100m records via a cross join.
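
For readers following along, the behaviour can be sketched roughly as below. This is an illustrative Python outline and not the package's implementation; `split_exact_and_candidates` and its `key` argument are hypothetical, and records are assumed to be plain tuples compared on a single key:

```python
from collections import defaultdict
from itertools import product


def split_exact_and_candidates(left, right, key, max_rows=100_000_000):
    """Exact-match first, then cross join only the leftovers if small enough.

    Returns (exact_pairs, candidate_pairs). candidate_pairs is empty when the
    cross join of the unmatched records would reach max_rows or more.
    """
    # Index the right-hand records by key so exact matching avoids a cross join.
    right_by_key = defaultdict(list)
    for record in right:
        right_by_key[key(record)].append(record)

    exact_pairs, left_rest, matched_keys = [], [], set()
    for record in left:
        k = key(record)
        if k in right_by_key:
            exact_pairs.extend((record, r) for r in right_by_key[k])
            matched_keys.add(k)
        else:
            left_rest.append(record)

    # Only records not handled by exact matching remain on either side.
    right_rest = [r for r in right if key(r) not in matched_keys]

    # Size check before the join: rows produced = len(left_rest) * len(right_rest).
    if len(left_rest) * len(right_rest) < max_rows:
        candidate_pairs = list(product(left_rest, right_rest))
    else:
        candidate_pairs = []
    return exact_pairs, candidate_pairs
```

Checking the product of the two remaining row counts before building any pairs means the size test itself costs almost nothing, whatever the outcome.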