nhsbsa-data-analytics / personMatchR

Helper package for matching individuals across two datasets
Apache License 2.0

Incorporate composite keys to limit joins #1

Closed steven-buckley closed 2 years ago

steven-buckley commented 2 years ago

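Using full (cartesian) joins compares every record in one dataset with every record in the other, so the volume of matching could easily explode. To limit this, a range of composite keys can be created from various combinations of the matching fields, and records are then joined only where a composite key matches.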

A full composite may include the full name, date of birth and postcode (JOHNSMITH20001225NE158NY) and would only join to records in the other dataset with the same composite (i.e. a full match).

A composite based on initials and postcode area (JSNE) would join to everyone with a matching composite in the other dataset.
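As a rough illustration of the two composites described above, here is a minimal Python sketch (the package itself is written in R, and this is not its implementation). The field names `forename`, `surname`, `dob` and `postcode`, and the helpers `full_key`, `loose_key` and `join_on_key`, are assumptions made for the example.

```python
from collections import defaultdict


def full_key(rec):
    """Forename + surname + date of birth (YYYYMMDD) + postcode, uppercased, spaces removed."""
    parts = (rec["forename"], rec["surname"], rec["dob"], rec["postcode"])
    return "".join(p.replace(" ", "").upper() for p in parts)


def loose_key(rec):
    """Initials + postcode area (the leading letters of the postcode)."""
    postcode = rec["postcode"].replace(" ", "").upper()
    area = ""
    for ch in postcode:
        if not ch.isalpha():
            break
        area += ch
    return (rec["forename"][:1] + rec["surname"][:1]).upper() + area


def join_on_key(left, right, key_fn):
    """Return (left_index, right_index) pairs whose composite keys are equal."""
    index = defaultdict(list)
    for j, rec in enumerate(right):
        index[key_fn(rec)].append(j)
    return [(i, j) for i, rec in enumerate(left) for j in index.get(key_fn(rec), [])]


left = [{"forename": "John", "surname": "Smith", "dob": "20001225", "postcode": "NE15 8NY"}]
right = [
    {"forename": "John", "surname": "Smith", "dob": "20001225", "postcode": "NE15 8NY"},
    {"forename": "Jane", "surname": "Saunders", "dob": "19800101", "postcode": "NE1 4ST"},
    {"forename": "Alan", "surname": "Jones", "dob": "19650315", "postcode": "LS1 1AA"},
]

full_pairs = join_on_key(left, right, full_key)    # only the exact match
loose_pairs = join_on_key(left, right, loose_key)  # everyone sharing "JSNE"
```

With this toy data the full composite pairs John Smith only with his exact counterpart, while the loose composite also pairs him with Jane Saunders. In both cases the unrelated record is never compared, which is the saving over a cartesian join.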

The composites should complement the various potential confident match criteria. For example, it is possible to have a confident match whilst missing a postcode, so we need to make sure the postcode is not included in every composite key.
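One way to picture this is to build several composites per record, skip any composite whose fields are missing, and take the union of the resulting candidate pairs. The Python sketch below is illustrative only: the field names, the `KEY_FIELDS` combinations and the helpers `make_key` and `candidate_pairs` are hypothetical and are not the package's actual match criteria.

```python
def make_key(rec, fields):
    """Build a composite key, or return None if any component is missing."""
    values = [rec.get(f) for f in fields]
    if any(not v for v in values):
        return None
    return "".join(str(v).replace(" ", "").upper() for v in values)


# Hypothetical key definitions; note that one of them omits the postcode.
KEY_FIELDS = [
    ("forename", "surname", "dob", "postcode"),
    ("forename", "surname", "dob"),
    ("surname", "dob", "postcode"),
]


def candidate_pairs(left, right, key_fields=KEY_FIELDS):
    """Union of (left_index, right_index) pairs that agree on at least one key."""
    pairs = set()
    for fields in key_fields:
        index = {}
        for j, rec in enumerate(right):
            key = make_key(rec, fields)
            if key is not None:
                index.setdefault(key, []).append(j)
        for i, rec in enumerate(left):
            key = make_key(rec, fields)
            if key is not None:
                for j in index.get(key, []):
                    pairs.add((i, j))
    return pairs


left = [
    {"forename": "John", "surname": "Smith", "dob": "20001225", "postcode": None},
    {"forename": "Jon", "surname": "Smith", "dob": "20001225", "postcode": "NE15 8NY"},
]
right = [
    {"forename": "John", "surname": "Smith", "dob": "20001225", "postcode": "NE15 8NY"},
    {"forename": "Jon", "surname": "Smith", "dob": "20001225", "postcode": "NE15 8NY"},
    {"forename": "Mary", "surname": "Brown", "dob": "19700101", "postcode": "LS1 1AA"},
]

pairs = candidate_pairs(left, right)
full_only = candidate_pairs(left, right, key_fields=KEY_FIELDS[:1])
```

In this toy data the first left-hand record has no postcode, so it produces no candidates when only the full composite is used, but the name and date of birth composite still pairs it with its counterpart.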