This PR adds a new feature to the blocking section in the config file. Previously, all of the blocking tables were joined by ANDs to form the final blocking condition, for example:
a.BPL = b.BPL AND a.SEX = b.SEX
Now there is a new or_group attribute available on the blocking tables, which users can set to group some blocking tables together into OR groups. Blocking tables within the same OR group are joined by ORs, and the OR groups themselves are then joined by ANDs. This is especially helpful when multiple variables may contain the same information:
(a.BPL1 = b.BPL1 OR a.BPL2 = b.BPL2) AND (a.SEX = b.SEX)
By default, every blocking table gets put into its own OR group, so that the blocking condition is the same as it would have been before this PR. The matching.link_step_match.extract_or_groups_from_blocking() function has the logic for determining the OR groups from the input configuration. It returns a list[list[str]], where each sublist is an OR group. The potential_matches.sql template file has changed slightly to allow blocking_columns to be the new list of lists instead of a flat list.
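The grouping behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in matching.link_step_match: the function names `sketch_or_groups` and `sketch_blocking_condition` are hypothetical, and the `column_name` key on each blocking table is an assumption about the config layout. Only `or_group` comes from this PR.

```python
def sketch_or_groups(blocking):
    """Group blocking columns into OR groups (hypothetical sketch).

    `blocking` is a list of dicts. Each dict is assumed to have a
    "column_name" key and may have an "or_group" key.
    Returns a list[list[str]], one sublist per OR group.
    """
    groups = {}
    for index, table in enumerate(blocking):
        # A table without or_group gets a unique key, so it ends up
        # alone in its own group (the pre-PR behavior).
        key = table.get("or_group", ("__default__", index))
        groups.setdefault(key, []).append(table["column_name"])
    # Dicts preserve insertion order, so groups come out in config order.
    return list(groups.values())


def sketch_blocking_condition(or_groups):
    """Join columns within a group by OR, and the groups by AND."""
    clauses = []
    for group in or_groups:
        ors = " OR ".join(f"a.{col} = b.{col}" for col in group)
        clauses.append(f"({ors})")
    return " AND ".join(clauses)


blocking = [
    {"column_name": "BPL1", "or_group": "bpl"},
    {"column_name": "BPL2", "or_group": "bpl"},
    {"column_name": "SEX"},
]
or_groups = sketch_or_groups(blocking)
condition = sketch_blocking_condition(or_groups)
```

With this input, `or_groups` is `[["BPL1", "BPL2"], ["SEX"]]` and `condition` matches the second example above. When no table sets or_group, every group has a single column, so the result is logically the same AND-only condition as before.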
Thanks for the review, Colin. I was also a little surprised that it didn't allow for OR. I think since you can do ORs in comparisons after blocking, we haven't needed this till now.
Closes #137.