ipums / hlink

Hierarchical record linkage at scale
Mozilla Public License 2.0

Add a new `multi_jaro_winkler_search` comparison feature #99

Closed riley-harper closed 11 months ago

riley-harper commented 11 months ago

This is a fairly specialized comparison feature. It significantly reduces the complexity of comparing multiple columns against one another with Jaro-Winkler while also checking for equality between other columns. For a more complete description of the comparison feature and how it works, please see the documentation added to the Sphinx docs in `comparison_types.md` in this PR.

One thing to note about this comparison feature is that it supports templating: it replaces the string `{n}` in the provided templates with integers to construct column names. I believe this is the first time we've added templating to a comparison feature, but it makes this one significantly more convenient and flexible than it would be without it.
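To illustrate the templating idea, here is a minimal sketch of how a `{n}` placeholder could be expanded into a list of column names. This is not hlink's actual implementation: the helper name `expand_template`, the `start` parameter, the choice to begin numbering at 1, and the column name `namefrst_related_{n}` are all assumptions made for the example.

```python
from __future__ import annotations


def expand_template(template: str, count: int, start: int = 1) -> list[str]:
    """Return `count` column names built from `template`.

    Every literal "{n}" in the template is replaced with consecutive
    integers beginning at `start`. Starting at 1 is an assumption made
    for this sketch, not something the PR description states.
    """
    return [template.replace("{n}", str(n)) for n in range(start, start + count)]


# Hypothetical column template; real column names depend on the dataset.
columns = expand_template("namefrst_related_{n}", 3)
print(columns)
# ['namefrst_related_1', 'namefrst_related_2', 'namefrst_related_3']
```

Using plain `str.replace` rather than `str.format` means any other braces in a template are left untouched, which keeps the substitution limited to the `{n}` token.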