kosukeimai / fastLink

R package fastLink: Fast Probabilistic Record Linkage

Exact match on certain column #55

Open shamahutoto opened 3 years ago

shamahutoto commented 3 years ago

Hi, is there a way to make sure that one column is an exact match?

aalexandersson commented 3 years ago

Yes, block on it.

aalexandersson commented 3 years ago

@shamahutoto Since there are various types of blocking, I should have been more precise:

Exact blocking on a variable (column), for example gender, makes sure that the variable is an exact match.

It is useful to think of record linkage as a process in which blocking comes before the actual linkage. Typically you use the blockData() function for the blocking and then run fastLink() within each block. Please provide an example if you still need help. The main GitHub page for fastLink, https://github.com/kosukeimai/fastLink, gives an example.
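To make the two-step process concrete, here is a minimal sketch of exact blocking followed by linkage inside each block. The helper name link_within_blocks and the column names in the example call are hypothetical; the only fastLink calls used are blockData() and fastLink(), following the pattern shown in the package README.

```r
library(fastLink)

# Hypothetical helper: exact-block on one column, then link inside each block.
# dfA, dfB:  the two data frames to link
# block_var: name of the column that must match exactly
# link_vars: columns that fastLink() compares within each block
link_within_blocks <- function(dfA, dfB, block_var, link_vars, ...) {
  # With only `varnames` set, blockData() does exact blocking on that column
  blocks <- blockData(dfA, dfB, varnames = block_var)

  lapply(blocks, function(b) {
    # Each block holds the row indices of its records in dfA and dfB
    fastLink(
      dfA = dfA[b$dfA.inds, ],
      dfB = dfB[b$dfB.inds, ],
      varnames = link_vars,
      ...
    )
  })
}

# Example call (column names are assumptions about your data):
# out <- link_within_blocks(
#   dfA, dfB,
#   block_var = "gender",
#   link_vars = c("firstname", "lastname", "birthyear"),
#   stringdist.match = c("firstname", "lastname")
# )
```

Note that the match indices returned for each block refer to rows of the subsetted data frames, not the original ones, so they need to be mapped back through dfA.inds and dfB.inds. Very small blocks may not give the model enough records to estimate from, so it is worth checking block sizes first.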

Disclaimer: I am a regular user, not a developer.

tedenamorado commented 2 years ago

Hi @shamahutoto,

As @aalexandersson mentioned, you can block on that variable. Note also that every variable you pass to fastLink that is not listed in stringdist.match or numeric.match is compared using exact matching.
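As a rough illustration of that default, the sketch below uses the samplematch example data that ships with the package (data frames dfA and dfB) and closely follows the README call. Here housenum and birthyear are left out of stringdist.match and numeric.match, so they are compared exactly. The final check on birthyear is an addition for illustration, not part of the README.

```r
library(fastLink)

# Example data shipped with fastLink: loads the data frames dfA and dfB
data(samplematch)

fl_out <- fastLink(
  dfA = dfA, dfB = dfB,
  varnames = c("firstname", "middlename", "lastname",
               "housenum", "streetname", "city", "birthyear"),
  # String-distance comparison for the name and address fields
  stringdist.match = c("firstname", "middlename", "lastname",
                       "streetname", "city")
  # housenum and birthyear appear in neither stringdist.match nor
  # numeric.match, so each is compared exactly (agree / disagree)
)

# Row indices of the matched pairs in dfA and dfB
ia <- fl_out$matches$inds.a
ib <- fl_out$matches$inds.b

# Share of matched pairs whose birthyear is identical (missing values ignored)
mean(dfA$birthyear[ia] == dfB$birthyear[ib], na.rm = TRUE)
```

As far as I understand the model, an exact comparison only records agreement or disagreement on that field as one input to the match probability; it is not a hard filter the way blocking is. The last line is one way to check on your own data whether any matched pairs disagree on the exactly compared variable.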

Hope this helps! If anything, let us know.

All my best,

Ted

itsmevictor commented 1 year ago

Two years later, but just to be sure, @tedenamorado: does this mean that if I don't list individuals' birth dates in either stringdist.match or numeric.match, the algorithm will only try to match individuals (from the two data frames) who have the same date of birth?

In that sense, is it the same thing as doing an exact block on the date of birth and then running the algorithm on the result? Or did I miss something?