shamahutoto opened this issue 3 years ago
Disclaimer: I am a regular fastLink user, not a developer.
Please give an example to make the issue easier to understand.
For example, this copy-pasted code will subset to pairs with a matching probability (threshold.match) of 0.85 and above:
matched_dfs <- getMatches(
dfA = dfA, dfB = dfB,
fl.out = matches.out, threshold.match = 0.85
)
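If the goal is the opposite, i.e. the pairs that fell just under the cutoff, one option is to keep a probability band instead of a lower bound. A minimal sketch, assuming the combined data frame carries a `posterior` column with each pair's match probability; the toy `matched_dfs` below is a made-up stand-in for real getMatches() output, and `near_misses()` is a hypothetical helper, not part of fastLink.

```r
# Hypothetical stand-in for getMatches() output: one row per matched pair,
# with an assumed `posterior` column holding the match probability.
matched_dfs <- data.frame(
  firstname = c("ann", "bob", "cara", "dan"),
  lastname  = c("lee", "ray", "diaz", "wu"),
  posterior = c(0.97, 0.84, 0.62, 0.40),
  stringsAsFactors = FALSE
)

# Keep pairs whose probability is at least `lower` but below `upper`,
# i.e. the near misses that the default 0.85 cutoff would drop.
near_misses <- function(df, lower = 0.5, upper = 0.85) {
  keep <- df$posterior >= lower & df$posterior < upper
  df[keep, , drop = FALSE]
}

nm <- near_misses(matched_dfs)
print(nm)
```

This only works if the pairs below 0.85 were kept in the first place, so the matching step itself has to be run with a lower threshold.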
I guess that you need to subset with blocking, which is doable but more complicated. The developers are working on improving the blocking functionality.
Hi @shamahutoto,
As @aalexandersson mentions, one idea here would be to lower the matching threshold. By default, fastLink only returns pairs of records with a matching probability larger than 0.85. You can lower that value to, e.g., 0.001 and recover all pairs with a matching probability above it, which will be a larger group than the one produced by the default value. However, I would not recommend going too low: you will get pairs of records whose matching probability is essentially 0, and if the datasets you are matching are large, the fastLink objects will be incredibly large.
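A minimal sketch of that workflow, assuming the fastLink() result stores matched row indices in `matches$inds.a` / `matches$inds.b` and one probability per pair in `posterior` (worth checking against your fastLink version). The toy `dfA`, `dfB`, `fl_out` and the helper `pairs_in_band()` are hypothetical; in practice `fl_out` would come from running fastLink() with a lowered `threshold.match`.

```r
# Toy datasets (made up for illustration).
dfA <- data.frame(name = c("ann lee", "bob ray", "cara diaz"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(name = c("anne lee", "rob ray", "kara dias", "dan wu"),
                  stringsAsFactors = FALSE)

# Hypothetical stand-in for a fastLink() result run with a low
# threshold.match: matched row indices plus one probability per pair.
fl_out <- list(
  matches   = data.frame(inds.a = c(1, 2, 3, 3), inds.b = c(1, 2, 3, 4)),
  posterior = c(0.99, 0.81, 0.55, 0.002)
)

# Return pairs with lower <= posterior < upper, joined back to the
# original records and sorted so the closest near misses come first.
pairs_in_band <- function(fl, dfA, dfB, lower, upper = 0.85) {
  keep <- fl$posterior >= lower & fl$posterior < upper
  idx  <- fl$matches[keep, , drop = FALSE]
  out  <- data.frame(
    name.a    = dfA$name[idx$inds.a],
    name.b    = dfB$name[idx$inds.b],
    posterior = fl$posterior[keep],
    stringsAsFactors = FALSE
  )
  out[order(-out$posterior), , drop = FALSE]
}

# A moderate lower bound keeps the output small...
pairs_in_band(fl_out, dfA, dfB, lower = 0.5)
# ...while a very low one also returns pairs that are almost surely non-matches.
pairs_in_band(fl_out, dfA, dfB, lower = 0.001)
```

Comparing the two calls shows the trade-off described above: the lower the bound, the more pairs come back, including ones with a probability close to 0.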
If you have any other questions, let us know.
All my best,
Ted
Hi there,
I want to find items that aren't matched but were just under the threshold for matching with a group. Is there a way to do this?