kosukeimai / fastLink

R package fastLink: Fast Probabilistic Record Linkage
Question - Recommendation for geocoding #32

Open Weekend-Warrior opened 6 years ago

Weekend-Warrior commented 6 years ago

Hi!

How would you suggest linking geocoded records? I've considered numeric comparisons on latitude and longitude, or perhaps reweighting the posterior probability based on the geodesic distance between records. Just spitballing. Thanks!

Stewart

tedenamorado commented 6 years ago

Hi Stewart,

Apologies for the late reply. Using the numeric comparison for geo-distances seems like a very reasonable approach. Another option is to add a blocking stage in which you only compare observations that are close to each other. Creating the blocks might be tricky, as they will most likely overlap.
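For readers who want to see what such a proximity-based blocking stage could look like, here is a minimal sketch in Python. It is not fastLink code: the function names (`grid_cell`, `candidate_pairs`), the 0.1-degree cell size, and the coordinates are all assumptions made for illustration. Each record is assigned to a grid cell, and a record is compared with everything in its own cell and the eight neighbouring cells, which is one simple way to get overlapping blocks.

```python
import math
from collections import defaultdict
from itertools import product


def grid_cell(lat, lon, cell_deg):
    """Map a coordinate to an integer (row, col) cell of size cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def candidate_pairs(records_a, records_b, cell_deg=0.1):
    """Return sorted index pairs (i, j) whose records share a cell or sit in adjacent cells.

    Searching the 3x3 neighbourhood makes the blocks overlap, so two nearby
    records separated by a cell boundary are still compared.
    """
    # Index dataset B by grid cell once, so lookups are cheap.
    cells_b = defaultdict(list)
    for j, (lat, lon) in enumerate(records_b):
        cells_b[grid_cell(lat, lon, cell_deg)].append(j)

    pairs = set()
    for i, (lat, lon) in enumerate(records_a):
        row, col = grid_cell(lat, lon, cell_deg)
        for d_row, d_col in product((-1, 0, 1), repeat=2):
            for j in cells_b.get((row + d_row, col + d_col), []):
                pairs.add((i, j))
    return sorted(pairs)


# Illustrative coordinates only (hypothetical, not real data).
records_a = [(40.01, -75.01), (41.50, -73.00)]
records_b = [(40.09, -74.99), (40.02, -75.02), (45.00, -80.00)]
pairs = candidate_pairs(records_a, records_b)
```

The sketch ignores longitude wrap-around at ±180 degrees and the fact that a fixed cell size in degrees covers less ground at high latitudes, so the cell size would need tuning for any real data set.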

New functions to incorporate blocking (including cases with overlapping blocks) will be pushed really soon. You will be notified when that happens.

If anything comes up, please let us know.

All my best,

Ted

Weekend-Warrior commented 6 years ago

I'm very eager to see the blocking structures. Thanks for your support!

Weekend-Warrior commented 6 years ago

Hey guys,

Thanks for merging blockData this week! Window blocking seems to make sense for lat/lon blocked comparisons. Would you be willing to suggest a modified version of the step-by-step workflow for this scenario? In particular, I'd like to know how best to consolidate the EM tables and reconcile the best matches among the windows. Whatever you can offer will be very helpful, and I thank you in advance!
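To make the reconciliation question concrete, one possible heuristic is sketched below in Python. It is an assumption for illustration, not a workflow recommended by the package authors: it supposes each window has already produced a list of `(id_a, id_b, posterior)` tuples, and it keeps the highest-posterior match for each record in the first data set. The function name `reconcile_matches` and the sample values are hypothetical.

```python
def reconcile_matches(window_results):
    """Merge per-window match lists, keeping the best match for each record in A.

    window_results: iterable of lists of (id_a, id_b, posterior) tuples,
    one list per window. Because windows overlap, the same record can be
    matched in several windows, possibly with different posteriors.
    """
    best = {}
    for matches in window_results:
        for id_a, id_b, posterior in matches:
            current = best.get(id_a)
            # Keep the candidate with the highest posterior seen so far.
            if current is None or posterior > current[1]:
                best[id_a] = (id_b, posterior)
    return dict(sorted(best.items()))


# Hypothetical output from two overlapping windows.
window_1 = [("a1", "b7", 0.91), ("a2", "b3", 0.62)]
window_2 = [("a2", "b4", 0.88), ("a3", "b9", 0.97)]
merged = reconcile_matches([window_1, window_2])
```

Two caveats apply. Posteriors estimated by separate EM fits in different windows are not strictly comparable, so picking the maximum is only a rough rule. The sketch also does not enforce one-to-one matching on the second data set, which would need an extra deduplication pass.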

Best regards,

Stewart

tedenamorado commented 6 years ago

Hi Stewart,

We are working on a function that takes lat/lon information and calculates distances using both variables. I think that would be the best approach for working with such data. I will let you know as soon as we have pushed the function.
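Until such a function is available, a distance based on both coordinates can be computed by hand. The Python sketch below uses the haversine formula, which is one common choice and not necessarily what the planned function will use. The names `haversine_km` and `agreement_level`, and the 0.1 km and 1 km thresholds, are assumptions chosen purely for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def agreement_level(distance_km, agree_km=0.1, partial_km=1.0):
    """Discretise a distance into 2 (agree), 1 (partial agreement), or 0 (disagree).

    The thresholds are hypothetical and would need tuning to the
    geocoding precision of the data.
    """
    if distance_km <= agree_km:
        return 2
    if distance_km <= partial_km:
        return 1
    return 0


# One degree of longitude along the equator, in kilometres.
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
```

Turning the continuous distance into a small number of agreement levels mirrors how numeric fields are usually treated in probabilistic linkage, where each field contributes an agree, partially agree, or disagree pattern.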

All my best,

Ted