skrub-data / skrub

Prepping tables for machine learning
https://skrub-data.org/
BSD 3-Clause "New" or "Revised" License

Enable setting the Joiner threshold in kilometers when joining on (latitude, longitude) columns #1045

Open jeromedockes opened 3 months ago

jeromedockes commented 3 months ago

Problem Description

When joining on coordinates, match quality is currently assessed with the Euclidean distance between pairs of (latitude, longitude) coordinates. It would be much easier to use, and give better results, if we could compute geodesic distances instead. The user could then say, for example, "match airports to the closest weather station, but only if they are less than 50 km apart".
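To illustrate the idea, here is a minimal sketch in plain Python. It is not skrub code: `haversine_km` and `nearest_within_km` are hypothetical helper names, the coordinates are made up, and the haversine formula on a sphere of radius 6371 km is only an approximation of a true geodesic distance. It shows why Euclidean distance on degrees is misleading (one degree of longitude covers about half as many kilometers at 60°N as at the equator) and how a threshold in kilometers could reject a match.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_within_km(point, candidates, max_km):
    """Return (name, distance_km) of the closest candidate, or None if it is
    farther away than max_km. Brute-force search, for illustration only."""
    name, (lat, lon) = min(
        candidates.items(), key=lambda kv: haversine_km(*point, *kv[1])
    )
    dist = haversine_km(*point, lat, lon)
    return (name, dist) if dist <= max_km else None


# One degree of longitude: Euclidean distance on degrees is 1.0 in both cases,
# but the distance on the ground differs by roughly a factor of two.
d_equator = haversine_km(0.0, 0.0, 0.0, 1.0)
d_60_north = haversine_km(60.0, 0.0, 60.0, 1.0)

# Hypothetical airport and weather stations (made-up coordinates).
airport = (48.0, 2.0)
stations = {"near": (48.1, 2.0), "far": (50.0, 2.0)}
match_50km = nearest_within_km(airport, stations, max_km=50)
match_5km = nearest_within_km(airport, stations, max_km=5)  # closest is too far
```

With these made-up points, the closest station is about 11 km away, so it is accepted with a 50 km threshold and rejected with a 5 km one. A real implementation would likely use a spatial index rather than a brute-force search.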

Feature Description

Not sure yet. Possible options: add a parameter to the Joiner, adapt the existing parameters, or add a new joiner.

Alternative Solutions

No response

Additional Context

No response

Shree7676 commented 1 month ago

I will start working on this issue.