skrub-data / skrub

Prepping tables for machine learning
https://skrub-data.org/
BSD 3-Clause "New" or "Revised" License

Adding a "Discover" object for finding candidate tables that may be joined on the main table #1153

Open rcap107 opened 5 days ago

rcap107 commented 5 days ago

Hello, this will be a long one!

The objective of this PR is to adapt part of the pipeline from this paper: https://arxiv.org/abs/2402.06282 (repo: https://github.com/rcap107/retrieve-merge-predict).

Main points:

Some problems and open questions that should be considered later:

Notes on the current implementation:

The current version of the code is bare-bones, but it runs. My main difficulty is integrating the code properly.
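
To make the goal in the title concrete, here is a minimal sketch of one way to find candidate tables that may be joined on the main table. It is not the code in this PR and not skrub's API: the names `containment` and `find_candidates`, the `threshold` parameter, and the choice of value containment as the score are all assumptions made for illustration. Plain dicts of lists stand in for dataframes so the sketch has no dependencies.

```python
# Hypothetical sketch: rank columns of candidate tables by how many of the
# main table's key values they contain. Names and scoring are assumptions.


def containment(query_values, candidate_values):
    """Fraction of distinct non-null query values also found in the candidate."""
    query = {v for v in query_values if v is not None}
    if not query:
        return 0.0
    candidate = {v for v in candidate_values if v is not None}
    return len(query & candidate) / len(query)


def find_candidates(main, query_column, tables, threshold=0.5):
    """Return (table_name, column_name, score) tuples, best score first.

    `main` and each entry of `tables` map column names to lists of values.
    Only columns whose score reaches `threshold` are kept.
    """
    ranked = []
    for table_name, table in tables.items():
        for column_name, values in table.items():
            score = containment(main[query_column], values)
            if score >= threshold:
                ranked.append((table_name, column_name, score))
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# Toy data: the main table and two tables that might be joinable on "city".
main = {"city": ["Paris", "Rome", "Berlin", "Madrid"]}
tables = {
    "population": {"town": ["Paris", "Rome", "Berlin"], "pop": [2.1, 2.8, 3.6]},
    "weather": {"station": ["Oslo", "Rome"], "temp": [5.0, 18.0]},
}

candidates = find_candidates(main, "city", tables)
print(candidates)  # [('population', 'town', 0.75)]
```

In this toy run, `population.town` covers three of the four city names and is kept, while `weather.station` covers only one and falls below the threshold. A real implementation would work on dataframes and would likely need a cheaper approximate score for large collections of tables.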