I have struggled to figure out how to display intra-dataset edges and inter-dataset edges simultaneously (and similarly for nodes, as they contain different information for deduping vs. linking).
Even with a workaround to show and filter all edges within a cluster, the waterfall charts for those edges will depend on N+1 splink models for N datasets (N dedupes + 1 link). Supporting this would require a new feature in this repo to accommodate multiple models in a cluster.
Agree this would be a very useful feature. Unfortunately, it would require quite a significant rewrite, because it greatly increases the variety of data that we would need to keep track of.
The logic on an edge-by-edge basis would be:
models = {"dataset1": model1, "dataset2": model2, "link": model3}
source_dataset_l == source_dataset_r
-> use models[source_dataset_l]
source_dataset_l != source_dataset_r
-> use models["link"]
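
As a rough illustration of that selection rule, here is a minimal Python sketch. It assumes each edge is a dict-like record with `source_dataset_l` and `source_dataset_r` fields and that the models are keyed by dataset name plus a `"link"` entry. The `select_model` helper and the placeholder strings standing in for models are hypothetical, not part of any existing API in this repo.

```python
from typing import Any, Mapping

# Hypothetical key under which the cross-dataset linking model is stored.
LINK_KEY = "link"


def select_model(edge: Mapping[str, Any], models: Mapping[str, Any]) -> Any:
    """Pick the model that scored this edge.

    Edges within one dataset use that dataset's dedupe model;
    edges that span two datasets use the shared link model.
    """
    left = edge["source_dataset_l"]
    right = edge["source_dataset_r"]
    if left == right:
        return models[left]
    return models[LINK_KEY]


# Placeholder strings stand in for real model objects (assumption).
models = {"dataset1": "model1", "dataset2": "model2", "link": "model3"}
edges = [
    {"source_dataset_l": "dataset1", "source_dataset_r": "dataset1"},
    {"source_dataset_l": "dataset1", "source_dataset_r": "dataset2"},
]
chosen = [select_model(edge, models) for edge in edges]
```

With N datasets the mapping would hold N dedupe entries plus the single link entry, so the waterfall chart for any edge could look up its settings with one call.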