moj-analytical-services / splink_cluster_studio

Create interactive dashboards to visualise and analyse the outputs of data linking

Add features to facilitate linked clusters based on multiple models (i.e. dedupe models + link only model) #3

Open samnlindsay opened 2 years ago

samnlindsay commented 2 years ago

I have struggled to figure out how to display intra-dataset edges and inter-dataset edges simultaneously (and similarly for nodes, as they contain different information for deduping vs linking).


Even with a workaround to show and filter all edges within a cluster, the waterfall charts for those edges will depend on N+1 splink models for N datasets (N dedupes + 1 link). Supporting this would require a new feature in this repo to accommodate multiple models in a cluster.

The logic on an edge by edge basis would be:
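
One possible reading of that per-edge logic, as a minimal sketch only: an edge whose two records come from the same dataset would use that dataset's dedupe model, and any other edge would use the link model. The function name `pick_model_for_edge`, the dataset labels, and the placeholder model values are all hypothetical; none of them are part of splink or this repo.

```python
from typing import Dict


def pick_model_for_edge(
    source_dataset_l: str,
    source_dataset_r: str,
    dedupe_models: Dict[str, object],
    link_model: object,
) -> object:
    """Return the model assumed to have scored this edge.

    Assumption: each record carries a source dataset label, there is one
    dedupe model per dataset, and a single link model covers every
    cross-dataset edge (N dedupes + 1 link).
    """
    if source_dataset_l == source_dataset_r:
        # Intra-dataset edge: use that dataset's dedupe model.
        return dedupe_models[source_dataset_l]
    # Inter-dataset edge: use the shared link model.
    return link_model


# Placeholder strings stand in for real model objects or settings.
dedupe_models = {"df_a": "dedupe_a", "df_b": "dedupe_b"}
link_model = "link_a_b"

edges = [("df_a", "df_a"), ("df_a", "df_b"), ("df_b", "df_b")]
chosen = [
    pick_model_for_edge(left, right, dedupe_models, link_model)
    for left, right in edges
]
```

Under these assumptions, the waterfall chart for each edge would be built from whichever model the lookup returns.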

RobinL commented 2 years ago

Agree this would be a very useful feature. Unfortunately, it would require quite a significant re-write, because it greatly increases the variety of data that we would need to keep track of.