moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
https://moj-analytical-services.github.io/splink/
MIT License

[FEAT] Add number of comparisons to match weights chart #1529

Open RossKen opened 1 year ago

RossKen commented 1 year ago

Is your proposal related to a problem?

Linked to #1442 and #1434

It would be helpful for the match weights chart to show, as a quick reference, the number of pairwise comparisons that fall into each comparison level. This would be particularly useful when diagnosing model training issues caused by small samples.

Describe the solution you'd like

Add the count and percentage of pairwise comparisons considered by each comparison level.
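
To make the request concrete, here is a minimal sketch of the numbers the chart would need. It assumes the scored pairs are available as rows with one `gamma_*` column per comparison, where the value is the comparison level index. The helper `level_counts` and the sample rows are hypothetical and not part of Splink's API.

```python
from collections import Counter


def level_counts(pairs, gamma_col):
    """Return {level: (count, percentage)} for one comparison's gamma column.

    `pairs` is an iterable of dict-like rows; `gamma_col` names the column
    holding the comparison level index for each pairwise comparison.
    """
    counts = Counter(row[gamma_col] for row in pairs)
    total = sum(counts.values())
    # An empty input yields an empty dict, so no division by zero occurs.
    return {
        level: (n, 100.0 * n / total)
        for level, n in sorted(counts.items())
    }


# Hypothetical scored pairs, for illustration only.
pairs = [
    {"gamma_first_name": 2, "gamma_dob": 1},
    {"gamma_first_name": 0, "gamma_dob": 1},
    {"gamma_first_name": 0, "gamma_dob": 0},
    {"gamma_first_name": 0, "gamma_dob": 0},
]

first_name_summary = level_counts(pairs, "gamma_first_name")
for level, (n, pct) in first_name_summary.items():
    print(f"level {level}: {n} comparisons ({pct:.1f}%)")
```

The count and percentage per level could then be added to the chart tooltip or shown as an extra panel.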

Describe alternatives you've considered

Additional context

samnlindsay commented 1 year ago

Potentially relevant image from this Slack discussion (including other recommendations for this PR)

In addition to the m, u and match weight charts, there used to be a "proportion of comparisons" chart showing how all comparisons were distributed among the levels. This shows where levels rarely or never appear, even if their model parameters appear sensible, and can explain why model parameters aren't sensible (#1434).

image
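
As a rough illustration of the "rarely or never appear" check, the sketch below flags levels whose comparison count falls below a threshold. It assumes the list of defined levels is known separately from the observed data, so that levels with zero comparisons are still reported. The function `sparse_levels` and the sample values are hypothetical and not part of Splink.

```python
from collections import Counter


def sparse_levels(gamma_values, defined_levels, min_count=1):
    """Return the defined levels observed fewer than `min_count` times.

    Levels that never occur in `gamma_values` count as zero, so they are
    flagged whenever min_count >= 1.
    """
    counts = Counter(gamma_values)
    return [lvl for lvl in defined_levels if counts[lvl] < min_count]


# Hypothetical gamma values for one comparison with four defined levels.
gammas = [0, 0, 0, 2, 0, 0, 2, 0]

never_seen = sparse_levels(gammas, defined_levels=[0, 1, 2, 3])
rare = sparse_levels(gammas, defined_levels=[0, 1, 2, 3], min_count=3)
print("never seen:", never_seen)
print("fewer than 3 comparisons:", rare)
```

A suitable `min_count` depends on the dataset size, so the threshold here is only a placeholder.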