moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
https://moj-analytical-services.github.io/splink/
MIT License
1.28k stars 146 forks source link

[FEAT] Merge m_u_parameters_chart() and match_weights_chart() #1442

Open NickCrews opened 1 year ago

NickCrews commented 1 year ago

I always find myself looking at both of them, and it would be nice if they were horizontally concatenated so each comparison and level lined up. This would make there be less boilerplate code needed for users, and a cleaner way to look at them.

Also, maybe a separate issue, but it would be great if the mouseover for each of these charts included the "number of comparisons" that falls into each level, not just the percentage (see https://github.com/moj-analytical-services/splink/discussions/1434#discussioncomment-6452421)

image image
RossKen commented 1 year ago

Hey @NickCrews - thanks for the suggestion. I am currently writing up some documentation for each of the charts and had a similar idea. Do you feel it is important to be able to see both charts at the same time? My initial thought was to have a wrapper function around match weights chart, m u chart and the parameter estimates chart with an option to toggle between the 3 of them. Do you think that would be sufficient from your perspective?

image

On your tooltips point, I agree that this would be helpful - I will open up a separate issue for it.

NickCrews commented 1 year ago

It looks like your screenshots are from the EM training session charts. I think those charts are adequate as they are, I just need to see how M and U are changing, but I don't care about match weights. I look at those charts to double check that training is happening correctly, but not as much for assessing the quality of the fit.

I am talking about the .m_u_parameters_chart(). Does that distinction make sense? Just want to make sure we're talking about the same thing.

I think you could make it a param to the functions to display only some of the charts, but I want the default to be to show everything, and I don't want to have to do any UI adjustments with my mouse.

RossKen commented 1 year ago

Hi, sorry yes that was intentional as I was thinking about having the 3 charts in one and toggling between them. However, after discussion internally and given your preference for non-click solution, I have gone off that idea now.

I think the conclusion we came to internally was to have an optional parameter to include m u parameters chart with match weights chart, as opposed to default behaviour. From my experience, most users only refer to the m u chart when the match weights are looking a bit unusual and they want more information, which makes an optional parameter feel like a good fit, but open to other arguments!

NickCrews commented 1 year ago

That sounds good. I would lean towards the default being include_mu=True, since that is what I personally will always be doing, and I don't feel the charts are particularly visually crowded, but I defer to you all with your more community feedback for that decision.

Once you have something working, maybe post some screenshots so we can get a better sense of how crowded the chart feels. Thanks!