ucl-pond / pySuStaIn

Subtype and Stage Inference (SuStaIn) algorithm with an example using simulated data.
MIT License
Adding a colourbar to PVD plots #32

Open sea-shunned opened 2 years ago

sea-shunned commented 2 years ago

This is minor in the grand scheme of things, but it may matter when creating clear figures, e.g. for publication. I've toyed with a few ideas, so I thought I'd raise an issue before unilaterally merging one option. The aim is to give others something to use and save them some time.

The Problem

Adding a colourbar to the PVD for the mixture version is straightforward, as colour intensity equates to the certainty of that position. The z-score version, however, has two dimensions. While the intensity of the colour for a single z-score event equates to certainty (e.g. from pure white to pure red), the colours also mix when different z-score events overlap (and they mix in proportion to their certainty). For example, if a single stage (for a single biomarker) has 50% certainty for each of z=1 and z=2, this square will be a 50:50 mix of red and magenta (both of which are themselves at 50% intensity). A single colourbar cannot (to me, at least) capture this.
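To make the mixing concrete, here is a minimal sketch of one way such a cell colour could be computed: start from white and, for each z-score event, subtract its certainty from the RGB channels that the event's colour lacks. This is an illustration under stated assumptions, not a copy of pySuStaIn's plotting code. The function name `blend_cell` and the `Z_COLOURS` mapping are hypothetical, and blue for z=3 is assumed (only red and magenta are named above).

```python
# Hypothetical illustration of certainty-weighted colour mixing in one PVD cell.
# Assumed colour per z-score event: z=1 red, z=2 magenta, z=3 blue (RGB in 0..1).
Z_COLOURS = {1: (1.0, 0.0, 0.0), 2: (1.0, 0.0, 1.0), 3: (0.0, 0.0, 1.0)}


def blend_cell(certainty_by_z, colours=Z_COLOURS):
    """Return the RGB colour of one (biomarker, stage) cell.

    certainty_by_z maps a z-score event to its certainty (0..1) at this stage.
    The cell starts white; each event removes its certainty from every channel
    that its own colour does not contain.
    """
    rgb = [1.0, 1.0, 1.0]
    for z, p in certainty_by_z.items():
        for channel, value in enumerate(colours[z]):
            rgb[channel] -= p * (1.0 - value)
    # Clip in case the certainties sum to more than 1 in a channel.
    return tuple(min(1.0, max(0.0, c)) for c in rgb)


single = blend_cell({1: 0.5})          # red at 50% intensity
mixed = blend_cell({1: 0.5, 2: 0.5})   # halfway between red and magenta
```

Under these assumptions, `single` is a half-intensity red, `(1.0, 0.5, 0.5)`, while `mixed` is `(1.0, 0.0, 0.5)`, the midpoint of pure red and pure magenta. The cell colour therefore depends on two quantities at once, which is why one colourbar struggles to describe it.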

The question is: if a colourbar is to be useful, which information should it capture?

Proposals

Here are a few variants I made for this. Other suggestions are welcome.

Simple Colourbar

Gets the point across, but doesn't integrate intensity/certainty or z-score mixing.

Intensity Colourbar

Highlights the difference in intensity, but not z-score mixing.

Mixing Colourbar

Highlights z-score mixing, but not intensity.


The point of this is to add something so others don't need to do this themselves. If no-one feels strongly, I'll just pick one after a week or so to integrate.

noxtoby commented 2 years ago

Tough call to pick one because they're each wrong in different ways, I think.

Simple isn't visually pleasing but is arguably the least wrong.

Intensity is wrong because it should (I think) peak at the z-score, then fade out to white again. And it doesn't handle z-score overlap, as you've said.

Mixing is arguably the most visually pleasing (especially if you added a fade-in from white and maybe a fade-out to white) but will always be wrong in general, because the rate of blending will vary with the data, and will vary between biomarkers within a single model, potentially requiring a colour bar for each biomarker.

ayoung11 commented 2 years ago

What about something like the intensity colour bar but plotted separately for each z-score? So in this example three colour bars next to each other, labelled above with z=1, z=2 and z=3 and then each colour bar scale labelled as probability from 0 to 1.
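A rough sketch of that layout with matplotlib is below: one white-to-colour bar per z-score, titled with the z-score and scaled as probability from 0 to 1. The function name `plot_zscore_colourbars` and the colour choices (red, magenta, blue) are assumptions for illustration, and nothing here is tied to pySuStaIn's own plotting functions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import LinearSegmentedColormap, Normalize

# Assumed colour per z-score event (hypothetical defaults).
Z_COLOURS = {1: "red", 2: "magenta", 3: "blue"}


def plot_zscore_colourbars(z_colours=Z_COLOURS):
    """Draw one probability colourbar (white -> colour) per z-score."""
    n = len(z_colours)
    fig, axes = plt.subplots(1, n, figsize=(1.2 * n, 3), squeeze=False)
    norm = Normalize(vmin=0.0, vmax=1.0)
    for ax, (z, colour) in zip(axes[0], z_colours.items()):
        cmap = LinearSegmentedColormap.from_list(f"z{z}", ["white", colour])
        cbar = fig.colorbar(ScalarMappable(norm=norm, cmap=cmap), cax=ax)
        cbar.set_label("Probability")
        ax.set_title(f"z={z}")
    fig.subplots_adjust(wspace=3.0)  # leave room for tick labels between bars
    return fig


fig = plot_zscore_colourbars()
fig.savefig("zscore_colourbars.png", bbox_inches="tight")
```

Each bar only describes a single z-score event in isolation, so this conveys uncertainty but deliberately says nothing about how overlapping events blend.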

As an aside, I was thinking at some point it would be nice to add PVDs for plotting the average z-score progression pattern. They'd take MCMC samples of stage_value from _calculate_likelihood_stage and plot the average z-score at each stage over the MCMC samples.

williamscotton commented 2 years ago

Personally, I think mixing is the least wrong of the 3 options, but I agree with Neil on having 0 as white, blending up to z=1 (similar to the LaTeX script on BrainPainter).

ayoung11 commented 2 years ago

I'm not keen on the mixing. I find it much less clear, as it implies you're interpreting the value of the trajectory, when what is shown is the uncertainty in the position. I think representing the mixing is less important than representing the uncertainty: the mixing only happens because the PVD is a 2D representation of the MCMC samples, and the positions of the z-scores can't actually cross for an individual biomarker.
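As a small illustration of why the PVD encodes positional uncertainty, the sketch below tabulates, for a set of sampled orderings, how often each event lands at each position. The function name `positional_variance` and the toy samples are hypothetical, and this simplified counting is an assumption about the general idea rather than pySuStaIn's implementation.

```python
def positional_variance(samples, n_events):
    """Fraction of sampled orderings that place each event at each position.

    samples: list of orderings, each a permutation of range(n_events),
             where ordering[position] is the event occurring at that position.
    Returns a nested list indexed as [event][position].
    """
    counts = [[0] * n_events for _ in range(n_events)]
    for ordering in samples:
        for position, event in enumerate(ordering):
            counts[event][position] += 1
    n_samples = len(samples)
    return [[c / n_samples for c in row] for row in counts]


# Toy samples over three events (made up for illustration only).
toy_samples = [[0, 1, 2], [0, 2, 1], [1, 0, 2], [0, 1, 2]]
pvd = positional_variance(toy_samples, n_events=3)
```

Here event 0 sits at the first position in three of the four samples, so its row is `[0.75, 0.25, 0.0]`. When the rows for several z-score events of one biomarker are drawn in the same PVD row, their spread-out probabilities overlap and the colours blend, even though the events keep their order within every individual sample.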

noxtoby commented 2 years ago

How about using the maths of the z-score model to generate the colour bar? (and even the PVD: replace the mixing)

"Something Something Gaussian linear etc. etc."

ayoung11 commented 2 years ago

I'm keen on that - one PVD for uncertainty in the positions and another for the average trajectory across samples would be a nice solution