malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Display normal copy number (2) differently from missing CNV call (NaN) in advanced diplotype clustering #559

Open alimanfoo opened 2 weeks ago

alimanfoo commented 2 weeks ago

Currently, CNV HMM data that are missing due to high coverage variance are rendered in white in the CNV heatmap. A normal copy number of 2 is also rendered in white. This is confusing and can lead to misinterpretation of the data.

E.g., here is a plot with the default value for cnv_max_coverage_variance:

image

Here's the same plot but with cnv_max_coverage_variance set to None, showing that many samples which previously appeared white were actually missing calls due to high coverage variance:

image

alimanfoo commented 2 weeks ago

I suggest using the same grey that the plot_cnv_hmm_heatmap() function uses for missing HMM state.

sanjaynagi commented 2 weeks ago

In my plots there is a difference, although admittedly it's very faint.

NaNs are white, whereas a copy number of 2 is a VERY light grey. Perhaps we can increase this contrast and/or switch up the colours as you say.

I think NaNs are transparent. One solution suggested online is to change the background colour of the plot, so I may give that a go. It's not obvious how to assign a colour to NaN in plotly's colorscale.
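
A minimal sketch of the background-colour idea, not the library's actual implementation. It assumes plotly leaves missing heatmap cells undrawn, so whatever is set as `plot_bgcolor` shows through. The function name `heatmap_spec`, the grey `#BBBBBB`, and the blue/white/red colorscale are all hypothetical choices for illustration. The spec is a plain dict, so it can be built and checked without plotly installed.

```python
import math


def heatmap_spec(cn, missing_color="#BBBBBB"):
    """Build a plotly-style figure spec (plain dict) for a copy number matrix.

    Missing calls (NaN or None) are written as None so that the heatmap
    leaves those cells undrawn and the plot background shows through.
    """
    z = [
        [None if (v is None or math.isnan(v)) else v for v in row]
        for row in cn
    ]
    return {
        "data": [
            {
                "type": "heatmap",
                "z": z,
                # Hypothetical diverging scale: with zmin=0 and zmax=4,
                # copy number 2 sits at the midpoint and is drawn white.
                "colorscale": [[0.0, "blue"], [0.5, "white"], [1.0, "red"]],
                "zmin": 0,
                "zmax": 4,
            }
        ],
        # The background colour is what the reader sees in missing cells.
        "layout": {"plot_bgcolor": missing_color},
    }


nan = float("nan")
spec = heatmap_spec([[2.0, nan, 3.0], [0.0, 2.0, nan]])
```

If plotly is available, the dict can be passed to `plotly.graph_objects.Figure(spec)` for display. With this approach a copy number of 2 stays white while missing calls appear grey.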

image

sanjaynagi commented 1 week ago

In the bokeh CNV plots, NaN is recoded to -1.
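
A rough sketch of how the same recoding could carry over to a plotly colorscale, offered as one possible approach rather than what the bokeh code actually does. Missing calls are recoded to -1 and given their own grey band in a discrete colorscale. The helper names, the colours, and the maximum copy number of 4 are assumptions made up for this example.

```python
import math

MISSING = -1  # sentinel value for a missing CNV call


def recode_missing(cn):
    """Replace NaN/None with the MISSING sentinel; keep other values as ints.

    With numpy arrays, np.where(np.isnan(cn), -1, cn) does the same job.
    """
    return [
        [MISSING if (v is None or math.isnan(v)) else int(v) for v in row]
        for row in cn
    ]


def discrete_colorscale(colors):
    """Build a plotly-style colorscale with one flat band per colour."""
    n = len(colors)
    scale = []
    for i, color in enumerate(colors):
        # Repeating the colour at both band edges gives a hard step
        # instead of a gradient between neighbouring levels.
        scale.append([i / n, color])
        scale.append([(i + 1) / n, color])
    return scale


# Hypothetical colours for levels -1 (missing), 0, 1, 2, 3, 4.
colors = ["#BBBBBB", "#3b4cc0", "#8db0fe", "#ffffff", "#f4987a", "#b40426"]

nan = float("nan")
z = recode_missing([[2.0, nan, 3.0], [0.0, 2.0, nan]])
colorscale = discrete_colorscale(colors)

# Half-unit padding puts each integer level at the centre of its band.
zmin = MISSING - 0.5
zmax = MISSING + len(colors) - 0.5
```

Passing `z`, `colorscale`, `zmin` and `zmax` to a plotly heatmap trace should then draw missing calls in grey and a copy number of 2 in white, which makes the two states distinguishable without relying on the background colour.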