malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Colors in GWSS plots #583

Open jonbrenas opened 2 months ago

jonbrenas commented 2 months ago

@alimanfoo mentioned that we now offer the possibility to plot the various GWSS analyses for whole chromosomes (and even for the whole genome), but that, because every contig is drawn the same way, it is hard to see where one chromosome arm ends and the next begins. The idea would be to use a different colour for each one.

jonbrenas commented 2 months ago

Looking at this, I have several ideas for small enhancements:

  1. Allowing users more freedom with how the stats are represented by letting them select the markers, their colour(s), their sizes, ... This is fairly straightforward.
  2. The above-mentioned use of different parameters for different contigs (or even maybe regions?)
  3. The option to have more than one trace on the same plot (is that going to be readable though?). This might be a different function.
  4. Maybe an option to plot the difference between two traces to see differential signals of selection. This would probably be a new function.

alimanfoo commented 1 month ago

Hi @jonbrenas, thanks for raising this. Some comments inline...

> Looking at this, I have several ideas for small enhancements:
>
>   1. Allowing users more freedom with how the stats are represented by letting them select the markers, their colour(s), their sizes, ... This is fairly straightforward.

This seems like a good idea. However, the parameters would need to account for point 2 below somehow, i.e., how does the user control what happens when multiple contigs are being shown?

>   2. The above-mentioned use of different parameters for different contigs (or even maybe regions?)

I would suggest focusing this issue on being able to plot different colours for different contigs, and making this the default behaviour.

>   3. The option to have more than one trace on the same plot (is that going to be readable though?). This might be a different function.

I'd suggest raising this as a separate issue; it is a common requirement to want to make a figure with multiple GWSS tracks stacked above each other.

>   4. Maybe an option to plot the difference between two traces to see differential signals of selection. This would probably be a new function.

Eric Lucas has in the past used the difference in H12 between two cohorts. I suggest raising this as a separate issue too.
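As a rough illustration of what differencing two traces could involve (not Eric Lucas's actual code, which isn't shown here), here is a minimal sketch. It assumes both H12 traces were computed on the same contig with identical window settings, so that matching windows share an x coordinate such as the window midpoint. `h12_difference` is a hypothetical helper, not part of malariagen_data.

```python
from typing import List, Sequence, Tuple


def h12_difference(
    x_a: Sequence[float],
    h12_a: Sequence[float],
    x_b: Sequence[float],
    h12_b: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (x, h12_a - h12_b) for windows present in both cohorts.

    Hypothetical helper: assumes both traces come from the same contig and the
    same window settings, so a shared window has the same x value (e.g. its
    midpoint). Windows found in only one cohort are dropped.
    """
    h12_b_by_x = dict(zip(x_b, h12_b))
    x_out: List[float] = []
    delta: List[float] = []
    for x, h in zip(x_a, h12_a):
        if x in h12_b_by_x:
            x_out.append(x)
            delta.append(h - h12_b_by_x[x])
    return x_out, delta


# Toy values, not real data: window 400 exists only in cohort B.
x, delta = h12_difference([100, 200, 300], [0.5, 0.75, 0.25],
                          [100, 200, 400], [0.25, 0.5, 0.125])
print(x, delta)  # [100, 200] [0.25, 0.25]
```

Positive values mark windows where the signal is stronger in the first cohort. Dropping unmatched windows, rather than padding them, avoids plotting differences against missing data.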

jonbrenas commented 1 month ago

Thanks @alimanfoo. I will create the extra issues.

As for this one:

>> Looking at this, I have several ideas for small enhancements:
>>
>>   1. Allowing users more freedom with how the stats are represented by letting them select the markers, their colour(s), their sizes, ... This is fairly straightforward.

> This seems like a good idea. However, the parameters would need to account for point 2 below somehow, i.e., how does the user control what happens when multiple contigs are being shown?

>>   2. The above-mentioned use of different parameters for different contigs (or even maybe regions?)

> I would suggest focusing this issue on being able to plot different colours for different contigs, and making this the default behaviour.

We might want to discuss what the new parameters should be. My current code uses a dict of {'contig': 'color'}, which is fairly straightforward, and that led me to wonder whether a dict of {'region': 'color'} would make as much sense, if not more.
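One way to see how the two options relate: a contig name is just a region without coordinates, so a {'region': 'color'} mapping can subsume {'contig': 'color'}. The sketch below assumes region strings of the form 'contig' or 'contig:start-end' (commas allowed in coordinates, bounds inclusive). `parse_region` and `point_colors` are hypothetical helpers, not the library's API, and letting later dict entries override earlier ones is an assumed design choice.

```python
import re
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical region syntax: "2L" or "2L:1,000-2,000" (inclusive coordinates).
_REGION_RE = re.compile(r"^(?P<contig>[^:]+)(?::(?P<start>[\d,]+)-(?P<end>[\d,]+))?$")


def parse_region(region: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split a region string into (contig, start, end); start/end are None for a whole contig."""
    m = _REGION_RE.match(region)
    if m is None:
        raise ValueError(f"cannot parse region: {region!r}")
    if m.group("start") is None:
        return m.group("contig"), None, None
    start = int(m.group("start").replace(",", ""))
    end = int(m.group("end").replace(",", ""))
    return m.group("contig"), start, end


def point_colors(
    contigs: Sequence[str],
    positions: Sequence[int],
    color_map: Dict[str, str],
    default: str = "grey",
) -> List[str]:
    """Assign each (contig, position) point a colour from a {'region': 'color'} dict.

    Entries are checked in insertion order and the last matching one wins, so a
    narrow region listed after its whole contig overrides the contig colour.
    """
    parsed = [(parse_region(r), c) for r, c in color_map.items()]
    colors = []
    for contig, pos in zip(contigs, positions):
        color = default
        for (r_contig, start, end), c in parsed:
            if r_contig == contig and (start is None or start <= pos <= end):
                color = c
        colors.append(color)
    return colors


print(point_colors(["2L", "2L", "2R"], [50, 150, 150],
                   {"2L": "blue", "2L:100-200": "red"}))
# ['blue', 'red', 'grey']
```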

I have already pushed the code that implements point 1 in the current formalism (i.e., the user can select a line colour, size, etc. for the whole plot). To extend that functionality, it would probably make more sense for the dict to be {'region': {'option': 'value'}}, but that becomes a behemoth of a data structure.
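To gauge how heavy the nested structure would be, here is a minimal sketch keyed by contig rather than region for brevity. Per-contig option dicts are layered over plot-wide defaults, so users only spell out what differs. `DEFAULT_STYLE`, its option names and values, and `resolve_styles` are hypothetical placeholders, not the current parameters.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical plot-wide defaults, standing in for today's single-style options.
DEFAULT_STYLE: Dict[str, Any] = {"color": "#1f77b4", "marker": "circle", "size": 3}


def resolve_styles(
    contigs: Sequence[str],
    base_style: Optional[Dict[str, Any]] = None,
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Return {contig: style}, layering per-contig overrides on the base style."""
    base = {**DEFAULT_STYLE, **(base_style or {})}
    overrides = overrides or {}
    # Catch typos in contig names rather than silently ignoring them.
    unknown = set(overrides) - set(contigs)
    if unknown:
        raise ValueError(f"style overrides for contigs not plotted: {sorted(unknown)}")
    return {c: {**base, **overrides.get(c, {})} for c in contigs}


styles = resolve_styles(["2R", "2L"], base_style={"size": 5},
                        overrides={"2L": {"color": "red"}})
print(styles["2L"])  # {'color': 'red', 'marker': 'circle', 'size': 5}
```

The layering keeps the common case (one style for the whole plot) as simple as it is now; the nested dict only appears when a user actually wants per-contig differences.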

The other issue is that 'different contigs' is currently limited: there is no option for '23', and computing H12 on '23X' crashes Colab (it uses up all the available RAM), even with a reasonably sized sample set (such as 'AG1000G-AO') and large windows (e.g., 10 000 bp).

alimanfoo commented 1 month ago

Hi @jonbrenas, cc @leehart,

I think what we're looking for here is just the ability to have colors alternate between contigs whenever data from multiple contigs is plotted, i.e., conventional Manhattan-style plots, something like this:

[image: example Manhattan-style plot]
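For reference, the layout behind a Manhattan-style plot places contigs end to end on one shared x axis, and the contig boundaries are exactly where the colour should switch. A minimal sketch follows, assuming made-up contig lengths (not real assembly sizes). `genome_offsets` and `to_genome_x` are hypothetical names; the library may already do something equivalent internally for whole-genome plots.

```python
from typing import Dict, List, Sequence


def genome_offsets(contig_lengths: Dict[str, int]) -> Dict[str, int]:
    """Map each contig to the x coordinate where it starts, in plotting order."""
    offsets: Dict[str, int] = {}
    total = 0
    for contig, length in contig_lengths.items():
        offsets[contig] = total
        total += length
    return offsets


def to_genome_x(
    contigs: Sequence[str],
    positions: Sequence[int],
    contig_lengths: Dict[str, int],
) -> List[int]:
    """Convert per-contig positions to positions on the concatenated axis."""
    offsets = genome_offsets(contig_lengths)
    return [offsets[c] + p for c, p in zip(contigs, positions)]


# Made-up lengths for illustration only.
lengths = {"2R": 1000, "2L": 800, "3R": 1200}
print(genome_offsets(lengths))  # {'2R': 0, '2L': 1000, '3R': 1800}
print(to_genome_x(["2R", "2L", "3R"], [10, 10, 10], lengths))  # [10, 1010, 1810]
```

The same offsets could also position contig tick labels or boundary lines.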

What about just having a color parameter, which can be either a single color (in which case all markers are plotted in the same color) or a sequence of colors (i.e., a palette), in which case the colors are used in turn for each consecutive contig? The default could be a sequence of two colors, which would be cycled to give alternating colors.
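A minimal sketch of the proposed semantics, independent of the plotting backend. `contig_colors` and `DEFAULT_PALETTE` are hypothetical names, and the two hex colours are placeholders rather than a decided default.

```python
from itertools import cycle
from typing import Dict, Sequence, Union

# Hypothetical default: two colours, alternated between consecutive contigs.
DEFAULT_PALETTE = ("#1f77b4", "#ff7f0e")


def contig_colors(
    contigs: Sequence[str],
    color: Union[str, Sequence[str]] = DEFAULT_PALETTE,
) -> Dict[str, str]:
    """Map each contig to a colour.

    A single colour string is used for every contig; a sequence is treated as a
    palette and cycled through in contig order.
    """
    # Check for str first: a string is itself a Sequence of characters.
    if isinstance(color, str):
        return {c: color for c in contigs}
    if len(color) == 0:
        raise ValueError("color palette must not be empty")
    return dict(zip(contigs, cycle(color)))


print(contig_colors(["2R", "2L", "3R", "3L", "X"]))
# {'2R': '#1f77b4', '2L': '#ff7f0e', '3R': '#1f77b4', '3L': '#ff7f0e', 'X': '#1f77b4'}
print(contig_colors(["2R", "2L"], color="black"))
# {'2R': 'black', '2L': 'black'}
```

The per-contig mapping could then be expanded to one colour per point before the data is handed to the plotting library. A palette longer than two colours works unchanged, since `cycle` simply wraps around after the last entry.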