selkamand / ggoncoplot

Easily Create Interactive Oncoplots
https://selkamand.github.io/ggoncoplot/

Enforce fixed colour scheme across different calls #1

Closed selkamand closed 2 years ago

selkamand commented 2 years ago

Problem:

To maximise the flexibility of ggoncoplot, we don't force the mutation types defined by col_mutation_type to align to any ontology. The end-user can use whatever mutation types they like. The problem is that this makes it difficult to automatically choose colours for these mutation types in a manner that's consistent across different datasets.

Currently, we use an RColorBrewer palette and decide which colour is attached to each mutation type based on the frequency of the mutation types. To demonstrate why this is not ideal, let's go through an example. Say you produce an oncoplot for two different cohorts, one of which is dominated by missense mutations, the other by silent mutations. Missense mutations in one of these oncoplots will be the same colour as silent mutations in the other. This would be extremely confusing.
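The clash described above can be reproduced with a minimal sketch. It is written in Python purely to illustrate the logic; ggoncoplot itself is an R package, and the function name, the three-colour palette, and the toy cohorts here are all assumptions rather than anything taken from the package.

```python
from collections import Counter

# Illustrative palette only; the hex codes stand in for whatever palette is in use.
PALETTE = ["#1B9E77", "#D95F02", "#7570B3"]

def colours_by_frequency(mutation_types):
    """Assign palette colours to mutation types in descending order of frequency."""
    counts = Counter(mutation_types)
    # Sort by count (descending), then by name so ties are deterministic.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return {t: PALETTE[i % len(PALETTE)] for i, t in enumerate(ranked)}

# Two toy cohorts: one dominated by missense, the other by silent mutations.
cohort_a = ["missense"] * 5 + ["silent"] * 2
cohort_b = ["silent"] * 6 + ["missense"] * 1

map_a = colours_by_frequency(cohort_a)
map_b = colours_by_frequency(cohort_b)

# The most frequent type always takes the first colour, so the meaning of
# that colour flips between the two plots.
print(map_a["missense"] == map_b["silent"])
print(map_a["missense"] == map_b["missense"])
```

Any scheme that derives the colour from a property of the dataset (frequency, order of appearance) has this problem; a fixed term -> colour mapping does not.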

Potential solutions

  1. Force users to use some ontology for 'mutation_type'. Then we'll know all the possible mutation types in advance and can make a single manual palette that maps each value to a colour consistently, no matter what data is input. The major downside is the lack of choice for the end-user. It may also be a lot of work for end-users to convert their mutation_type ontology to whatever we enforce. What ontology should be enforced? Should we try to guess the mapping based on the names of mutation_type? We might be able to provide mappings from one ontology to another to help users streamline data preprocessing.

  2. Force users to define a mapping of mutation_types to colours, and check that they have accounted for every value in their dataset. We could help with this by providing a basic example palette showing what they should supply to ggoncoplot. ggoncoplot would error unless the user supplied this mapping.

  3. Both: force an ontology UNLESS the user supplies a palette mapping all mutation_types to colours. Best of both worlds.

Each potential solution has its benefits and drawbacks. (1) is more work for the end-user but will make it easier to integrate ggoncoplot into shiny apps and pipelines. (2) is easier and more flexible for the end-user, and allows domain-specific mutation_types to be used (e.g. there'd be the option to colour mutations based on germline/somatic origin in cancer data visualisation). (3) is more work for me and adds some complexity to the usage, BUT with some careful info/warning messages sent to cli we could probably make this quite intuitive for end-users.
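As a rough sketch of the check that option (2) implies, the snippet below validates that a user-supplied mapping covers every mutation type present in the data and errors otherwise. It is Python used only to show the logic; the function name `validate_palette` and the example terms and colours are hypothetical and not part of ggoncoplot.

```python
def validate_palette(mutation_types, palette):
    """Return the colours for the observed mutation types, or raise if any are unmapped.

    `palette` is a user-supplied dict of mutation type -> colour (hypothetical interface).
    """
    observed = set(mutation_types)
    missing = sorted(observed - set(palette))
    if missing:
        raise ValueError(
            "Palette is missing colours for: " + ", ".join(missing)
        )
    # Keep only the entries that actually occur, in a stable order.
    return {t: palette[t] for t in sorted(observed)}

# Domain-specific terms are fine as long as every value is mapped.
user_palette = {"germline": "#1B9E77", "somatic": "#D95F02"}
data = ["somatic", "germline", "somatic"]
colours = validate_palette(data, user_palette)
print(colours)
```

Option (3) would run the same check only when a palette is supplied, and fall back to the enforced ontology's fixed palette when it is not.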

Plan of attack

  1. Start implementing (1) as step 1. If I have time I'll work towards (3)

selkamand commented 2 years ago

Progress:

I've created the mutationtypes package. This will make it easier to check whether user mutation classes are valid Sequence Ontology ('SO') or Mutation Annotation Format (MAF) terms.

If all terms align to one of these dictionaries, they can be coloured using palettes also found in this package.

Otherwise, ggoncoplot will take a guess at what colours to use, warn the user that there's no guarantee different plots will share the same colour mapping, and suggest that the user supplies their own term -> colour mapping.
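The flow described above (match all terms against a known dictionary, otherwise guess and warn) might look roughly like the following. This is a Python sketch of the logic only: it does not use or reflect the mutationtypes package's actual API, the dictionaries contain just a few example SO and MAF terms, and every colour and function name is an assumption.

```python
import warnings

# Tiny illustrative subsets of each dictionary; colours are made up.
SO_PALETTE = {
    "missense_variant": "#1B9E77",
    "synonymous_variant": "#D95F02",
    "stop_gained": "#7570B3",
}
MAF_PALETTE = {
    "Missense_Mutation": "#1B9E77",
    "Silent": "#D95F02",
    "Nonsense_Mutation": "#7570B3",
}
FALLBACK_COLOURS = ["#66C2A5", "#FC8D62", "#8DA0CB"]

def choose_palette(mutation_types):
    """Return (dictionary name or None, term -> colour mapping)."""
    observed = sorted(set(mutation_types))
    for name, palette in (("SO", SO_PALETTE), ("MAF", MAF_PALETTE)):
        # Use a fixed palette only if *every* observed term is in the dictionary.
        if all(term in palette for term in observed):
            return name, {term: palette[term] for term in observed}
    warnings.warn(
        "Mutation types match no known dictionary; colours are a guess and may "
        "differ between plots. Consider supplying your own term -> colour mapping."
    )
    guessed = {
        term: FALLBACK_COLOURS[i % len(FALLBACK_COLOURS)]
        for i, term in enumerate(observed)
    }
    return None, guessed

name, mapping = choose_palette(["Silent", "Missense_Mutation", "Silent"])
print(name, mapping)
```

Requiring all terms to match a single dictionary, rather than mixing partial matches, keeps the fixed palettes trustworthy: a plot either uses a known mapping in full or is clearly flagged as a guess.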

Nothing has been implemented in ggoncoplot yet.