hodcroftlab / covariants

Real-time updates and information about key SARS-CoV-2 variants, plus the scripts that generate this information.
https://covariants.org/
GNU Affero General Public License v3.0

Add more information on mouse-over to mutation badges #167

Open emmahodcroft opened 3 years ago

emmahodcroft commented 3 years ago

Add more information about mutations in 'side-sausage' and in text, on mouseover.

For example:

Not sure what kind of backend work this would require... need to think it through more.

ivan-aksamentov commented 3 years ago

Right now, in the JSON, each mutation is an object that is fed directly into the <AaMut /> or <NucMut /> badge component. It can also simply be a string, because these components accept strings as well.

In order to add tooltips, we could make each mutation into a more sophisticated object, with fields like isSynonymous, aaMut, nucMut, or whatever is needed. Then aaMut or nucMut will go to the badge component and the rest can be rendered in a tooltip. The synonymous and nonsynonymous arrays can probably be merged, and grouping can be done using the isSynonymous flag.

Also, the richer data format can allow for more sophisticated grouping and sorting of badges, perhaps even user-configurable, if needed. The current grouping could be implemented using the isSynonymous flag.
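As a rough sketch of that grouping idea, assuming the synonymous and nonsynonymous arrays are merged into one: the `MutationRecord` type and the `groupBySynonymous` helper below are hypothetical names for illustration, not part of the current covariants code, and the field names simply follow the proposal in this comment.

```typescript
// Hypothetical shape of a richer mutation record. Field names follow the
// proposal in this comment; this is not the existing covariants schema.
interface MutationRecord {
  gene: string
  aaMut: string | null
  nucMut: string
  isSynonymous: boolean
}

interface GroupedMutations {
  nonsynonymous: MutationRecord[]
  synonymous: MutationRecord[]
}

// Split one merged array back into the two groups, using only the flag.
// Input order is preserved within each group.
function groupBySynonymous(mutations: MutationRecord[]): GroupedMutations {
  const grouped: GroupedMutations = { nonsynonymous: [], synonymous: [] }
  for (const mutation of mutations) {
    if (mutation.isSynonymous) {
      grouped.synonymous.push(mutation)
    } else {
      grouped.nonsynonymous.push(mutation)
    }
  }
  return grouped
}

// Placeholder data, same made-up values as in the JSON examples in this comment.
const exampleMutations: MutationRecord[] = [
  { gene: 'HappyGene', aaMut: 'HappyGene:M123K', nucMut: 'A1234C', isSynonymous: false },
  { gene: 'SadGene', aaMut: null, nucMut: 'G4567T', isSynonymous: true },
]

const grouped = groupBySynonymous(exampleMutations)
```

Other groupings (by gene, for instance) could follow the same pattern with a different key, which is where a user-configurable option might plug in.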

Example clusters.json that I imagine:

{
  "build_name": "20A.EU1",
  "display_name": "20E (EU1)",
  "mutations": [
    {
      "gene": "HappyGene",
      "aaMut": "HappyGene:M123K",
      "nucMut": "A1234C",
      "isSynonymous": false,
      "someMoreInfo": "Hello",
      "yetAnotherInfo": ["one", "two", "three"]
    },
    {
      "gene": "SadGene",
      "aaMut": null,
      "nucMut": "G4567T",
      "someMoreInfo": "Cannot tell which AA mutation",
      "isSynonymous": true,
      "yetAnotherInfo": []
    }
  ]
}
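To make the split between "what goes to the badge" and "what goes to the tooltip" concrete, here is a minimal sketch. `RichMutation`, `BadgeData` and `toBadgeData` are hypothetical names, the tooltip wording is made up, and the choice to fall back to the nucleotide badge when `aaMut` is null is an assumption rather than a decided behavior.

```typescript
// Hypothetical type mirroring one entry of the example clusters.json above.
interface RichMutation {
  gene: string
  aaMut: string | null
  nucMut: string
  isSynonymous: boolean
  someMoreInfo?: string
  yetAnotherInfo?: string[]
}

// What a badge plus tooltip would need: which badge kind to render,
// the string to pass to it, and the remaining fields as tooltip lines.
interface BadgeData {
  kind: 'aa' | 'nuc'
  label: string
  tooltipLines: string[]
}

function toBadgeData(mutation: RichMutation): BadgeData {
  // Assumption: prefer the amino acid badge, fall back to the nucleotide one.
  const hasAaMut = mutation.aaMut !== null
  const kind: BadgeData['kind'] = hasAaMut ? 'aa' : 'nuc'
  const label = mutation.aaMut !== null ? mutation.aaMut : mutation.nucMut

  const tooltipLines: string[] = [
    `Gene: ${mutation.gene}`,
    `Nucleotide mutation: ${mutation.nucMut}`,
    mutation.isSynonymous ? 'Synonymous' : 'Nonsynonymous',
  ]
  if (mutation.someMoreInfo) {
    tooltipLines.push(mutation.someMoreInfo)
  }
  if (mutation.yetAnotherInfo && mutation.yetAnotherInfo.length > 0) {
    tooltipLines.push(mutation.yetAnotherInfo.join(', '))
  }
  return { kind, label, tooltipLines }
}

const happyBadge = toBadgeData({
  gene: 'HappyGene',
  aaMut: 'HappyGene:M123K',
  nucMut: 'A1234C',
  isSynonymous: false,
  someMoreInfo: 'Hello',
  yetAnotherInfo: ['one', 'two', 'three'],
})

const sadBadge = toBadgeData({
  gene: 'SadGene',
  aaMut: null,
  nucMut: 'G4567T',
  isSynonymous: true,
  someMoreInfo: 'Cannot tell which AA mutation',
  yetAnotherInfo: [],
})
```

The `label` would then be passed to the existing <AaMut /> or <NucMut /> component depending on `kind`, and `tooltipLines` to whatever tooltip component ends up being used.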

That will be quite a bit of manual labor, because there are so many mutations now, so the balance between the effort and the usefulness of the feature needs to be carefully weighed.

If (1) mutations repeat across clusters, and (2) the same mutation always has the same data no matter which cluster it is in, it makes sense to split mutations out into a separate file, for example mutations.json, give each mutation object a unique key, and then refer to mutations by this key in clusters. This will avoid duplication, make it less error-prone, and can cut down on manual work as well.

Example:

clusters.json (this assumes that the aa mutation string for an aa mutation, and the nuc mutation string for a nuc mutation, are unique identifiers that no other mutation shares. But identifiers can be anything, preferably numbers or strings):

{
  "build_name": "20A.EU1",
  "display_name": "20E (EU1)",
  "mutationIds": [
    "HappyGene:M123K",
    "G4567T"
  ]
}

mutations.json (note the id fields):

{
  "mutations": [
    {
      "id": "HappyGene:M123K",
      "gene": "HappyGene",
      "aaMut": "HappyGene:M123K",
      "nucMut": "A1234C",
      "isSynonymous": false,
      "someMoreInfo": "Hello",
      "yetAnotherInfo": ["one", "two", "three"]
    },
    {
      "id": "G4567T",
      "gene": "SadGene",
      "aaMut": null,
      "nucMut": "G4567T",
      "someMoreInfo": "Cannot tell which AA mutation",
      "isSynonymous": true,
      "yetAnotherInfo": []
    }
  ]
}
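
On the web side, the two files could be joined at load time roughly as follows. This is only a sketch: `MutationEntry`, `Cluster`, `indexMutations` and `resolveClusterMutations` are hypothetical names, and throwing on duplicate or unknown ids is an assumed policy (a build-time check or a warning would work as well).

```typescript
// Hypothetical types mirroring the mutations.json and clusters.json examples above.
interface MutationEntry {
  id: string
  gene: string
  aaMut: string | null
  nucMut: string
  isSynonymous: boolean
}

interface Cluster {
  build_name: string
  display_name: string
  mutationIds: string[]
}

// Build an id -> mutation lookup and reject duplicate ids, since the
// whole scheme relies on ids being unique.
function indexMutations(mutations: MutationEntry[]): Record<string, MutationEntry> {
  const byId: Record<string, MutationEntry> = {}
  for (const mutation of mutations) {
    if (Object.prototype.hasOwnProperty.call(byId, mutation.id)) {
      throw new Error(`Duplicate mutation id: ${mutation.id}`)
    }
    byId[mutation.id] = mutation
  }
  return byId
}

// Replace each id in a cluster with the full mutation object,
// failing loudly if a cluster refers to an id that does not exist.
function resolveClusterMutations(cluster: Cluster, byId: Record<string, MutationEntry>): MutationEntry[] {
  return cluster.mutationIds.map((id) => {
    if (!Object.prototype.hasOwnProperty.call(byId, id)) {
      throw new Error(`Cluster ${cluster.build_name} refers to unknown mutation id: ${id}`)
    }
    return byId[id]
  })
}

// Placeholder data, same made-up values as in the JSON examples above.
const mutationIndex = indexMutations([
  { id: 'HappyGene:M123K', gene: 'HappyGene', aaMut: 'HappyGene:M123K', nucMut: 'A1234C', isSynonymous: false },
  { id: 'G4567T', gene: 'SadGene', aaMut: null, nucMut: 'G4567T', isSynonymous: true },
])

const exampleCluster: Cluster = {
  build_name: '20A.EU1',
  display_name: '20E (EU1)',
  mutationIds: ['HappyGene:M123K', 'G4567T'],
}

const resolvedMutations = resolveClusterMutations(exampleCluster, mutationIndex)
```

The resolved objects have the same shape as the inline version, so the badge and tooltip code would not need to know which layout the JSON uses.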

nskplta commented 3 years ago

Hi,

Generally, the distinction between synonymous and nonsynonymous mutations is not taken into consideration enough. However, synonymous mutations may affect mRNA 3D folding, synthesis speed, or splicing, so they deserve more attention.

Thanks,

Belén.