m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

Target annotations on landing page #1063

Open phraenquex opened 1 year ago

phraenquex commented 1 year ago

We have auto-generated content, including UniProt IDs, papers (DOIs?), etc.

Need:

Might use Discourse, or possibly something else.

matteoferla commented 1 year ago

Preliminary prototype w/ Blast —> Uniprot —> metadata using old Fragalysis API https://github.com/matteoferla/munged-Fragalysis-targets

boriskovar-m2ms commented 12 months ago

@phraenquex we can put a button next to each target on the landing page; opening it would show links to the additional content. This should keep the table from getting too busy.

Should it be populated from the frontend? I think we will also need support from the backend.

@matteoferla To be absolutely honest I have zero idea what I'm looking at.

phraenquex commented 12 months ago

@boriskovar-m2ms the information will need to be shown up-front in table form - hiding it behind a link will not be helpful.

Context: the target name is completely meaningless to the general viewer; so we need to show them additional information (gene name, organism, etc) to make it useful. And that information exists in other databases.

This is an ancient problem for websites interfacing to databases of gene and other biology things: the same thing can have many names, and (worse) you don't even know it's the same thing until someone does serious research on it, and (worst) the same thing can show up in different variations, triggering huge fights in that scientific sub-sub-sub-speciality about whether it is or is not the same.

It is this problem that Matteo's code touches on.

@matteoferla would you mind creating a precise mock-up of what you think that panel needs to look like, especially (!) what data need to go in which columns, and what bit of your code retrieves that data.

But @boriskovar-m2ms, it's a replicate of the Projects table (possibly an exact replicate). Just needs different data in the columns.

This is indeed a frontend/backend ticket

matteoferla commented 12 months ago

Summary: The database would need 6 new character fields. The backend would need to pass these 6 to the front end, along with the other fields. In terms of display, I agree on icons, or a single icon that opens a modal over the table.

Detail

@phraenquex, I would argue that gene/protein synonyms are not a problem if UniProt IDs are stored. PDB wisely runs on those (see below). Discussion got bogged down in special cases, leading to plans getting deorbited. But the simplest implementation is fine for 99.99% of use cases.

Let's play devil's advocate, take a nasty case and treat it simply: XX01ZVNS2B.

The following can be generated once for the "primary" peptide (i.e. would need to be stored in the backend):

  1. Chain letter
  2. Protein name as manually defined by a human to keep humans happy: NS2 protease
  3. Uniprot ID of "primary" peptide (Q32ZE1) — and link: https://www.uniprot.org/uniprotkb/Q32ZE1/entry
  4. Range of structure sequence in said Uniprot ID: 1499-1676
  5. Organism scientific name: Zika virus: https://www.uniprot.org/taxonomy/64320
  6. Protein domain family: PF00949 https://www.ebi.ac.uk/interpro/entry/pfam/PF00949/
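A minimal sketch of how the six fields above might be held in one record and turned into the links quoted in the list. The class name `TargetAnnotation` and its field names are hypothetical and are not the Fragalysis schema; only the two URL patterns are taken from the list itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetAnnotation:
    """Hypothetical container for the six proposed character fields."""

    primary_chain: str   # 1. chain letter
    protein_name: str    # 2. human-curated protein name
    uniprot_id: str      # 3. UniProt accession of the "primary" peptide
    uniprot_range: str   # 4. residue range within that UniProt entry
    organism: str        # 5. organism scientific name
    pfam_id: str         # 6. protein domain family (Pfam accession)

    @property
    def uniprot_url(self) -> str:
        # URL pattern as quoted in item 3
        return f"https://www.uniprot.org/uniprotkb/{self.uniprot_id}/entry"

    @property
    def pfam_url(self) -> str:
        # URL pattern as quoted in item 6
        return f"https://www.ebi.ac.uk/interpro/entry/pfam/{self.pfam_id}/"


# Values quoted in the list above for XX01ZVNS2B. The chain letter "A" is a
# placeholder: the thread does not state it.
example = TargetAnnotation(
    primary_chain="A",
    protein_name="NS2 protease",
    uniprot_id="Q32ZE1",
    uniprot_range="1499-1676",
    organism="Zika virus",
    pfam_id="PF00949",
)
print(example.uniprot_url)
print(example.pfam_url)
```

Storing only the identifiers and deriving the links keeps the stored fields as plain character values; the organism link would additionally need the taxonomy ID, which is left out of this sketch.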

The following is already stored and presented:

  1. Mysterious lb18145-1 gibberish
  2. Is SGC target
  3. Link to Discourse

Generated from Backend DB on each load:

  1. N hits
  2. Date last visit (Is the project dead)
  3. Data version number (I assume this is kept somewhere) —useful for computational folk

Info that could be useful, especially for analysis, but hard to enforce as a restricted dictionary: what the protein is. For NS3: "Protease", "enzyme", "peptide-binding". This would be useful for anyone using Fragalysis data to do analyses (e.g. were Anna doing her study today, she would be able to classify the hits into peptide-binding pocket and nucleotide-binding pocket; cf. the discussion that PPI pockets yield lots of amide hits). But it is non-trivial to generate automatically, so it might be a don't-go-there zone.

These 12 are info I would like. But I agree with @boriskovar-m2ms that a table would not fit. So 5 options:

Footnote

In terms of nastiness: the protein is a complex of NS3 and NS2B which, being viral, is unlike most proteins and is made as a single polypeptide. The PDB already deals with these annotations (for example https://www.rcsb.org/structure/6l50). However, the metadata is not simple to handle. Hence, for the sake of sanity, just defining the primary polypeptide and annotating that is by far the easiest. Likewise, PDB does handle mutations and non-canonical amino acids (phosphoserine etc.), but it is complicated and actually one could figure it out...

matteoferla commented 12 months ago

@tdudgeon / @alanbchristie: are points 10–12 above (the three generated from the backend DB on each load) easily doable?

boriskovar-m2ms commented 12 months ago

@matteoferla if I understand this correctly 1-6 are user defined? @phraenquex Is it OK that anyone logged in is able to curate this data?

phraenquex commented 12 months ago

@matteoferla asked about the target name. @boriskovar-m2ms confirms it shows up EVERYWHERE, acting as a de facto database key. Including snapshot json and stuff. So currently it functions as a CODE, that doubles up as a name.

So we need an explicit separate target name field, that is easy to edit and curate.

Some reach-through: some frontend things might need the backend to dereference the CODE from the NAME, because users might not realise when they're different. E.g. the upload interface, etc. @alanbchristie can investigate.
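A minimal sketch of what dereferencing the CODE from the NAME could look like, assuming the backend holds a mapping from target code to curated name. The function `resolve_target_code` and the `name_by_code` mapping are hypothetical, not existing Fragalysis API.

```python
from typing import Dict


def resolve_target_code(user_input: str, name_by_code: Dict[str, str]) -> str:
    """Return the target CODE for input that may be either a CODE or a NAME."""
    text = user_input.strip()
    # An exact CODE match wins, since the code is the de facto database key.
    if text in name_by_code:
        return text
    # Otherwise compare against curated names, ignoring case.
    matches = [
        code for code, name in name_by_code.items()
        if name.casefold() == text.casefold()
    ]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"No target with code or name {text!r}")
    # Names are free text and may collide; refuse to guess.
    raise ValueError(f"Name {text!r} is ambiguous: {matches}")


# Example using the code and name quoted earlier in this thread.
targets = {"XX01ZVNS2B": "NS2 protease"}
print(resolve_target_code("ns2 protease", targets))
print(resolve_target_code("XX01ZVNS2B", targets))
```

Raising on an ambiguous name rather than picking one reflects that only the CODE is guaranteed unique; whether names should also be unique is a design decision this sketch leaves open.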

matteoferla commented 12 months ago

Meeting points:

I will aim to get a CSV with data after dealing with OpenMM and completing my current documentation dive — 2 weeks?

phraenquex commented 11 months ago

Stuff copied from deprecated #1072:

Scope: make the left-most panel of the homepage more informative. Likely:

additional columns, with meaningful entries (date, creator, description field, links to other databases)
rework format - likely reuse changes made for right-side projects panel
links to other databases - see https://github.com/m2ms/fragalysis-frontend/issues/1063.

Might need some small backend-changes - likely additional fields in the target table.

phraenquex commented 11 months ago

@matteoferla we need the CSV more or less NOW - any chance you can do it today?

Even incomplete would be fine - @RoboMatuska cannot otherwise proceed.

phraenquex commented 11 months ago

Order of proceedings:

Priority for this ticket is the FRONTEND stuff - the backend we can populate later. As long as the API calls are extensible - please design accordingly.

The key USER problem to address now: people cannot find their targets. The tools for this exist in the Projects view, they need to come across in the Targets view. If this is resolved, we can move the rest to a new ticket.

matteoferla commented 11 months ago

@RoboMatuska, apologies for the lateness of the file, but it is still technically today in British Summer Time, just about.

Here is the simplest version of the data:

simple_values.csv

I will upload the data used to generate it for the sake of reference tomorrow.

Regarding the two entries that could be displayed as a link: if the plan is that clicking the whole row opens the Fragalysis page, please ignore these.

Regarding the fact that it is "simple": the ideal and more correct solution is that most of the entries would be served not as a simple value but as a mapping of chain IDs to values, e.g. {'A': 'P12345', 'B': 'P92345'}, where you'd have to use the value in 'Primary chain' to get the value that is actually wanted. I have generated a dataset like this, but it can only be passed as JSON. It is a pointless complication, which I hope won't be needed.
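A minimal sketch of how the per-chain form could be collapsed back to the simple form, assuming each entry carries a 'Primary chain' key alongside its per-chain mappings. The function name `flatten_to_primary` and the column name "UniProt" are hypothetical; the mapping values are the illustrative ones from the paragraph above.

```python
from typing import Any, Dict


def flatten_to_primary(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Replace every chain-ID mapping with the value for the primary chain."""
    primary = entry["Primary chain"]
    flat: Dict[str, Any] = {}
    for key, value in entry.items():
        if isinstance(value, dict):
            # None if the primary chain has no value for this column
            flat[key] = value.get(primary)
        else:
            # Already a simple value (including 'Primary chain' itself)
            flat[key] = value
    return flat


per_chain_entry = {
    "Primary chain": "A",
    "UniProt": {"A": "P12345", "B": "P92345"},  # column name is hypothetical
}
print(flatten_to_primary(per_chain_entry))
```

With a helper like this the frontend could consume either form, so serving the simple CSV now would not rule out the per-chain JSON later.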

phraenquex commented 11 months ago

The backend work might only be completed after the full v2 release (dark-purple). Probably by target loader.

matteoferla commented 11 months ago

For record-keeping. Notebook used to generate data: https://github.com/matteoferla/munged-Fragalysis-targets/blob/main/Fragalysis_targets.ipynb

phraenquex commented 11 months ago

The spec on this ticket is now required only for dark-purple release, as part of V2 makeover.

The preparatory front-end work is now specced out in #1161, which remains on the light-purple release.

Moving this ticket into ASAP swimlane, out of in-progress.

@RoboMatuska @boriskovar-m2ms please acknowledge.