m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

LHS Tag and compound naming spec #1277

Open Waztom opened 9 months ago

Waztom commented 9 months ago

@phraenquex this is an interim fix until we have specifications/framework in place for ticket #1234 (yellow release).

Data curation/tagging is going to be extremely difficult if done by SMILES.

@tdudgeon and @kaliif Daren's preference is to use "CrystalName:CompoundCode" from SoakDB, e.g. x0152_1A:Z31602870.

For the purple release, can you pass this SoakDB table entry to the frontend for @boriskovar-m2ms to populate the LHS?

tdudgeon commented 9 months ago

@Waztom The CompoundCode presumably does not yet exist in fragalysis, and would be one of the identifiers for a compound that could be loaded (each compound potentially having multiple identifiers). That's where we want to end up, but right now what is the solution? If CompoundCode is always going to be present we could modify XCA to push through that ID (assuming it is available from SoakDB or somewhere consistent), and we could treat it as a primary identifier for the compound, e.g. just add it as an extra column, leaving the many-to-one aspect of multiple identifiers (which will require a new table) to be done at a later stage.

But even so this is a moderate amount of work, and we need to first know where CompoundCode comes from.

Waztom commented 9 months ago

@tdudgeon CrystalName and CompoundCode are both columns in SoakDB. CompoundCode is populated by the person doing the initial experiment. The combination of the two as an alias for a compound minimises issues with duplication.

With Boris's latest changes, I see that the information I asked for is already getting through to the frontend on staging; see #1278.

Multiple aliases are definitely one for the yellow release.

tdudgeon commented 9 months ago

The compound code is now included in the data. It looks like this:

crystals:
  Mpro-IBM0045:
    reference: true
    type: model_building
    last_updated: '2022-08-08 16:11:00'
    refinement_outcome: 5 - Deposition ready
    compound_code: Z68337194
    crystallographic_files:
      xtal_pdb: {file: upload_1/crystallographic_files/Mpro-IBM0045/Mpro-IBM0045.pdb,
        sha256: bf24e507365fc150732a5f5df09e771030630e44543c406a8c0dda4b7473290d}
      xtal_mtz: {file: upload_1/crystallographic_files/Mpro-IBM0045/Mpro-IBM0045.mtz,
        sha256: b5e9d5e729b4284e4ff43bfbaccc4d2cf05fbe1744ee1603a4048ae32bb2b459}
      ligand_cif: {file: upload_1/crystallographic_files/Mpro-IBM0045/Mpro-IBM0045.cif,
        sha256: 8bae84d77c55878f31ac3795720899da7bb52e14a77b36027da524502a7e6e11,
        smiles: CN(Cc1ccc(Cl)c(Cl)c1)c1ccc(S(N)(=O)=O)cn1}
      panddas_event_files:
      - {file: upload_1/crystallographic_files/Mpro-IBM0045/Mpro-IBM0045_1_A_1101.ccp4,
        sha256: abab5237ad7a44cf237df0a4fe9b8c6f30f18b545f7e8e3848424282a130a91f,
        model: '1', chain: A, res: 1101, index: 1, bdc: 0.21}
    status: new

@kaliif Will need to handle this extra field in the target loader and make it available through the API.

kaliif commented 9 months ago

@tdudgeon is the field mandatory or optional (i.e. should the upload fail or just make a note of missing code)?

I'm assuming this goes to Compound model?

tdudgeon commented 9 months ago

I think it should be optional, as we can't guarantee it's populated in soakdb, and it won't be there for non-Diamond data. I'm not sure how to model this (experiment or compound). I've asked @Waztom for guidance.

tdudgeon commented 9 months ago

It should be handled as a property of the compound
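A minimal sketch of what "optional, and a property of the compound" could look like when reading one parsed `crystals:` entry. The `Compound` dataclass and `compound_from_crystal` function are hypothetical names for illustration, not the target loader's actual code; only the `compound_code` and `ligand_cif.smiles` keys come from the data shown above.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class Compound:
    # Hypothetical stand-in for the backend's compound record.
    smiles: str
    compound_code: Optional[str] = None  # may be absent, e.g. for non-Diamond data


def compound_from_crystal(crystal_name: str, crystal: dict) -> Compound:
    """Build a Compound from one crystal entry without failing on a missing code."""
    smiles = crystal["crystallographic_files"]["ligand_cif"]["smiles"]
    # Treat a missing key and an empty value the same way: no code available.
    code = crystal.get("compound_code") or None
    if code is None:
        # Note the gap, but do not abort the upload.
        logger.warning("%s: no compound_code, continuing without it", crystal_name)
    return Compound(smiles=smiles, compound_code=code)
```

With the Mpro-IBM0045 entry above this yields a compound carrying `Z68337194`; with the key removed it yields `compound_code=None` and logs a warning.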

kaliif commented 9 months ago

Backend changes done, merged to staging

Waztom commented 9 months ago

@tdudgeon and @phraenquex LHS compound naming spec:

image

@tdudgeon in order of development, please start with the conformer site naming, followed by the crystal form naming. Canonical site naming needs Frank's review at next Frag meeting.

Waztom commented 9 months ago

Please also see #1278 for more info.

boriskovar-m2ms commented 9 months ago

> Backend changes done, merged to staging

Which endpoint serves this information? I know that I don't have the latest target processed by the latest XCA, but I don't see a compound id/code field anywhere.

EDIT: I'm using latest backend (downloaded this morning)

kaliif commented 9 months ago

@boriskovar-m2ms it's api/compounds field compound_code. Staging is empty atm but here's an example: https://fragalysis-kalev-default.xchem-dev.diamond.ac.uk/api/compounds/

phraenquex commented 9 months ago

Naming by the loader (@kaliif):

Code for observations:

TARGET123-x0233 observation XXXX goes to:

String for tags:

Tag names should NEVER be overwritten

Assembly details (name, etc.) will be needed by F/E eventually - accommodate in api?

phraenquex commented 9 months ago

This supersedes #1319.

phraenquex commented 9 months ago

Rendering of LHS and observations sub-menu (@boriskovar-m2ms)

image

Also: short tags need to be 3 characters (currently 2).

kaliif commented 9 months ago

TARGET123-x0233 observation XXXX goes to:

* `code`: `x0233A`

  * `x0233` - strip from crystal name.
  * `A` - cycle through the observations, assign A-Z, AA,..AZ, BA-BZ, .... AAA-AAZ, ...
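The A-Z, AA..AZ, BA-BZ, ... AAA sequence is the same scheme as spreadsheet column labels. A minimal sketch, assuming the crystal identifier is whatever follows the last hyphen in the crystal name; both function names are hypothetical, and which set of observations the index runs over is left to the caller:

```python
from string import ascii_uppercase


def observation_suffix(index: int, alphabet: str = ascii_uppercase) -> str:
    """Map a zero-based index to A..Z, AA..AZ, BA..ZZ, AAA.. (bijective base-26)."""
    if index < 0:
        raise ValueError("index must be non-negative")
    letters = []
    n = index + 1  # work with a one-based count
    while n > 0:
        n, rem = divmod(n - 1, len(alphabet))
        letters.append(alphabet[rem])
    return "".join(reversed(letters))


def observation_code(crystal_name: str, index: int) -> str:
    """Strip the crystal id from e.g. 'TARGET123-x0233' and append the suffix."""
    # Assumption: the crystal id is the part after the last hyphen.
    crystal_id = crystal_name.rsplit("-", 1)[-1]
    return crystal_id + observation_suffix(index)
```

For example, `observation_code("TARGET123-x0233", 0)` gives `x0233A`, and index 26 gives `x0233AA`. The alphabet is a parameter so a different letter set can be swapped in without touching the logic.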

@phraenquex does "cycle through" refer to the observations under this experiment, or to all experiments in the upload?

1 - xtalform_sites.id

id as in object id from the database?

phraenquex commented 9 months ago

@kaliif all observations of that (crystal + canonical site + compound). I think that's it - see the mockup above, everything you'd expect to show up when you click the orange "observations" button.

phraenquex commented 9 months ago

@kaliif by xtalform_sites.id I meant the field in the api - it's a number... but good point, maybe the wrong number.

What I meant was: enumerate all xtalforms in assemblies.yaml, and use that number. Ideally store it, and serve it from the API too.

kaliif commented 9 months ago

@phraenquex there's a tag category for crystalforms as well. How do I format these?

phraenquex commented 9 months ago

@kaliif, here's the spec.

xtalform_sites needs a field for the crystalform name from assemblies.yaml - F/E will need it at some point.

phraenquex commented 9 months ago

Crystalform spec: F1 - small cell. The same definitions as for crystalform_sites.
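A minimal sketch of that format, assuming the crystalform names have already been read from assemblies.yaml into a list in file order (the file's layout is not shown in this ticket, so parsing is left out). The function name and the second example name are hypothetical.

```python
from typing import List


def xtalform_tags(xtalform_names: List[str]) -> List[str]:
    """Number crystalforms from 1 in the given order: 'F1 - small cell', ..."""
    return [f"F{number} - {name}" for number, name in enumerate(xtalform_names, start=1)]


# 'small cell' is from the spec above; 'large cell' is a made-up second entry.
print(xtalform_tags(["small cell", "large cell"]))
```

Storing the number alongside the name, rather than recomputing it, would keep it stable if the API serves it later.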

mwinokan commented 9 months ago

@kaliif Frank asked me to pass on to you that the observation letter, i.e. the a of x0123a, should be lowercase to reduce confusion with PDB chain IDs.

boriskovar-m2ms commented 9 months ago

@phraenquex so if I understand this correctly, Conformer site, Crystalform site and Assembly should be visible only on observations, and these are the only tags that will be visible on observations. And for the compound we will display the Canonical site tag first, and then the rest of the tags?

phraenquex commented 9 months ago

Yes correct. (Sorry, I only made it implicit on the figure.)

(Max has been mocking up a new layout, but that's definitely for the next release.)

boriskovar-m2ms commented 9 months ago

@phraenquex and what to do with compounds with only one observation? In this case the submenu is not accessible. Should I display the Conformer site, Crystalform site and Assembly tags on the observation?

phraenquex commented 9 months ago

@boriskovar-m2ms now you treat it like the others: the submenu should remain accessible, you show a 1, and that opens up the observations modal with that single observation.

(This matters because not all the observation information is shown in the main LHS list.)

phraenquex commented 9 months ago

@boriskovar-m2ms looks wonderful.

Do you have time for these tweaks? Red is bugs, green is nice-to-have (hopefully quick).

image

boriskovar-m2ms commented 9 months ago

What's this target? I checked all the compounds in the A71EV2A target.

Is the second bug that the tag from the XtalForms category should be moved to the observations view?

phraenquex commented 9 months ago

@boriskovar-m2ms

Yes, the XtalForms tags should be in the Observations.

The 2D compounds are no longer missing for me after I rebooted, so it was some browser fart.

What about the green crosses?

(Overall it's wonderful!)

boriskovar-m2ms commented 9 months ago

That's layout stuff. I can try, but most likely on Thursday, I'm afraid. The layout of this is quite fixed, so it's usually not as easy as it might seem.

phraenquex commented 9 months ago

Makes sense. Something you can pass on to @matej-vavrek?

boriskovar-m2ms commented 9 months ago

Moving the tag to observations is easy. I'll do it and publish a new version.

Waztom commented 9 months ago

@kaliif and @mwinokan I suspect all the issues related to #1327 are rooted in this ticket.

Kalev is naming the LHS sets as prescribed here, i.e. as x0375a, instead of ax0375a, which is what Max is expecting. My concern is not so much the prepending of the a to datasets, but rather how the RHS can link the ref_mols in the RHS upload to single events as observed LHS datasets. People talking about these datasets will also not be able to tell them apart without reminding each other of the site info. @kaliif the issue is that the current naming does not differentiate between datasets belonging to different canonical sites, e.g.:

image

@kaliif the LHS dataset short-codes need to be unique for a Target. Can you please update the LHS dataset short-code names so that they are unique? Please see the spec above; the loader is missing the unique assignment of letters to observations.

kaliif commented 8 months ago

@Waztom if I follow the grouping scheme (canon site and compound), I run into conflicts because observations from one dataset can belong to different canon sites. Is this expected?

Waztom commented 8 months ago

@kaliif this is not expected and will need @ConorFWild to help look/confirm this. Could you please send Conor and me a list of the datasets that have multiple canonical sites - this will help Conor debug?

kaliif commented 8 months ago

@Waztom @ConorFWild this is what I got when loading the latest data from Tim (XX01ZVNS2B):

frag=# select id, code, longcode, smiles, chain_id, canon_site_conf_id, cmpd_id, experiment_id, xtalform_site_id from viewer_siteobservation where code like 'mx0884a';
 id  |  code   |                    longcode                     |           smiles            | chain_id | canon_site_conf_id | cmpd_id | experiment_id | xtalform_site_id 
-----+---------+-------------------------------------------------+-----------------------------+----------+--------------------+---------+---------------+------------------
 204 | mx0884a | XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1 | C[C@@H](Oc1ccccc1C#N)C(N)=O | B        |                 42 |     177 |           177 |               41
 203 | mx0884a | XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1 | C[C@H](Oc1ccccc1C#N)C(N)=O  | B        |                 43 |     177 |           177 |               40

I disabled the uniqueness check, so there's a duplicate code. But you can see why, they have the same experiment_id (dataset) but different canon_site_conf_id, 42, 43, which link to different canon_sites:

frag=# select id, name, canon_site_id, ref_site_observation_id from viewer_canonsiteconf;
 id |           name           | canon_site_id | ref_site_observation_id 
----+--------------------------+---------------+-------------------------
 42 | XX01ZVNS2B-x0673+B+501+1 |            40 |                     180
 43 | XX01ZVNS2B-x0719+B+301+1 |            41 |                     185
 44 | XX01ZVNS2B-x0773+B+401+1 |            42 |                     189
 45 | XX01ZVNS2B-x0884+B+401+1 |            40 |                     204

So when I run group by, I get two separate groups, one short code gets created and when I get to the next one, it tries to generate the same code.
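A minimal sketch of how to list the datasets affected, assuming the two query results above are reduced to plain tuples and a dict. The function name is hypothetical and this is not the loader's actual code; the ids are the ones shown in the queries.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (site observation id, experiment_id, canon_site_conf_id) from viewer_siteobservation
observations = [(204, 177, 42), (203, 177, 43)]

# canon_site_conf id -> canon_site_id from viewer_canonsiteconf
conf_to_site = {42: 40, 43: 41, 44: 42, 45: 40}


def experiments_spanning_sites(
    observations: Iterable[Tuple[int, int, int]],
    conf_to_site: Dict[int, int],
) -> Dict[int, Set[int]]:
    """Return experiments whose observations fall into more than one canon site."""
    sites_by_experiment: Dict[int, Set[int]] = defaultdict(set)
    for _obs_id, experiment_id, conf_id in observations:
        sites_by_experiment[experiment_id].add(conf_to_site[conf_id])
    return {exp: sites for exp, sites in sites_by_experiment.items() if len(sites) > 1}


print(experiments_spanning_sites(observations, conf_to_site))
```

For the rows above, experiment 177 maps to canon sites 40 and 41, which is why grouping by canon site produces two groups that both want the first letter for the same dataset.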

The dataset in question is as follows:

  XX01ZVNS2B-x0884:
    type: model_building
    last_updated: '2023-12-01 19:15:00'
    refinement_outcome: 5 - Deposition ready
    compound_code: Z18769001
    code_prefix: m
    crystallographic_files:
      xtal_pdb: {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884.pdb,
        sha256: 316c9bd57f4268b10c792d19ce277420097cb4d8778e8b5b5efbf5a298b735ef}
      xtal_mtz: {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884.mtz,
        sha256: cf79c819958ac8ea3bb233894cd97095c7545a5185042333425ce7c460f8d403}
      ligand_cif: {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884.cif,
        sha256: 079d3e28cbdfad1045ac825ad9df3713b6dc4d04c7a0f1060f309d853bd28562,
        smiles: 'C[C@H](Oc1ccccc1C#N)C(N)=O'}
      panddas_event_files:
      - {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_1_B_301.ccp4,
        sha256: 81bbbce511c11b9080c404b9650e8c2d505236a03e7d1214b2931f3a39b11e45,
        model: '1', chain: B, res: 301, index: 2, bdc: 0.18}
      - {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_1_B_401.ccp4,
        sha256: a5704968b5f13829304829e1e68dc96c25a9736092308b6b329a368057becbcf,
        model: '1', chain: B, res: 401, index: 1, bdc: 0.17}
    status: new
    assigned_xtalform: xtalform1
    aligned_files:
      B:
        '301':
          XX01ZVNS2B-x0719+B+301+1: {structure: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1.pdb,
            event_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_event.ccp4,
            sigmaa_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_sigmaa.ccp4,
            diff_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_diff.ccp4,
            pdb_apo: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_apo.pdb,
            pdb_apo_solv: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_apo-solv.pdb,
            pdb_apo_desolv: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_apo-desolv.pdb,
            ligand_mol: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_ligand.mol,
            ligand_pdb: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_ligand.pdb,
            ligand_smiles: 'C[C@H](Oc1ccccc1C#N)C(N)=O'}
        '401':
          XX01ZVNS2B-x0673+B+501+1: {structure: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1.pdb,
            event_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_event.ccp4,
            sigmaa_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_sigmaa.ccp4,
            diff_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_diff.ccp4,
            pdb_apo: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_apo.pdb,
            pdb_apo_solv: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_apo-solv.pdb,
            pdb_apo_desolv: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_apo-desolv.pdb,
            ligand_mol: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_ligand.mol,
            ligand_pdb: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_ligand.pdb,
            ligand_smiles: 'C[C@@H](Oc1ccccc1C#N)C(N)=O'}

The same situation happens also with XX01ZVNS2B-x0773 and XX01ZVNS2B-x0673

kaliif commented 8 months ago

The following conflicts from the A71EV2A upload:

A71EV2A-x0152
A71EV2A-x0202
A71EV2A-x0211
A71EV2A-x0229
A71EV2A-x0278
A71EV2A-x0351
A71EV2A-x0375
A71EV2A-x0451
A71EV2A-x0515
A71EV2A-x0528
A71EV2A-x0554
A71EV2A-x0719
A71EV2A-x0875
A71EV2A-x1068
A71EV2A-x1081
A71EV2A-x1145
A71EV2A-x1148

Waztom commented 8 months ago

@kaliif thank you very much - Ryan confirmed that the datasets in the conflict list for A71EV2A all have multiple ligands. We should have caught this earlier. You will need to assign the short codes and assign an observation to each of the multiple ligands within a dataset. Please confirm if it looks like an XCA issue - I have not been staring at the XCA outputs used for the loader.

phraenquex commented 8 months ago

@kaliif, each ligand should get a separate observation.

(I think that addresses what you describe, I didn't have bandwidth to dissect it fully.)

Waztom commented 8 months ago

@kaliif for Purple release, please update tag naming as per Frank's spec:

  1. Auto-naming Canonical sites long code: keep only the first two "plus" bits, without any pluses: A301 (instead of A71EV2A-x0211+A+301+1).
  2. Auto-naming Conformer sites long code: Keep only the string before the first plus: A71EV2A-x0211 (instead of A71EV2A-x0211+A+301+1).

phraenquex commented 8 months ago

@Waztom @kaliif please (briefly) update this ticket with what was changed.

kaliif commented 8 months ago

@phraenquex grouping scheme now by experiment (dataset) instead of canon site (using canon site led to conflicts, details here: https://github.com/m2ms/fragalysis-frontend/issues/1277#issuecomment-1943530885)
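A minimal sketch of per-experiment letter assignment, assuming each observation arrives as (id, crystal name, code prefix) and that letters are handed out in input order within a dataset. The function names are hypothetical, and the ordering is an assumption rather than something stated in this ticket.

```python
from collections import defaultdict
from itertools import count, product
from string import ascii_lowercase
from typing import Dict, Iterable, Iterator, Tuple


def suffixes() -> Iterator[str]:
    """Yield a..z, aa..az, ba..zz, aaa.. indefinitely."""
    for length in count(1):
        for combo in product(ascii_lowercase, repeat=length):
            yield "".join(combo)


def assign_codes(observations: Iterable[Tuple[int, str, str]]) -> Dict[int, str]:
    """Give every observation a short code, cycling letters per experiment."""
    # One independent letter generator per crystal (experiment) name.
    letters_for = defaultdict(suffixes)
    codes = {}
    for obs_id, crystal_name, prefix in observations:
        # Assumption: the crystal id is the part after the last hyphen.
        crystal_id = crystal_name.rsplit("-", 1)[-1]
        codes[obs_id] = f"{prefix}{crystal_id}{next(letters_for[crystal_name])}"
    return codes
```

Fed the two XX01ZVNS2B-x0884 observations (ids 203 and 204, prefix `m`) in that order, this returns `mx0884a` and `mx0884b`, so the two ligands in one dataset no longer collide on the same code.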

Tag names changed to shorter versions
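A minimal sketch of the two shortening rules from the spec, reading "the first two plus bits" as the chain and residue fields that follow the first plus, which is what the A301 example implies. The function names are hypothetical.

```python
def canon_site_tag(long_code: str) -> str:
    """'A71EV2A-x0211+A+301+1' -> 'A301' (chain and residue, no pluses)."""
    parts = long_code.split("+")
    if len(parts) < 3:
        raise ValueError(f"expected at least two '+' separators in {long_code!r}")
    return parts[1] + parts[2]


def conformer_site_tag(long_code: str) -> str:
    """'A71EV2A-x0211+A+301+1' -> 'A71EV2A-x0211' (everything before the first plus)."""
    return long_code.split("+", 1)[0]


print(canon_site_tag("A71EV2A-x0211+A+301+1"))      # A301
print(conformer_site_tag("A71EV2A-x0211+A+301+1"))  # A71EV2A-x0211
```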

phraenquex commented 8 months ago

After lengthy explanation to dim questions: this looks right.