Open Waztom opened 9 months ago
@Waztom The CompoundCode presumably does not yet exist in fragalysis, and would be one of the identifiers for a compound that could be loaded (each compound potentially having multiple identifiers). That's where we want to end up, but what is the solution right now? If CompoundCode is always going to be present, we could modify XCA to push through that ID (assuming it is available from SoakDB or somewhere consistent), and we could treat it as a primary identifier for the compound, e.g. just add it as an extra column, leaving the many-to-one aspect of multiple identifiers (which will require a new table) to be done at a later stage.
But even so this is a moderate amount of work, and we need to first know where CompoundCode comes from.
@tdudgeon CrystalName and CompoundCode are both columns in SoakDB. CompoundCode is populated by the person doing the initial experiment. The combination of the two as an alias for a compound minimises issues with duplication.
With Boris's latest changes, I see that the information I asked for is already getting through to the frontend on staging; see #1278.
Multiple aliases is definitely one for the yellow release.
The compound code is now included in the data. It looks like this:
crystals:
  Mpro-IBM0045:
    reference: true
    type: model_building
    last_updated: '2022-08-08 16:11:00'
    refinement_outcome: 5 - Deposition ready
    compound_code: Z68337194
    crystallographic_files:
      xtal_pdb: {file: upload_1/crystallographic_files/Mpro-IBM0045/Mpro-IBM0045.pdb,
        sha256: bf24e507365fc150732a5f5df09e771030630e44543c406a8c0dda4b7473290d}
      xtal_mtz: {file: upload_1/crystallographic_files/Mpro-IBM0045/Mpro-IBM0045.mtz,
        sha256: b5e9d5e729b4284e4ff43bfbaccc4d2cf05fbe1744ee1603a4048ae32bb2b459}
      ligand_cif: {file: upload_1/crystallographic_files/Mpro-IBM0045/Mpro-IBM0045.cif,
        sha256: 8bae84d77c55878f31ac3795720899da7bb52e14a77b36027da524502a7e6e11,
        smiles: CN(Cc1ccc(Cl)c(Cl)c1)c1ccc(S(N)(=O)=O)cn1}
    panddas_event_files:
    - {file: upload_1/crystallographic_files/Mpro-IBM0045/Mpro-IBM0045_1_A_1101.ccp4,
      sha256: abab5237ad7a44cf237df0a4fe9b8c6f30f18b545f7e8e3848424282a130a91f,
      model: '1', chain: A, res: 1101, index: 1, bdc: 0.21}
    status: new
@kaliif Will need to handle this extra field in the target loader and make it available through the API.
@tdudgeon is the field mandatory or optional (i.e. should the upload fail or just make a note of missing code)?
I'm assuming this goes to Compound model?
I think it should be optional, as we can't guarantee it's populated in soakdb, and it won't be there for non-Diamond data. I'm not sure how to model this (experiment or compound). I've asked @Waztom for guidance.
It should be handled as a property of the compound
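A minimal sketch of what treating the field as optional could look like in a loader, assuming each crystal entry has already been parsed from the YAML into a plain dict. The function name `extract_compound_code` and the logger setup are hypothetical and not taken from the actual target loader.

```python
import logging

logger = logging.getLogger(__name__)


def extract_compound_code(crystal_name, crystal_entry):
    """Return the compound code for one crystal entry, or None if absent.

    `crystal_entry` is assumed to be the dict parsed from the YAML block
    of a single crystal (hypothetical loader input).
    """
    code = crystal_entry.get("compound_code")
    if code is None or not str(code).strip():
        # Optional field: make a note of the gap, but do not fail the upload.
        logger.warning("No compound_code for crystal %s", crystal_name)
        return None
    return str(code).strip()


# Entry shaped like the YAML example in this thread
entry = {"type": "model_building", "compound_code": "Z68337194"}
code = extract_compound_code("Mpro-IBM0045", entry)
```

Returning `None` rather than raising keeps non-Diamond uploads, which will not carry the field, loadable.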
Backend changes done, merged to staging
@tdudgeon and @phraenquex LHS compound naming spec:
@tdudgeon in order of development, please start with the conformer site naming, followed by the crystal form naming. Canonical site naming needs Frank's review at next Frag meeting.
Please also see #1278 for more info.
Backend changes done, merged to staging
Which endpoint serves this information? I know that I don't have latest target processed by latest XCA but I don't see field compound id/code etc anywhere.
EDIT: I'm using latest backend (downloaded this morning)
@boriskovar-m2ms it's api/compounds field compound_code. Staging is empty atm but here's an example: https://fragalysis-kalev-default.xchem-dev.diamond.ac.uk/api/compounds/
TARGET123-x0233 observation XXXX goes to:
* `code`: `x0233A`
  * `x0233` - strip from crystal name.
  * `A` - cycle through the observations, assign A-Z, AA,..AZ, BA-BZ, .... AAA-AAZ, ...
  * If `code` isn't unique for target, then simply prefix it with A-Z,AA-AZ....
* `longcode` - new field, populate with what's currently `code`: `TARGET-x0488_A_147_x1081+A+147/6`
* `compound_code` - add to api, get from `compounds.compound_code`
* `1 - <canon_sites.name>`, e.g. `1 - A71EV2A-x0152+A+201`
  * `1` - number from top-to-bottom
* `1a - <canon_sites_conf.name>`, e.g. `1a - A71EV2A-x0152+A+201`
  * `1` - comes from parent canonical site
  * `a` - number a-to-z cycle through
* `F1 - <xtalform_sites.xtalform_site_id> <crystalform name from assemblies.yaml>`, e.g. `1a - A71EV2A-x0450/A/201 - small cell`
  * `F` - always F, for "form"
  * `1` - `xtalform_sites.id`
  * `xtalform_sites` needs a field for the crystalform name from assemblies.yaml - F/E will need it at some point.
* `A1 - assembly name from assemblies.yaml`
  * `A` - always A, for "assemblies"
  * `1` - enumerate, by order of appearance in assemblies.yaml
Tag names should NEVER be overwritten
Assembly details (name, etc) will be needed by F/E eventually - accommodate in api?
This supersedes #1319.
Also: short tags need to be 3 characters (currently 2).
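The short-code rule in this spec (strip the crystal number from the crystal name, then cycle A-Z, AA-AZ, ...) can be sketched as below. This is only an illustration under stated assumptions: the helper names are hypothetical, the crystal name is assumed to have the form `<target>-<crystal>`, and the uniqueness fallback simply tries letter prefixes in the same order.

```python
import itertools
import string


def letter_suffixes(alphabet=string.ascii_uppercase):
    """Yield A..Z, then AA..AZ, BA..BZ, ..., then AAA.. and so on."""
    for length in itertools.count(1):
        for letters in itertools.product(alphabet, repeat=length):
            yield "".join(letters)


def crystal_number(crystal_name):
    """Strip the target prefix, e.g. 'TARGET123-x0233' -> 'x0233'."""
    return crystal_name.rsplit("-", 1)[-1]


def assign_codes(crystal_name, n_observations):
    """Return one short code per observation of a crystal."""
    base = crystal_number(crystal_name)
    suffixes = itertools.islice(letter_suffixes(), n_observations)
    return [base + suffix for suffix in suffixes]


def make_unique(code, taken):
    """Prefix `code` with A, B, ... until it is not in `taken`."""
    if code not in taken:
        return code
    for prefix in letter_suffixes():
        candidate = prefix + code
        if candidate not in taken:
            return candidate


codes = assign_codes("TARGET123-x0233", 3)
```

Passing `string.ascii_lowercase` as the alphabet gives the same cycle in lowercase, should the letter case need to change.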
TARGET123-x0233 observation XXXX goes to:
* `code`: `x0233A`
  * `x0233` - strip from crystal name.
  * `A` - cycle through the observations, assign A-Z, AA,..AZ, BA-BZ, .... AAA-AAZ, ...
@phraenquex cycle through refers to observations under this experiment or all experiments in the upload?
1 - xtalform_sites.id
id as in object id from the database?
@kaliif all observations of that (crystal + canonical site + compound). I think that's it - see the mockup above, everything you'd expect to show up when you click the orange "observations" button.
@kaliif by `xtalform_sites.id` I meant the field in the api - it's a number... but good point, maybe the wrong number.
What I meant was: enumerate all xtalforms in `assemblies.yaml`, and use that number. Ideally store it, and serve it from the API too.
@phraenquex there's a tag category for crystalforms as well. How do I format these?
@kaliif, here's the spec.
`F1a - <crystalform name from assemblies.yaml> - <xtalform_sites.xtalform_site_id>`
`F1a - small cell - A71EV2A-x0450/A/201`
* `F` - always F, for "form"
* `1` - enumerate all xtalforms in assemblies.yaml, and use that number. (Ideally store it, and serve it from the API too.)
* `a` - enumerate a..z,aa...az,ba...bz,.... for all crystalformsites in the crystalform
* `xtalform_sites` needs a field for the crystalform name from assemblies.yaml - F/E will need it at some point.

Crystalform spec: `F1 - small cell`. The same definitions as for crystalform_sites.
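As an illustration of the crystalform and crystalform-site tag format, here is a rough sketch. The input shape (an ordered list of crystalform names with their site ids, in `assemblies.yaml` order) and the function names are assumptions made for the example, not the loader's real data model.

```python
import itertools
import string


def lowercase_labels():
    """Yield a..z, then aa..az, ba..bz, ..."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def xtalform_tags(xtalforms):
    """Build tag names for crystalforms and their sites.

    `xtalforms` is an ordered list of (crystalform_name, [site_id, ...])
    pairs, in order of appearance in assemblies.yaml (assumed input shape).
    """
    tags = []
    for number, (form_name, site_ids) in enumerate(xtalforms, start=1):
        # Crystalform tag, e.g. "F1 - small cell"
        tags.append(f"F{number} - {form_name}")
        # Crystalform-site tags, lettered within each crystalform
        for letter, site_id in zip(lowercase_labels(), site_ids):
            tags.append(f"F{number}{letter} - {form_name} - {site_id}")
    return tags


tags = xtalform_tags([("small cell", ["A71EV2A-x0450/A/201"])])
```

Storing the enumerated number alongside the crystalform would let the API serve it directly instead of recomputing it on every load.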
@kaliif Frank asked me to pass on to you that the observation letter, i.e. the `a` of `x0123a`, should be lowercase to reduce confusion with PDB chain IDs.
@phraenquex so if I understand this correctly, `Conformer site`, `Crystalform site` and `Assembly` should be visible only on observations, and these are the only tags that will be visible on observations. And for the compound we will display the `Canonical site` tag first, and then the rest of the tags?
Yes correct. (Sorry, I only made it implicit on the figure.)
(Max has been mocking up a new layout, but that's definitely for the next release.)
@phraenquex and what to do with compounds with only one observation? In this case the submenu is not accessible. Should I display the `Conformer site`, `Crystalform site` and `Assembly` tags on the observation?
@boriskovar-m2ms now you treat it like the others: the submenu should remain accessible, you show a `1`, and that opens up the observations modal with that single observation.
(This matters because not all the observation information is observed in the main LHS list.)
@boriskovar-m2ms looks wonderful.
Do you have time for these tweaks? Red is bugs, green is nice-to-have (hopefully quick).
What's this target? I checked all the compounds in the A71EV2A target.
Second bug is that the tag from the `Xtalforms` category should be moved to the observations view?
@boriskovar-m2ms
Yes, the XtalForms tags should be in the Observations.
The 2D compounds have now stopped missing for me, after I had to reboot - so some browser fart.
What about the green crosses?
(Overall it's wonderful!)
That's layout stuff and can try but most likely on Thursday I'm afraid. The layout of this is quite fixed so it's usually not that easy as it might seem.
Makes sense. Something you can pass on to @matej-vavrek?
To move the tag to observation is easy. Will do it and publish new version.
@kaliif and @mwinokan suspect all the issues related to #1327 are rooted in this ticket.
Kalev is naming the LHS sets as prescribed here, i.e. as `x0375a` instead of what Max is expecting, `ax0375a`. My concern is not so much the prepending of the `a` to datasets, but rather how the RHS can link the `ref_mols` in the RHS upload to single events as observed LHS datasets. People talking about these datasets will also not be able to differentiate without reminding people of the site info.
@kaliif the issue is that the current naming does not differentiate between datasets belonging to different canonical sites eg.
@kaliif the LHS dataset short-codes need to be unique for a Target, can you please update the LHS dataset short-code names so that they are unique - please see above spec where the loader is missing the unique assignment of letters to observations :
* `code`: `x0233A`
  * `x0233` - strip from crystal name.
  * `A` - cycle through the observations, assign A-Z, AA,..AZ, BA-BZ, .... AAA-AAZ, ...

@Waztom if I follow the grouping scheme (canon site and compound), I run into conflicts because observations from one dataset can belong to different canon sites. Is this expected?
@kaliif this is not expected and will need @ConorFWild to help look/confirm this. Could you please send Conor and me a list of the datasets that have multiple canonical sites - this will help Conor debug?
@Waztom @ConorFWild this is what I got when loading the latest data from Tim (XX01ZVNSB2):
frag=# select id, code, longcode, smiles, chain_id, canon_site_conf_id, cmpd_id, experiment_id, xtalform_site_id from viewer_siteobservation where code like 'mx0884a';
id | code | longcode | smiles | chain_id | canon_site_conf_id | cmpd_id | experiment_id | xtalform_site_id
-----+---------+-------------------------------------------------+-----------------------------+----------+--------------------+---------+---------------+------------------
204 | mx0884a | XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1 | C[C@@H](Oc1ccccc1C#N)C(N)=O | B | 42 | 177 | 177 | 41
203 | mx0884a | XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1 | C[C@H](Oc1ccccc1C#N)C(N)=O | B | 43 | 177 | 177 | 40
I disabled the uniqueness check, so there's a duplicate `code`. But you can see why: they have the same `experiment_id` (dataset) but different `canon_site_conf_id`, 42 and 43, which link to different canon_sites:
frag=# select id, name, canon_site_id, ref_site_observation_id from viewer_canonsiteconf;
id | name | canon_site_id | ref_site_observation_id
----+--------------------------+---------------+-------------------------
42 | XX01ZVNS2B-x0673+B+501+1 | 40 | 180
43 | XX01ZVNS2B-x0719+B+301+1 | 41 | 185
44 | XX01ZVNS2B-x0773+B+401+1 | 42 | 189
45 | XX01ZVNS2B-x0884+B+401+1 | 40 | 204
So when I run `group by`, I get two separate groups, one short code gets created and when I get to the next one, it tries to generate the same code.
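The collision can be reproduced in a few lines. The sketch below uses the ids from the two query outputs above; the helper `codes_by_group` is hypothetical, the base `x0884` and prefix `m` are hard-coded for the example, and the letter assignment is simplified to a single a-z character. It compares grouping by (canon site, compound) with grouping by experiment, the scheme adopted later in this thread.

```python
from collections import defaultdict

# Rows from viewer_siteobservation:
# (id, experiment_id, cmpd_id, canon_site_conf_id)
observations = [
    (204, 177, 177, 42),
    (203, 177, 177, 43),
]
# canon_site_conf_id -> canon_site_id, from viewer_canonsiteconf
conf_to_site = {42: 40, 43: 41, 44: 42, 45: 40}


def codes_by_group(rows, key, base="x0884", prefix="m"):
    """Assign letters a, b, ... independently inside each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    codes = {}
    for members in groups.values():
        for offset, row in enumerate(members):
            # Simplification: single letters only, so at most 26 per group
            codes[row[0]] = f"{prefix}{base}{chr(ord('a') + offset)}"
    return codes


# Two groups of one observation each: both restart at 'a' and collide
by_site = codes_by_group(observations, key=lambda r: (conf_to_site[r[3]], r[2]))
# One group of two observations: letters are handed out in sequence
by_experiment = codes_by_group(observations, key=lambda r: r[1])
```

With the canon-site key both observations receive `mx0884a`; with the experiment key they receive `mx0884a` and `mx0884b`.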
The dataset in question is as follows:
XX01ZVNS2B-x0884:
  type: model_building
  last_updated: '2023-12-01 19:15:00'
  refinement_outcome: 5 - Deposition ready
  compound_code: Z18769001
  code_prefix: m
  crystallographic_files:
    xtal_pdb: {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884.pdb,
      sha256: 316c9bd57f4268b10c792d19ce277420097cb4d8778e8b5b5efbf5a298b735ef}
    xtal_mtz: {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884.mtz,
      sha256: cf79c819958ac8ea3bb233894cd97095c7545a5185042333425ce7c460f8d403}
    ligand_cif: {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884.cif,
      sha256: 079d3e28cbdfad1045ac825ad9df3713b6dc4d04c7a0f1060f309d853bd28562,
      smiles: 'C[C@H](Oc1ccccc1C#N)C(N)=O'}
  panddas_event_files:
  - {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_1_B_301.ccp4,
    sha256: 81bbbce511c11b9080c404b9650e8c2d505236a03e7d1214b2931f3a39b11e45,
    model: '1', chain: B, res: 301, index: 2, bdc: 0.18}
  - {file: upload_1/crystallographic_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_1_B_401.ccp4,
    sha256: a5704968b5f13829304829e1e68dc96c25a9736092308b6b329a368057becbcf,
    model: '1', chain: B, res: 401, index: 1, bdc: 0.17}
  status: new
  assigned_xtalform: xtalform1
  aligned_files:
    B:
      '301':
        XX01ZVNS2B-x0719+B+301+1: {structure: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1.pdb,
          event_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_event.ccp4,
          sigmaa_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_sigmaa.ccp4,
          diff_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_diff.ccp4,
          pdb_apo: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_apo.pdb,
          pdb_apo_solv: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_apo-solv.pdb,
          pdb_apo_desolv: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_apo-desolv.pdb,
          ligand_mol: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_ligand.mol,
          ligand_pdb: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_301_XX01ZVNS2B-x0719+B+301+1_ligand.pdb,
          ligand_smiles: 'C[C@H](Oc1ccccc1C#N)C(N)=O'}
      '401':
        XX01ZVNS2B-x0673+B+501+1: {structure: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1.pdb,
          event_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_event.ccp4,
          sigmaa_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_sigmaa.ccp4,
          diff_map: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_diff.ccp4,
          pdb_apo: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_apo.pdb,
          pdb_apo_solv: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_apo-solv.pdb,
          pdb_apo_desolv: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_apo-desolv.pdb,
          ligand_mol: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_ligand.mol,
          ligand_pdb: upload_1/aligned_files/XX01ZVNS2B-x0884/XX01ZVNS2B-x0884_B_401_XX01ZVNS2B-x0673+B+501+1_ligand.pdb,
          ligand_smiles: 'C[C@@H](Oc1ccccc1C#N)C(N)=O'}
The same situation also happens with `XX01ZVNS2B-x0773` and `XX01ZVNS2B-x0673`.
The following conflicts from the A71EV2A upload:
A71EV2A-x0152
A71EV2A-x0202
A71EV2A-x0211
A71EV2A-x0229
A71EV2A-x0278
A71EV2A-x0351
A71EV2A-x0375
A71EV2A-x0451
A71EV2A-x0515
A71EV2A-x0528
A71EV2A-x0554
A71EV2A-x0719
A71EV2A-x0875
A71EV2A-x1068
A71EV2A-x1081
A71EV2A-x1145
A71EV2A-x1148
@kaliif thank you very much - Ryan confirmed that the datasets listed as conflicts for A71EV2A all have multiple ligands. Should have caught this earlier... you will need to assign the short codes and assign an observation to each of the multiple ligands within a dataset. Please confirm if it looks like an XCA issue - I have not been staring at the XCA outputs used for the loader.
@kaliif, each ligand should get a separate observation.
(I think that addresses what you describe, I didn't have bandwidth to dissect it fully.)
@kaliif for Purple release, please update tag naming as per Frank's spec:
@Waztom @kaliif please (briefly) update this ticket with what was changed.
@phraenquex grouping scheme now by experiment (dataset) instead of canon site (using canon site led to conflicts, details here: https://github.com/m2ms/fragalysis-frontend/issues/1277#issuecomment-1943530885)
Tag names changed to shorter versions
After lengthy explanation to dim questions: this looks right.
@phraenquex this is an interim fix until we have specifications/framework in place for ticket #1234 (yellow release).
Data curation/tagging is going to be extremely difficult if done by SMILES.
@tdudgeon and @kaliif Daren's preference is to use "CrystalName:CompoundCode" from SoakDB eg. x0152_1A:Z31602870.
For the purple release, can you pass this SoakDB table entry to the frontend for @boriskovar-m2ms to populate the LHS?
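A small sketch of the proposed alias, assuming the two SoakDB values arrive as plain strings. The function name and the fallback to the bare crystal name when no compound code is recorded are assumptions, not agreed behaviour.

```python
def compound_alias(crystal_name, compound_code=None):
    """Join the two SoakDB columns as 'CrystalName:CompoundCode'."""
    if not compound_code:
        # Assumed fallback: CompoundCode may not be populated in SoakDB
        return crystal_name
    return f"{crystal_name}:{compound_code}"


alias = compound_alias("x0152_1A", "Z31602870")
```

Keeping the crystal name in the alias means two crystals soaked with the same compound still get distinct labels on the LHS.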