Open phraenquex opened 10 months ago
My conversations with Jenke about including ASAP IDs in the download make this ticket relevant again.
In the legacy/v1 implementation, aliases could be defined in the metadata.csv.
The ASAP IDs are in SoakDB, so maybe we don't need to place the onus for those on the curators. Can they just be extracted from SoakDB?
We still need to think about how the aliases are shown in the frontend (see #1322)
@phraenquex says the simplest solution may be for XCA to put the compound aliases (crystal_name, zinc_id, asap_id) into a new CSV file in extra_files. This means the ASAP IDs will be included in the download.
After XCA runs, the file exists in extra_files, and the curator can add or change the aliases before upload.
The f/e work is separate, see #1412.
@tdudgeon to get the ASAP IDs from SoakDB (@Waztom to provide an example) and create the CSV in the extra_files output.
@tdudgeon was unable to find the ASAP IDs. @Waztom, please help Tim with this.
@tdudgeon, are you able to proceed with generating the CSV using the existing ZINC/compound IDs while you wait for the ASAP IDs from SoakDB? Can the XCA-generated sites be included in the CSV too?
@phraenquex please provide @tdudgeon with the required properties
Columns to include:
XCA generates columns 1-4; the loader will append the remaining columns.
For upload 1, leave column 4 blank.
For upload 2+, carry over any alias columns from previous uploads.
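The carry-over rule for uploads 2+ can be sketched as below. This is a minimal sketch, assuming each upload is read as a list of dict rows and that some set of key columns identifies the same row across uploads; `carry_over_aliases`, `key_fields` and `alias_fields` are hypothetical names, since the actual column list is not reproduced here. A previous value is only copied into an empty cell, so a curator's new entry is never overwritten.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def carry_over_aliases(
    previous: List[Row],
    current: List[Row],
    key_fields: Sequence[str],
    alias_fields: Sequence[str],
) -> List[Row]:
    """Copy alias values from the previous upload into the current one.

    key_fields: hypothetical columns identifying the same row in both uploads.
    alias_fields: hypothetical alias columns to carry over.
    """
    # Index the previous upload by its key so each lookup is O(1).
    prev_index: Dict[Tuple[str, ...], Row] = {
        tuple(row.get(f, "") for f in key_fields): row for row in previous
    }
    merged: List[Row] = []
    for row in current:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        old = prev_index.get(tuple(row.get(f, "") for f in key_fields))
        if old is not None:
            for field in alias_fields:
                # Only fill empty cells; a value in the new upload wins.
                if not new_row.get(field) and old.get(field):
                    new_row[field] = old[field]
        merged.append(new_row)
    return merged
```

Rows that are new in upload 2+ (no match in the previous upload) pass through unchanged.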
@phraenquex SC vs. LC for conformation site, crystalform site, assembly, and crystalform - there's currently no separate short and long code for these. Do you mean the short name generated for a tag?
I meant the short name, the thing that goes before the dash.
Come to think of it, just put in a single column for each of those; it's only for information anyway.
@tdudgeon has implemented the following:
compounds_auto.csv
compounds_manual.csv
@kaliif columns 5 onwards from Frank's spec are no longer needed in the loader (as they are covered by the metadata.csv).
@kaliif please verify that any extra columns added to the summary csv will be loaded by the target loader, and also included in the metadata.csv generated for the download.
@kaliif is still working on this
@mwinokan what needs to happen on data load? The way it's implemented now is on download, the extra columns are added to the file.
@kaliif yes that will be sufficient.
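A minimal sketch of the download-side behaviour described above, assuming the metadata rows and the aliases are both available as dicts keyed on a hypothetical (xtal, ligand_name) pair; `add_alias_columns` and `write_metadata_csv` are illustrative names, not the loader's actual functions. New alias columns are appended after the existing metadata columns, and rows without an alias get a blank cell.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Dict[str, str]
Key = Tuple[str, str]  # hypothetical (xtal, ligand_name) key


def add_alias_columns(
    metadata_rows: List[Row], aliases: Dict[Key, Row]
) -> Tuple[List[str], List[Row]]:
    """Return (fieldnames, rows) with alias columns appended to the metadata."""
    # Keep the existing metadata columns first, in their original order.
    fieldnames: List[str] = []
    for row in metadata_rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    merged: List[Row] = []
    for row in metadata_rows:
        key = (row.get("xtal", ""), row.get("ligand_name", ""))
        out = dict(row)
        for name, value in aliases.get(key, {}).items():
            if name not in fieldnames:
                fieldnames.append(name)  # new alias column goes at the end
            out.setdefault(name, value)  # never overwrite existing metadata
        merged.append(out)
    return fieldnames, merged


def write_metadata_csv(fieldnames: List[str], rows: List[Row]) -> str:
    buf = io.StringIO()
    # restval="" leaves a blank cell where a row has no value for an alias column
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```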
@kaliif says this is in staging but testing has only been local so far.
Likely one of Ryan's targets will be the first to test these manual alias files; once they've been uploaded, we should test the download as well.
Ryan has been made aware and will test this with the new P3 data
@mwinokan to test with one of the targets on staging
After deleting Flavi_NS5_RdRp from staging and re-uploading it with a customised compounds_manual.csv in the extra_files directory, I get an error on upload that suggests something is wrong with the CIFs:
{
"started": true,
"finished": true,
"status": "FAILED",
"messages": [
"INFO: Created TargetLoader for '20240910_Flavi_RdRp_alias_test.tgz' proposal_ref='lb32627-66'",
"INFO: Decompressing '20240910_Flavi_RdRp_alias_test.tgz'",
"INFO: Decompressed '20240910_Flavi_RdRp_alias_test.tgz'",
"WARNING: Zika_NS5A_x0025_ground_state: file ligand_cif expected but not found in meta_aligner.yaml file",
"INFO: 49 Experiment objects processed, 0 created, 49 fetched from database",
"Failed to process '20240910_Flavi_RdRp_alias_test.tgz'",
"FAILED"
]
}
@kaliif can you see any reason why the reference CIF is no longer accepted? I didn't change the tarball in any other way.
Flavi_NS5_RdRp
20240910_Flavi_RdRp_alias_test.tgz
The upload now works, but @kaliif tells me that the loader is currently not processing compounds_auto.csv or compounds_manual.csv.
Marking as mint and reiterating the spec below (checked means implemented):
XCA generates extra_files/compounds_auto.csv with SoakDB compound aliases
The curator copies the compounds_auto.csv file, creates extra_files/compounds_manual.csv, and modifies it as they see fit
The loader checks for extra_files/compounds_manual.csv and, if present, uses it to assign compound aliases
The aliases are included in metadata.csv in the download
Spec has been updated above, and a spin-out ticket has been created for the f/e and API changes: #1540.
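The file-handling steps of the spec could look roughly like this. It is a sketch only: `create_manual_template` and `find_alias_file` are hypothetical helpers, and the template step deliberately refuses to overwrite an existing compounds_manual.csv so that curator edits are not lost.

```python
import shutil
from pathlib import Path
from typing import Optional


def create_manual_template(extra_files: Path) -> Path:
    """Curator step: start compounds_manual.csv as a copy of compounds_auto.csv."""
    auto = extra_files / "compounds_auto.csv"
    manual = extra_files / "compounds_manual.csv"
    if not manual.exists():  # never clobber a file the curator has already edited
        shutil.copyfile(auto, manual)
    return manual


def find_alias_file(extra_files: Path) -> Optional[Path]:
    """Loader step: use compounds_manual.csv only if the curator provided one."""
    manual = extra_files / "compounds_manual.csv"
    return manual if manual.is_file() else None
```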
@tdudgeon adds that we will need to support combi-soaks in compounds_auto.csv.
@tdudgeon says we could add a ligand_name column to the generated CSV:
xtal | ligand_name | compound_code |
---|---|---|
CHIKV_MacB-x0270 | LIG | Z100643660 |
CHIKV_MacB-x0270 | LG1 | Z100643662 |
CHIKV_MacB-x0270 | LG2 | Z100643663 |
CHIKV_MacB-x0281 | LIG | Z1041785508 |
CHIKV_MacB-x0289 | LIG | Z104492884 |
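One way to produce a compounds_auto.csv in this layout is to emit one row per (xtal, ligand_name), so that combi-soaks simply become several rows for the same crystal. The input mapping (`Soaks`) and `render_compounds_auto` are hypothetical; the real XCA code reads these values from SoakDB.

```python
import csv
import io
from typing import Dict, List, Tuple

# Hypothetical input: for each crystal, the (ligand_name, compound_code) pairs
# found in SoakDB. A combi-soak simply has more than one pair.
Soaks = Dict[str, List[Tuple[str, str]]]

FIELDNAMES = ["xtal", "ligand_name", "compound_code"]


def render_compounds_auto(soaks: Soaks) -> str:
    """Render one CSV row per (xtal, ligand) so combi-soaks get multiple rows."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDNAMES)
    for xtal in sorted(soaks):
        # Keep the ligand order as given (e.g. LIG, LG1, LG2).
        for ligand_name, compound_code in soaks[xtal]:
            writer.writerow([xtal, ligand_name, compound_code])
    return buf.getvalue()
```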
The user can then duplicate this file (compounds_manual.csv) and add columns, e.g.:
xtal | ligand_name | compound_code | enamine_id | ASAP_id | CDD_id |
---|---|---|---|---|---|
CHIKV_MacB-x0270 | LIG | Z100643660 | EN-121 | ASAP-12129384293 | CDD-23123313 |
CHIKV_MacB-x0270 | LG1 | Z100643662 | EN-122 | ASAP-12129384294 | CDD-23123314 |
CHIKV_MacB-x0270 | LG2 | Z100643663 | EN-123 | ASAP-12129384295 | CDD-23123315 |
CHIKV_MacB-x0281 | LIG | Z1041785508 | | | |
CHIKV_MacB-x0289 | LIG | Z104492884 | | | |
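On the loader side, the user-added columns could be picked out by treating every column other than xtal, ligand_name and compound_code as an alias. A minimal sketch, assuming a comma-separated file; `read_manual_aliases` is a hypothetical name, and blank cells (such as the empty ones in the last two rows above) are skipped.

```python
import csv
import io
from typing import Dict, Tuple

# Columns that come from compounds_auto.csv; everything else is a user alias.
BASE_COLUMNS = {"xtal", "ligand_name", "compound_code"}


def read_manual_aliases(text: str) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Map (xtal, ligand_name) to the user-added alias columns of that row."""
    aliases: Dict[Tuple[str, str], Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["xtal"].strip(), row["ligand_name"].strip())
        aliases[key] = {
            col: value.strip()
            for col, value in row.items()
            # col is None for surplus cells in an over-long row; short rows give None values
            if col is not None and col not in BASE_COLUMNS and value and value.strip()
        }
    return aliases
```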
Also create a special case compound_code_update that supersedes what comes from SoakDB and updates the default compound code alias (@kaliif please shout if it is particularly difficult to implement).
… compounds_manual.csv should be permitted, @kaliif throw an error.
@tdudgeon has added the ligand_name column to compounds_auto.csv, and all the data is in metadata.yaml.
@tdudgeon says that compounds_auto.csv is not used by the loader, but it serves a purpose in the final target download and acts as a template for compounds_manual.csv.
@phraenquex clarifies that there is no explicit format specification for the new aliases
@kaliif is starting to work on this ticket now
Also create a special case compound_code_update that supersedes what comes from SoakDB and updates the default compound code alias (@kaliif please shout if it is particularly difficult to implement)
@mwinokan I don't understand this point, I'm afraid. I don't know what comes from SoakDB or what needs to be updated.
@tdudgeon has added the ligand_name column.
@kaliif Regarding the compound_code_update column:
In compounds_manual.csv, the compound_code column is unchanged and corresponds exactly to what is in SoakDB.
Where the compound_code_update column has values, these should be used to supersede the default alias.
So a user might provide (for a given dataset/ligand name):
compound_code: from SoakDB; this is what is currently shown in the f/e
compound_code_update: a column for the user to override the SoakDB value; it is used as the default alias and served to the f/e
For example, in this compounds_manual.csv the compound_code for the first ligand is overridden by the compound_code_update:
xtal | ligand_name | compound_code | compound_code_update | enamine_id | ASAP_id | CDD_id |
---|---|---|---|---|---|---|
CHIKV_MacB-x0270 | LIG | Z100643660 | Z100643661 | EN-121 | ASAP-12129384293 | CDD-23123313 |
CHIKV_MacB-x0270 | LG1 | Z100643662 | | EN-122 | ASAP-12129384294 | CDD-23123314 |
CHIKV_MacB-x0270 | LG2 | Z100643663 | | EN-123 | ASAP-12129384295 | CDD-23123315 |
CHIKV_MacB-x0281 | LIG | Z1041785508 | | | | |
CHIKV_MacB-x0289 | LIG | Z104492884 | | | | |
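The override rule reduces to a small function. This is a sketch assuming each row is a dict read from compounds_manual.csv; `default_alias` is a hypothetical name.

```python
from typing import Mapping, Optional


def default_alias(row: Mapping[str, Optional[str]]) -> str:
    """compound_code_update, when filled in, supersedes the SoakDB compound_code."""
    update = (row.get("compound_code_update") or "").strip()
    if update:
        return update
    # No override: fall back to the unchanged SoakDB value.
    return (row.get("compound_code") or "").strip()
```

For the table above, the first row would resolve to Z100643661 and every other row to its SoakDB compound_code.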
@tdudgeon says that the lookup can be done using the xtal and ligand_name columns alone; the compound_code column is there to match against what is in the YAML. The loader should throw an error if the compound_code column does not match what is expected. The error should be useful: it should say exactly what doesn't match and that the new value belongs in the compound_code_update column instead.
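A sketch of the validation described above, assuming the expected codes from the YAML have already been collected into a dict keyed on (xtal, ligand_name). `check_compound_codes` and `AliasFileError` are hypothetical names, and treating an unknown (xtal, ligand_name) pair as an error is an assumption not stated in the thread. All problems are collected before raising, so a curator sees every mismatch at once.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (xtal, ligand_name)


class AliasFileError(ValueError):
    """Hypothetical error type for an invalid compounds_manual.csv."""


def check_compound_codes(manual_rows: List[Dict[str, str]], expected: Dict[Key, str]) -> None:
    """Raise if any compound_code in compounds_manual.csv differs from the YAML value."""
    problems: List[str] = []
    for line_no, row in enumerate(manual_rows, start=2):  # line 1 is the CSV header
        key = (row["xtal"], row["ligand_name"])
        if key not in expected:
            # Assumption: rows that cannot be looked up are also rejected.
            problems.append(f"line {line_no}: no ligand {key[1]!r} for xtal {key[0]!r} in the YAML")
            continue
        given, wanted = row["compound_code"], expected[key]
        if given != wanted:
            problems.append(
                f"line {line_no}: {key[0]}/{key[1]} has compound_code {given!r} "
                f"but the YAML has {wanted!r}; leave compound_code unchanged and "
                "put the new code in compound_code_update instead"
            )
    if problems:
        raise AliasFileError("compounds_manual.csv is invalid:\n" + "\n".join(problems))
```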
Dev done, needs testing, not merged to staging yet
@mwinokan please test once the clusters are back
@kaliif As described in #1540 (comment), the upload A71EV2A_xca_staging_20241104_fake_aliases.tar.gz fails on both Matej's stack and staging.
@kaliif's latest backend has fixed the tarball issue (see comment in 1540).
@kaliif says the b/e part is in staging
Need a mechanism to upload compound aliases. Probably a CSV, indexed by SMILES string.
Need a way to see aliases in the LHS. Probably a tooltip.
Need a way to find compounds by alias. Likely, the search button should search through all aliases.
Need a way to switch which alias is displayed in the LHS, e.g. a "switch alias" modal at the top of the Hit Navigator.
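The search and switch-alias requirements are front-end features, but the underlying logic can be sketched in Python for illustration only. `Compound`, `search` and `switch_alias` are hypothetical; each record carries a SMILES string plus a dict of alias type to alias value, and the default compound code is used whenever a compound lacks the chosen alias type.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Compound:
    """Hypothetical LHS record: a SMILES string plus alias type -> alias value."""
    smiles: str
    aliases: Dict[str, str]
    displayed: str = "compound_code"  # alias type currently shown in the LHS

    def label(self) -> str:
        # Fall back to the default compound code if this compound lacks the chosen alias
        return self.aliases.get(self.displayed) or self.aliases.get("compound_code", "")


def search(compounds: List[Compound], query: str) -> List[Compound]:
    """Case-insensitive substring match against every alias, not just the displayed one."""
    q = query.casefold()
    return [c for c in compounds if any(q in v.casefold() for v in c.aliases.values())]


def switch_alias(compounds: List[Compound], alias_type: str) -> None:
    """Like a 'switch alias' modal: change which alias type every compound displays."""
    for c in compounds:
        c.displayed = alias_type
```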