m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

Load user-defined cpd aliases from XCA (compounds_auto.csv) #1262

Open phraenquex opened 10 months ago

phraenquex commented 10 months ago

Need a mechanism to upload compound aliases. Probably a CSV, indexed by SMILES string.

Need a way to see aliases in LHS. Probably tooltip.

Need a way to find by alias. Likely, the search button should search through all aliases.

Need a way to switch which alias is displayed in LHS. E.g. "switch alias" modal from top of Hit Navigator

mwinokan commented 8 months ago

My conversations with Jenke about including ASAP IDs in the download make this ticket relevant again.

mwinokan commented 7 months ago

In the legacy/v1 implementation, aliases could be defined in the metadata.csv.

The ASAP IDs are in SoakDB, so maybe we don't need to place the onus for those on the curators. Can they just be extracted from SoakDB?

We still need to think about how the aliases are shown in the frontend (see #1322)

mwinokan commented 7 months ago

@phraenquex says the simplest solution may be for XCA to put the compound aliases (crystal_name, zinc_id, asap_id) into a new CSV file in extra_files. This will mean that the ASAP IDs will be in the download.

After XCA has run, the file exists in extra_files and the curator can add/change the aliases before upload.

The f/e work is separate, see #1412.

@tdudgeon to get the ASAP IDs from SoakDB (@Waztom to provide example) and create the CSV in the extra_files output.

mwinokan commented 7 months ago

@tdudgeon was unable to find the ASAP IDs. @Waztom please help Tim with this.

@tdudgeon are you able to proceed with generating the CSV using the existing zinc/compound IDs while you wait for the ASAP IDs from SoakDB? Can the XCA-generated sites be included in the CSV too?

@phraenquex please provide @tdudgeon with the required properties

phraenquex commented 7 months ago

Columns to include:

XCA generates columns 1-4; Loader will append the remaining columns. For upload 1, leave column 4 blank.
For upload 2+, carry over any alias columns from previous uploads.

kaliif commented 7 months ago

@phraenquex SC vs. LC for conformation site, crystalform site, assembly, and crystalform - there's currently no separate short and long code for these. Do you mean the short name generated for a tag?

phraenquex commented 7 months ago

I meant the short name, the thing that goes before the dash.

Come to think of it, just put in a single column for each of those - it's anyway just for info.

mwinokan commented 7 months ago

@tdudgeon has implemented the following:

@kaliif the columns 5 and onwards from Frank's spec are no longer needed in the loader (as they are covered by the metadata.csv)

mwinokan commented 7 months ago

@kaliif please verify that any extra columns added to the summary csv will be loaded by the target loader, and also included in the metadata.csv generated for the download.

mwinokan commented 7 months ago

@kaliif is still working on this

kaliif commented 7 months ago

@mwinokan what needs to happen on data load? The way it's implemented now, the extra columns are added to the file on download.

phraenquex commented 7 months ago

@kaliif yes that will be sufficient.

mwinokan commented 6 months ago

@kaliif says this is in staging but testing has only been local so far.

Likely one of Ryan's targets will be the first to test these manual alias files. Once they've been uploaded we should test the download as well.

mwinokan commented 3 months ago

Ryan has been made aware and will test this with the new P3 data

mwinokan commented 2 months ago

@mwinokan to test with one of the targets on staging

mwinokan commented 2 months ago

After deleting Flavi_NS5_RdRp from staging and uploading with a customised compounds_manual.csv in the extra_files directory, I get an error on upload that suggests there is something wrong with the CIFs:

{
    "started": true,
    "finished": true,
    "status": "FAILED",
    "messages": [
        "INFO: Created TargetLoader for '20240910_Flavi_RdRp_alias_test.tgz' proposal_ref='lb32627-66'",
        "INFO: Decompressing '20240910_Flavi_RdRp_alias_test.tgz'",
        "INFO: Decompressed '20240910_Flavi_RdRp_alias_test.tgz'",
        "WARNING: Zika_NS5A_x0025_ground_state: file ligand_cif expected but not found in meta_aligner.yaml file",
        "INFO: 49 Experiment objects processed, 0 created, 49 fetched from database",
        "Failed to process '20240910_Flavi_RdRp_alias_test.tgz'",
        "FAILED"
    ]
}

@kaliif can you see any reason why the reference CIF is now no longer accepted? I didn't change the tarball in any other way

mwinokan commented 2 months ago

Testing with Flavi_NS5_RdRp

  1. Deleted target from staging
  2. Uploaded 20240910_Flavi_RdRp_alias_test.tgz
  3. The upload fails. /viewer/task_status/d0621aa4-eaea-4a71-8e30-b663a4893ce2/

mwinokan commented 2 months ago

The upload now works, but @kaliif tells me that the loader is currently not processing compounds_auto.csv or compounds_manual.csv.

Marking as mint and reiterating the spec below (checked means implemented):

mwinokan commented 1 month ago

Spec has been updated above, and a spin-out ticket has been created for the f/e and API changes: #1540.

@tdudgeon adds that we will need to support combi-soaks for the compounds_auto.csv.

@tdudgeon says we could add a ligand_name column to the generated CSV:

xtal,ligand_name,compound_code
CHIKV_MacB-x0270,LIG,Z100643660
CHIKV_MacB-x0270,LG1,Z100643662
CHIKV_MacB-x0270,LG2,Z100643663
CHIKV_MacB-x0281,LIG,Z1041785508
CHIKV_MacB-x0289,LIG,Z104492884

The user can then duplicate this file (compounds_manual.csv) and add columns, e.g.:

xtal,ligand_name,compound_code,enamine_id,ASAP_id,CDD_id
CHIKV_MacB-x0270,LIG,Z100643660,EN-121,ASAP-12129384293,CDD-23123313
CHIKV_MacB-x0270,LG1,Z100643662,EN-122,ASAP-12129384294,CDD-23123314
CHIKV_MacB-x0270,LG2,Z100643663,EN-123,ASAP-12129384295,CDD-23123315
CHIKV_MacB-x0281,LIG,Z1041785508,,,
CHIKV_MacB-x0289,LIG,Z104492884,,,

mwinokan commented 1 month ago

@tdudgeon has added the ligand_name column to the compounds_auto.csv and all the data is in the metadata.yaml

@tdudgeon says that compounds_auto.csv is not used by the loader, but it serves a purpose in the final target download and as a template for compounds_manual.csv.

@phraenquex clarifies that there is no explicit format specification for the new aliases
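
As an illustration of the points above, here is a minimal sketch of how free-form alias columns could be read from a compounds_manual.csv. It is not the loader's actual code: the function name read_aliases and the RESERVED_COLUMNS tuple are hypothetical, and treating every non-reserved column as an alias is an assumption based on the example tables in this thread.

```python
import csv
import io

# Assumption for this sketch: these columns are structural, everything else is an alias.
RESERVED_COLUMNS = ("xtal", "ligand_name", "compound_code", "compound_code_update")


def read_aliases(csv_text):
    """Map (xtal, ligand_name) -> {alias_column: value}, skipping empty cells."""
    aliases = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["xtal"], row["ligand_name"])
        aliases[key] = {
            column: value.strip()
            for column, value in row.items()
            # Short rows yield None for missing cells, so only keep real strings.
            if column not in RESERVED_COLUMNS
            and isinstance(value, str)
            and value.strip()
        }
    return aliases


# Rows taken from the example in this thread; the second row has no aliases.
EXAMPLE = """\
xtal,ligand_name,compound_code,enamine_id,ASAP_id,CDD_id
CHIKV_MacB-x0270,LIG,Z100643660,EN-121,ASAP-12129384293,CDD-23123313
CHIKV_MacB-x0281,LIG,Z1041785508
"""

aliases = read_aliases(EXAMPLE)
```

Because there is no explicit format for the alias values, the sketch keeps them as plain strings and does no validation on them.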

mwinokan commented 1 month ago

@kaliif is starting to work on this ticket now

kaliif commented 1 month ago

Also create a special case compound_code_update that supersedes what comes from SoakDB and updates the default compound code alias (@kaliif please shout if it is particularly difficult to implement)

@mwinokan I don't understand this point, I'm afraid. I don't know what comes from SoakDB or what needs to be updated.

mwinokan commented 1 month ago

@tdudgeon has added the ligand_name column.

@kaliif Regarding the compound_code_update column:

So a user might provide (for a given dataset/lig name):

E.g. in this compounds_manual.csv example, the compound_code for the first ligand is overridden by the compound_code_update:

xtal,ligand_name,compound_code,compound_code_update,enamine_id,ASAP_id,CDD_id
CHIKV_MacB-x0270,LIG,Z100643660,Z100643661,EN-121,ASAP-12129384293,CDD-23123313
CHIKV_MacB-x0270,LG1,Z100643662,,EN-122,ASAP-12129384294,CDD-23123314
CHIKV_MacB-x0270,LG2,Z100643663,,EN-123,ASAP-12129384295,CDD-23123315
CHIKV_MacB-x0281,LIG,Z1041785508,,,,
CHIKV_MacB-x0289,LIG,Z104492884,,,,

@tdudgeon says that the lookup can be done using the xtal and ligand_name columns alone; the compound_code column is only there to match against what is in the YAML. The loader should throw an error if the compound_code column does not match what is expected. The error should be useful: it should say exactly what doesn't match and that the new value should go into the compound_code_update column instead.
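
The check described above could look roughly like the following. This is only a sketch under stated assumptions, not the loader's implementation: resolve_compound_code, CompoundCodeMismatch and the expected_codes dictionary (standing in for whatever the loader reads from the YAML) are hypothetical names.

```python
class CompoundCodeMismatch(ValueError):
    """Raised when compound_code in the CSV disagrees with the expected value."""


def resolve_compound_code(row, expected_codes):
    """Return the compound code to use for one compounds_manual.csv row.

    row: dict with xtal, ligand_name, compound_code and, optionally,
         compound_code_update.
    expected_codes: {(xtal, ligand_name): compound_code}; assumed here to have
         been built from the YAML.
    """
    # The lookup uses xtal and ligand_name alone.
    key = (row["xtal"], row["ligand_name"])
    if key not in expected_codes:
        raise KeyError(f"No ligand {key[1]!r} for crystal {key[0]!r}")
    expected = expected_codes[key]

    # compound_code is only there to be checked against the expected value.
    given = (row.get("compound_code") or "").strip()
    if given != expected:
        raise CompoundCodeMismatch(
            f"{key[0]} / {key[1]}: compound_code is {given!r} but {expected!r} "
            "was expected; put a replacement code in the "
            "compound_code_update column instead"
        )

    # A non-empty compound_code_update supersedes the default code.
    update = (row.get("compound_code_update") or "").strip()
    return update or expected


# Values taken from the example table in this comment.
expected = {
    ("CHIKV_MacB-x0270", "LIG"): "Z100643660",
    ("CHIKV_MacB-x0270", "LG1"): "Z100643662",
}
overridden = {
    "xtal": "CHIKV_MacB-x0270",
    "ligand_name": "LIG",
    "compound_code": "Z100643660",
    "compound_code_update": "Z100643661",
}
unchanged = {
    "xtal": "CHIKV_MacB-x0270",
    "ligand_name": "LG1",
    "compound_code": "Z100643662",
    "compound_code_update": "",
}
```

With these inputs, the first row resolves to the updated code and the second keeps its original code; a row whose compound_code differs from the expected value raises an error that names the crystal, the ligand and both codes.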

kaliif commented 1 month ago

Dev done, needs testing, not merged to staging yet

phraenquex commented 1 month ago

@mwinokan please test once the clusters are back

mwinokan commented 2 weeks ago

b/e PR692 for staging fix implemented. @mwinokan to test

mwinokan commented 2 weeks ago

@kaliif As described in #1540 (comment), the upload A71EV2A_xca_staging_20241104_fake_aliases.tar.gz fails on both Matej's stack and staging.

mwinokan commented 1 week ago

@kaliif's latest backend has fixed the tarball issue (see comment in 1540).

mwinokan commented 1 day ago

@kaliif says the b/e part is in staging