m2ms / fragalysis-frontend

The React, Redux frontend built by webpack

Align all canonical sites to first biological assembly #1369

Open phraenquex opened 8 months ago

phraenquex commented 8 months ago

@ConorFWild

The CHIKV upload flushed out a bug in the algorithm, as discussed here in Slack:

Bug: If something binds only once, at a site specific to a chain that is not part of the reference biological assembly (the one defined in assemblies.yaml), then the canonical site it generates is not located around that reference biological assembly.

Required: before creating a new canonical site, ensure its biological assembly is aligned to the reference biological assembly.
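A minimal sketch of the "Required" step, assuming matched atom coordinates (for example CA atoms, in the same order) are available for the chains an assembly shares with the reference assembly, keyed by a common chain label. It uses a Kabsch least-squares superposition. That is a standard choice, but it is not necessarily what XCA or ligand_neighbourhood_alignment does internally. `kabsch` and `superpose_on_shared_chains` are hypothetical names.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (R, t) minimising the RMSD of mobile @ R.T + t onto target.

    Both arrays are (N, 3) and rows must correspond atom-for-atom.
    """
    if mobile.shape != target.shape or mobile.shape[0] < 3:
        raise ValueError("need at least 3 matched atom pairs of equal shape")
    mob_centroid = mobile.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    # Cross-covariance of the centred point sets.
    h = (mobile - mob_centroid).T @ (target - tgt_centroid)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = tgt_centroid - r @ mob_centroid
    return r, t


def superpose_on_shared_chains(
    assembly: dict[str, np.ndarray], reference: dict[str, np.ndarray]
) -> tuple[np.ndarray, np.ndarray]:
    """Fit an assembly onto the reference using only the chains both contain.

    Assumes (hypothetically) that chain labels are comparable between the
    two assemblies and that each shared chain has matched atoms in order.
    """
    shared = sorted(set(assembly) & set(reference))
    if not shared:
        raise ValueError("assembly shares no chains with the reference assembly")
    mobile = np.vstack([assembly[c] for c in shared])
    target = np.vstack([reference[c] for c in shared])
    return kabsch(mobile, target)
```

Chains present only in the mobile assembly (like the one carrying the single binding event in the bug above) are then moved with the same (R, t), so the resulting canonical site lands in the reference assembly's frame.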

Am I right that this will mean explicitly writing out a set of reference coordinates for every canonical site, rather than simply pointing to the original crystallographic file? (Or maybe that isn't what happens anyway.)

ConorFWild commented 8 months ago

I think you are misunderstanding the references, as this will always happen. But yes, you are correct that a canonical site will not be aligned to the reference assembly: a canonical site can contain conformer sites from multiple reference assemblies, and hence there is no unique reference assembly to choose!

As such, the required feature is most likely not possible with the current algorithm design?

phraenquex commented 8 months ago

@ConorFWild says it might be lots of work - or not. Doesn't matter for purple.

phraenquex commented 7 months ago

@ConorFWild would this ticket merge with #1227 ?

phraenquex commented 7 months ago

(FYI @mwinokan, who was taking notes) @ConorFWild and @phraenquex discussed how to define equivalence, and that robust heuristics are impossible/unlikely. Frank said the answer is that the F/E must support easy curation/updating of assignments. The relevant ticket is #1389.

ConorFWild commented 7 months ago

@phraenquex So, I've more or less scoped out the extent of changes that will be required I think:

  1. "Neighbourhood"s, the model for the protein environment of atoms, will need to be updated to carry information on which assembly they came from explicitly (currently this is only carried as transforms which are hard to reliably map to xtalform assemblies)
  2. Code needs to be added to create the assembly transform hierarchy, which will define the alignment relationship between assemblies (sketch algorithm in image below under "hierarchy construction")
  3. A bunch of plumbing code to get the assembly transform hierarchy, and the associated structures, where it needs to be
  4. A mechanism for checkpointing not just neighbourhoods but entire reference structures, as these must be robust to future changes and are currently referenced only by name and not version
  5. Input and output code changes to save the hierarchy, the reference alignment checkpoints and the new version of the Neighbourhood model
  6. Changes to the alignment code to consume assembly hierarchies and neighbourhood assembly membership, decide which assemblies to align to, and then perform the additional alignment operations (again, sketch algorithm in image under "reference alignment")
  7. Probably some additional plumbing on XCA side to move things around, generate metadata and upload
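The sketched algorithms live in the image and are not reproduced in text. As a rough, hypothetical illustration of item 2 only, assuming assemblies can be compared by the chain labels they contain and that two assemblies can be aligned directly when they share at least one chain, the hierarchy could be built as a breadth-first tree rooted at the reference assembly. `build_assembly_hierarchy` is a hypothetical name, not XCA code.

```python
from __future__ import annotations

from collections import deque


def build_assembly_hierarchy(
    assemblies: dict[str, set[str]], reference: str
) -> dict[str, str | None]:
    """Return a child -> parent map rooted at the reference assembly.

    assemblies maps an assembly name to the set of chain labels it contains.
    Two assemblies are linked when they share at least one chain (the
    overlapping chains an alignment between them would be computed on).
    """
    if reference not in assemblies:
        raise KeyError(f"unknown reference assembly {reference!r}")
    parent: dict[str, str | None] = {reference: None}
    queue = deque([reference])
    while queue:
        current = queue.popleft()
        # Iterate in sorted order so the hierarchy is deterministic between runs.
        for name in sorted(assemblies):
            if name not in parent and assemblies[name] & assemblies[current]:
                parent[name] = current
                queue.append(name)
    unreachable = set(assemblies) - set(parent)
    if unreachable:
        raise ValueError(f"no chain-overlap path to the reference for {sorted(unreachable)}")
    return parent
```

Breadth-first search keeps the number of chained alignments, and hence accumulated error, per assembly to a minimum; the real hierarchy construction may well use different criteria.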

All in all I'd say it is most likely at least 2 weeks of full-time work, or realistically more like a month with the amount of time I actually spend on XCA.

[Image: PXL_20240327_150703320, hand-drawn sketches of "hierarchy construction" and "reference alignment"]

phraenquex commented 7 months ago

@ConorFWild this appears to address ticket #1227 too, correct?

ConorFWild commented 7 months ago

It does, in the sense that this same procedure will need to be applied to artefact atoms/chains, so it needs to be tracked for them too! (Although that kind of comes "for free" with this change.)

phraenquex commented 7 months ago

@ConorFWild one more thought from our chat:

It will be important that users can see both chain IDs for non-artefact atoms: the one in the original crystal structure; and the one of the corresponding reference assembly.

(This may need some further front-end/NGL work; but for now, be sure to propagate the names in the relevant yaml file.)
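Propagating both names could be as simple as carrying a per-crystal mapping from original chain IDs to reference-assembly chain IDs into each yaml entry. A minimal sketch; the key names `original_chain`/`reference_chain`, the entry shape and `annotate_chains` are all hypothetical and not XCA's actual yaml schema:

```python
def annotate_chains(entries: list[dict], chain_map: dict[str, str]) -> list[dict]:
    """Return copies of entries with the reference-assembly chain ID added.

    entries: dicts with an "original_chain" key (hypothetical schema).
    chain_map: original chain ID -> chain ID in the reference assembly.
    """
    annotated = []
    for entry in entries:
        original = entry["original_chain"]
        if original not in chain_map:
            # Fail loudly rather than silently dropping the second name.
            raise KeyError(f"chain {original!r} has no reference-assembly equivalent")
        annotated.append({**entry, "reference_chain": chain_map[original]})
    return annotated


# Hypothetical example: chain D in this crystal corresponds to chain B
# of the reference assembly.
rows = annotate_chains([{"code": "x0692_D_304", "original_chain": "D"}], {"D": "B"})
```

The result is plain dicts, so it can be written out by whatever YAML writer the pipeline already uses.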

ConorFWild commented 7 months ago

Update: actually trying to implement this has made me realize we need to define a canonical embedding of the reference assemblies. Once the reference hierarchy is defined, it must then be realized: global alignments (limited to the overlapping chains) must be performed to generate concrete atomic coordinates for each of the assemblies and their final relative positions. Canonical/conformer sites and alignments are then done locally on parts of this franken-assembly!
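One way to picture "realizing" the hierarchy: if each edge stores a 4x4 rigid transform mapping a child assembly into its parent's frame (fitted on the overlapping chains), then every assembly's global placement is the product of the transforms along its path to the root. A minimal sketch under that assumption; `rigid`, `to_reference` and `embed_assemblies` are hypothetical helpers, not XCA functions.

```python
import numpy as np


def rigid(rotation, translation) -> np.ndarray:
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 matrix."""
    m = np.eye(4)
    m[:3, :3] = rotation
    m[:3, 3] = translation
    return m


def to_reference(name: str, parent: dict, edge_transforms: dict) -> np.ndarray:
    """Compose child -> parent transforms from `name` up to the root."""
    total = np.eye(4)
    seen = set()
    node = name
    while parent[node] is not None:
        if node in seen:
            raise ValueError(f"cycle in assembly hierarchy at {node!r}")
        seen.add(node)
        # Coordinates are already in `node`'s frame; map them one level up.
        total = edge_transforms[node] @ total
        node = parent[node]
    return total


def embed_assemblies(parent: dict, edge_transforms: dict, coords: dict) -> dict:
    """Place every assembly's (N, 3) coordinates in the reference frame."""
    placed = {}
    for name, xyz in coords.items():
        m = to_reference(name, parent, edge_transforms)
        placed[name] = xyz @ m[:3, :3].T + m[:3, 3]
    return placed
```

The order of multiplication matters: the child's own transform is applied first and the parent's second, so `to_reference` left-multiplies as it walks up the tree.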

ConorFWild commented 5 months ago

Alright, the new files are:

Changes are on the branches:

Examples can be found here:

@tdudgeon @kaliif @phraenquex If there are no updates to the metadata necessary based on the new files, then only their presence is necessary for alignment. On the assembly_alignment branch, XCA "works" with the new method, but I can't say whether the upload works.

phraenquex commented 5 months ago

@ConorFWild can you clarify/elaborate on your last sentence, please?

If there are no updates to the metadata necessary based on the new files, then only their presence is necessary for alignment.

What are you explaining here?

ConorFWild commented 5 months ago

@phraenquex Basically, these new files contain essential alignment state: new runs of the aligner will not work without them, and hence they should be saved somewhere by the uploader. They also contain interesting information we may one day want to serve to the frontend, so again they should be saved, albeit not necessarily parsed and added to tables (unless there is a table containing a file list).

However they do not change the form of any currently uploaded and parsed data, and hence there may actually be no work for @tdudgeon @kaliif , barring keeping track of the fact there are new files!

kaliif commented 5 months ago

If the loader doesn't need to look into the files, then there should be nothing to do with the target loader - all yaml files are already saved and included in LHS download.

tdudgeon commented 5 months ago

I tried this out on the CHIKV_Mac data and hit a bug:

2024-05-24 15:26:36.265 | WARNING  | ligand_neighbourhood_alignment.align_xmaps:read_xmap_from_mtz:633 - Trying DELFWT DELPHWT
Origin for xmap is now: [37.44  46.927 -3.769]
2024-05-24 15:26:36.471 | INFO     | ligand_neighbourhood_alignment.cli:_update:1451 - Writing to: data/lb32633-6/upload_1/aligned_files/CHIKV_MacB-x0692/CHIKV_MacB-x0692_D_304_1_CHIKV_MacB-x0692+D+304+1_event.ccp4
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 879, in <module>
    main()
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 866, in main
    a.run()
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 232, in run
    new_meta = self._perform_alignments(input_meta)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 465, in _perform_alignments
    updated_fs_model = _update(
                       ^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/cli.py", line 1482, in _update
    __align_xmap(
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 582, in __align_xmap
    interpolation_range = _get_interpolation_range(neighbourhood, running_transform, reference_xmap)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 238, in _get_interpolation_range
    rglb, rgub = get_grid_bounds(tlb, tub, reference_xmap)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 76, in get_grid_bounds
    floor(xmap.nu * tlbf.x),
    ^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot convert float NaN to integer

@ConorFWild any ideas what's wrong? I can provide data if needed, but it's huge.

ConorFWild commented 5 months ago

@tdudgeon My guess is that one of the alignments has failed and propagated a nonsense transform operator. Can you link me the location where this was run? I can probably take a first stab at the problem by looking at the transform yamls.
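For context on the traceback: Python's `math.floor` raises exactly `ValueError: cannot convert float NaN to integer` when given NaN, so a NaN in the running transform only surfaces deep inside `get_grid_bounds`. A cheap guard is to validate each transform as soon as it is produced or loaded. A minimal sketch, assuming a transform is a 3x3 rotation plus a translation vector; `check_transform` and `InvalidTransformError` are hypothetical names, not part of ligand_neighbourhood_alignment.

```python
import math


class InvalidTransformError(ValueError):
    """Raised when an alignment produced an unusable transform."""


def check_transform(label, rotation, translation, tol=1e-4):
    """Fail early, naming the offending transform, instead of deep in map code."""
    values = [v for row in rotation for v in row] + list(translation)
    if not all(math.isfinite(v) for v in values):
        raise InvalidTransformError(f"{label}: transform contains NaN or inf")
    # A rigid-body rotation satisfies R^T R = I (columns are orthonormal).
    for i in range(3):
        for j in range(3):
            dot = sum(rotation[k][i] * rotation[k][j] for k in range(3))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                raise InvalidTransformError(f"{label}: rotation is not orthonormal")
    r = rotation
    det = (
        r[0][0] * (r[1][1] * r[2][2] - r[1][2] * r[2][1])
        - r[0][1] * (r[1][0] * r[2][2] - r[1][2] * r[2][0])
        + r[0][2] * (r[1][0] * r[2][1] - r[1][1] * r[2][0])
    )
    # det = -1 would be a reflection, which no superposition should produce.
    if abs(det - 1.0) > tol:
        raise InvalidTransformError(f"{label}: rotation is improper (det={det:.3f})")
```

Calling this with the transform's file or dataset name as `label` would point straight at the failed alignment rather than at the map interpolation step.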

mwinokan commented 4 months ago

@ConorFWild what is the status of this ticket?

phraenquex commented 1 month ago

@ConorFWild please confirm that this was in fact done for green release.

Adding the green tag in the meantime.

mwinokan commented 1 month ago

@mwinokan to call @ConorFWild to confirm this has been implemented

mwinokan commented 2 weeks ago

All ribbons have looked good for uploads in August/September, so I'm concluding this has been merged.