Open phraenquex opened 8 months ago
I think you are misunderstanding the references, as this will always happen. You are correct that a canonical site will not be aligned to the reference assembly, but that is because a canonical site can contain conformer sites from multiple reference assemblies, so there is no unique reference assembly to choose!
As such the required feature is most likely not possible with the current algorithm design?
@ConorFWild says it might be lots of work - or not. Doesn't matter for purple.
@ConorFWild would this ticket merge with #1227?
(FYI @mwinokan, who was taking notes) @ConorFWild and @phraenquex discussed how to define equivalence, and that robust heuristics are impossible/unlikely. Frank said the answer is that the F/E must support easy curation/updating of assignments. The relevant ticket is #1389.
@phraenquex So, I've more or less scoped out the extent of changes that will be required I think:
All in all I'd say it is most likely at least 2 weeks of full-time work, and realistically more like a month given the amount of time I actually spend on XCA.
@ConorFWild this appears to address ticket #1227 too, correct?
It is, in the sense this same procedure will need to be applied to artefact atoms/chains, so needs to be tracked for them too! (although that kind of comes "for free" with this change)
@ConorFWild one more thought from our chat:
It will be important that users can see both chain IDs for non-artefact atoms: the one in the original crystal structure; and the one of the corresponding reference assembly.
(This may need some further front-end/NGL work; but for now, be sure to propagate the names in the relevant yaml file.)
Update: actually trying to implement this has made me realize we need to define a canonical embedding of the reference assemblies. That is, once the reference hierarchy is defined it must then be realized: global alignments (limited to the overlapping chains) must be performed to generate concrete atomic coordinates for each of the assemblies and their final relative positions. Canonical/conformer sites and alignments are then done locally to parts of this franken-assembly!
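To make the "realize the hierarchy" step concrete, here is a minimal sketch of how per-assembly transforms could be chained into one common frame. It is not the XCA implementation: the names `parents`, `to_parent`, `compose` and `embed` are hypothetical, and it assumes the pairwise rigid transforms (child assembly onto its parent, from the overlapping-chain alignments) have already been computed.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]          # 3x3 rotation
Vector = List[float]                # translation
Transform = Tuple[Matrix, Vector]   # maps x to R @ x + t

IDENTITY: Transform = (
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [0.0, 0.0, 0.0],
)


def apply(transform: Transform, point: Vector) -> Vector:
    """Apply a rigid transform to a single 3D point."""
    rot, trans = transform
    return [sum(rot[i][j] * point[j] for j in range(3)) + trans[i] for i in range(3)]


def compose(outer: Transform, inner: Transform) -> Transform:
    """Return the transform equivalent to applying `inner` first, then `outer`."""
    r_out, _ = outer
    r_in, t_in = inner
    rot = [[sum(r_out[i][k] * r_in[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    # outer(inner(x)) = R_out R_in x + (R_out t_in + t_out)
    return rot, apply(outer, t_in)


def embed(parents: Dict[str, Optional[str]], to_parent: Dict[str, Transform]) -> Dict[str, Transform]:
    """Map every assembly into the frame of the root of the (hypothetical) hierarchy."""
    embedded: Dict[str, Transform] = {}

    def resolve(name: str) -> Transform:
        if name not in embedded:
            parent = parents[name]
            if parent is None:
                embedded[name] = IDENTITY  # the root defines the common frame
            else:
                embedded[name] = compose(resolve(parent), to_parent[name])
        return embedded[name]

    for name in parents:
        resolve(name)
    return embedded
```

With every assembly expressed in the root frame, site-level alignments can then be computed locally against coordinates that share one coordinate system.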
Alright, the new files are:
Changes are on the branches:
Examples can be found here:
@tdudgeon @kaliif @phraenquex If there are no updates to the metadata necessary based on the new files, then only their presence is necessary for alignment. On the assembly_alignment branch XCA "works" with the new method, but whether the upload works I can't say?
@ConorFWild can you clarify/elaborate on your last sentence, please?
"If there are no updates to the metadata necessary based on the new files, then only their presence is necessary for alignment."
What are you explaining here?
@phraenquex Basically - these new files contain essential alignment state, new runs of the aligner will not work without them, and hence they should be saved somewhere by the uploader. They also contain interesting information we may one day want to serve to the frontend, so again they should be saved, albeit not necessarily parsed and added to tables (unless there is a table containing a file list).
However, they do not change the form of any currently uploaded and parsed data, and hence there may actually be no work for @tdudgeon @kaliif, barring keeping track of the fact that there are new files!
If the loader doesn't need to look into the files, then there should be nothing to do with the target loader - all yaml files are already saved and included in LHS download.
I tried this out on the CHIKV_Mac data and hit a bug:
2024-05-24 15:26:36.265 | WARNING | ligand_neighbourhood_alignment.align_xmaps:read_xmap_from_mtz:633 - Trying DELFWT DELPHWT
Origin for xmap is now: [37.44 46.927 -3.769]
2024-05-24 15:26:36.471 | INFO | ligand_neighbourhood_alignment.cli:_update:1451 - Writing to: data/lb32633-6/upload_1/aligned_files/CHIKV_MacB-x0692/CHIKV_MacB-x0692_D_304_1_CHIKV_MacB-x0692+D+304+1_event.ccp4
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 879, in <module>
    main()
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 866, in main
    a.run()
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 232, in run
    new_meta = self._perform_alignments(input_meta)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/src/xchemalign/aligner.py", line 465, in _perform_alignments
    updated_fs_model = _update(
                       ^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/cli.py", line 1482, in _update
    __align_xmap(
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 582, in __align_xmap
    interpolation_range = _get_interpolation_range(neighbourhood, running_transform, reference_xmap)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 238, in _get_interpolation_range
    rglb, rgub = get_grid_bounds(tlb, tub, reference_xmap)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/github/im/fragalysis_api/xchem-align/venv_assembly_alignment/lib/python3.11/site-packages/ligand_neighbourhood_alignment/align_xmaps.py", line 76, in get_grid_bounds
    floor(xmap.nu * tlbf.x),
    ^^^^^^^^^^^^^^^^^^^^^^^
ValueError: cannot convert float NaN to integer
@ConorFWild any ideas what's wrong? I can provide data if needed, but it's huge.
@tdudgeon I'm going to guess that one of the alignments has failed and propagated a nonsense transform operator. Can you link me the location where this was run? I can probably take a first stab at the problem by looking at the transform yamls.
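For context on the error message: Python raises `ValueError: cannot convert float NaN to integer` whenever a NaN is converted to an integer, for example via `math.floor`, so a NaN anywhere in a transform will surface this way once grid indices are computed. A minimal sketch of a check that could flag such transforms before the maps are aligned is shown below. The function names and the `(rotation, translation)` layout are assumptions for illustration, not the `ligand_neighbourhood_alignment` data model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def is_finite_transform(rotation: Sequence[Sequence[float]], translation: Sequence[float]) -> bool:
    """Return True only if every rotation and translation element is a finite number."""
    values = [v for row in rotation for v in row] + list(translation)
    return all(math.isfinite(v) for v in values)


def find_bad_transforms(
    transforms: Dict[str, Tuple[Sequence[Sequence[float]], Sequence[float]]]
) -> List[str]:
    """Return the keys of transforms that contain NaN or infinite values."""
    return [
        key
        for key, (rotation, translation) in transforms.items()
        if not is_finite_transform(rotation, translation)
    ]
```

Running a check like this over the transforms loaded from the yamls would point at the alignment that failed, rather than at the later grid-bounds calculation where the NaN finally causes an exception.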
@ConorFWild what is the status of this ticket?
@ConorFWild please confirm that this was in fact done for green release.
Adding green tag so long.
@mwinokan to call @ConorFWild to confirm this has been implemented
All ribbons have looked good for uploads in August/September so concluding this has been merged
@ConorFWild
The CHIKV upload flushed out a bug in the algorithm, as discussed here in Slack:
Bug: If something binds only once to somewhere specific to a chain not part of the reference biological assembly (the one defined in assemblies.yaml), then that generates a canonical site that is not located around that reference biological assembly.
Required: before creating a new canonical site, ensure its biological assembly is aligned to the reference biological assembly.
Am I right that this will mean explicitly writing out a set of reference coordinates for every canonical site, rather than simply pointing to the original crystallographic file? (Or maybe that isn't what happens anyway.)