Closed by rbdavid 3 months ago
The intent of my outlining is that I want to dumb down* the C++ code associated with the fNS and/or sNS alignment methods so that I can use those alignment methods in a Python workflow. To do so, I'd like to use the Python `ctypes` module to call the USalign functions (really `SOIalign_main` or equivalent) within my Python script rather than calling USalign via a `subprocess` call.
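For context, the `subprocess` route being replaced looks roughly like the sketch below. It assumes a `USalign` executable on the `PATH` and uses the `-mm` option to select the alignment mode (5 for fNS, 6 for sNS); the file names and the helper names `build_usalign_cmd` and `run_usalign` are hypothetical, chosen only for illustration.

```python
import subprocess


def build_usalign_cmd(mobile_pdb, target_pdb, mm_opt, exe="USalign"):
    """Build the argument list for a non-sequential USalign run."""
    if mm_opt not in (5, 6):
        raise ValueError("mm_opt must be 5 (fNS) or 6 (sNS)")
    return [exe, str(mobile_pdb), str(target_pdb), "-mm", str(mm_opt)]


def run_usalign(mobile_pdb, target_pdb, mm_opt):
    """Run USalign and return its raw stdout, which still has to be parsed."""
    cmd = build_usalign_cmd(mobile_pdb, target_pdb, mm_opt)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout


# Only build and show the command here; both structures must already
# exist on disk before run_usalign() could be called.
cmd = build_usalign_cmd("mobile.pdb", "target.pdb", mm_opt=5)
print(" ".join(cmd))
```

Every call in this style needs the structures written to disk first and the text output parsed afterwards, which is where the storage IO comes from.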
In this workflow, ideally, I don't need USalign to parse input PDB files or write the alignment results to storage or standard out. Instead, the input to the `SOIalign_main` call would be pre-parsed data already stored in Python variables, and results could be passed directly into Python variables within the script. As mentioned above, the current approach of calling USalign within a `subprocess` call results in a lot of storage IO. Also, using `subprocess` to run bash commands isn't ideal compared to running the C++ code as functions within the Python script.
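A minimal sketch of what the `ctypes` side could look like, under strong assumptions: `ctypes` can only call functions with a C ABI, so this presumes a small `extern "C"` wrapper around `SOIalign_main` compiled into a shared library. The library name `libsoialign.so`, the symbol `soialign_wrapper`, and its argument list are all hypothetical; nothing like them ships with USalign.

```python
import ctypes
from pathlib import Path


def to_c_coords(coords):
    """Flatten [(x, y, z), ...] into a contiguous C double array of length 3*N."""
    flat = [float(v) for xyz in coords for v in xyz]
    return (ctypes.c_double * len(flat))(*flat)


def load_soialign(lib_path):
    """Load the hypothetical wrapper and declare its assumed signature."""
    lib = ctypes.CDLL(str(Path(lib_path).resolve()))
    fn = lib.soialign_wrapper  # assumed symbol name, not part of USalign
    fn.argtypes = [
        ctypes.POINTER(ctypes.c_double), ctypes.c_int,  # mobile coords, residue count
        ctypes.POINTER(ctypes.c_double), ctypes.c_int,  # target coords, residue count
        ctypes.c_int,                                   # mm_opt: 5 = fNS, 6 = sNS
        ctypes.POINTER(ctypes.c_double),                # output buffer for two scores
    ]
    fn.restype = ctypes.c_int
    return fn


# Pre-parsed coordinates already held in Python variables.
mobile = to_c_coords([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)])
target = to_c_coords([(0.0, 0.0, 0.0), (0.0, 3.8, 0.0)])

lib_path = Path("libsoialign.so")  # hypothetical build artifact
if lib_path.exists():
    soialign = load_soialign(lib_path)
    scores = (ctypes.c_double * 2)()
    status = soialign(mobile, len(mobile) // 3, target, len(target) // 3, 5, scores)
```

With a wrapper of this kind, coordinates go in and scores come out as in-memory buffers, so no PDB files or stdout parsing are involved.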
*By saying "dumb down", I mean remove the extra features that have been developed for the full release of USalign code, kinda in the vein of "Worse is Better".
Here are the answers to some of your questions:
@kad-ecoli Is there a function flow diagram for the USalign code?
Or is the algorithm for sNS and fNS methods detailed in a paper?
A unified approach to sequential and non-sequential structure alignment of proteins, RNAs, and DNAs
Chengxin Zhang, Anna Marie Pyle. iScience 25 (10), 2022
Ah, thank you! I had the pdf of this paper in my ref manager but the methods are detailed only in the online version.
In that paper, the fNS alignment methods section explicitly says "US-align2 fNS alignment starts with an initial alignment that is non-sequential." Based on your answer to my initial question, the `CPalign_main` call is used only to get an initial alignment that is a reasonable start; it's just a zeroth alignment step that centers and rotates the mobile structure to somewhat align with the target structure. In this regard, how sensitive is the final fNS/sNS alignment to the initial alignment performed by `CPalign_main`? If a single fTMalign call were made instead of the full `CPalign_main` call, would that have a large effect on the results?
Also, is there a reference for the CPalign implementation?
@kad-ecoli I'm not sure if you saw my questions in the above message. I also want to share a thread that I've started on my fork of USalign that maps out the functional flow as I understand it: https://github.com/rbdavid/USalign/issues/1. Any insights you can provide are greatly appreciated! I appreciate the time you've already given me :)
Hi, I am currently trying to parse the steps for the sNS and fNS alignment calculations. Could you help outline the process?
What I've figured out:
`else if (mm_opt == 5 || mm_opt == 6)`, call `SOIalign`. https://github.com/pylelab/USalign/blob/d9635b5b12bc4e39d59389ac6e8e26f1532f3326/USalign.cpp#L3419-L3425
`SOIalign` is called, within which a set of nested for-loops iterates over structures and chains in the target and mobile lists. https://github.com/pylelab/USalign/blob/d9635b5b12bc4e39d59389ac6e8e26f1532f3326/USalign.cpp#L2320-L2597
(1) The structures are parsed (`read_PDB_lines` and `read_PDB` functions). (2) `make_sec` is called to assign residues to specific secondary structure types. (3) If `closeK_opt >= 3` (only happens for `mm_opt==5`, fNS alignment), the nearest neighboring residues are calculated; otherwise, the nearest neighbor residues are just assumed to be the two before and after in the linear sequence. (4) Also, `if mm_opt==6` (sNS alignment), then a `sec_bond` variable is filled with some information. I'm not sure what this information represents in the grand scheme of things. Why is this only happening for the sNS alignment calculation and not the fNS calculation?
(1) I'm skipping the [`if (se_opt)`] block (lines https://github.com/pylelab/USalign/blob/d9635b5b12bc4e39d59389ac6e8e26f1532f3326/USalign.cpp#L2476-L2506), since the `se_opt` variable is used to avoid doing any alignments and just calculate the original structures' alignment (as I understand it). (2) `SOIalign_main` is called (https://github.com/pylelab/USalign/blob/c6d2dc95e7701ca05ccdc379ad5639bcc68788ac/SOIalign.h#L549-L959).
Inside of `SOIalign_main`:
`parameter_set4search`.
`CPalign_main` is called, which is commented as "initial alignment with sequence order dependent alignment". This seems counterintuitive, since fNS alignments are supposed to not use sequence-order-dependent methods. Maybe I'm not recognizing how `mm_opt == 5` is feeding in different values for parameters to the `CPalign_main` function. Within the `CPalign_main` function: (1) two fTM-align runs via `TMalign_main` calls are made, with an optional third round. (2) A final `TMalign_main` call is made that does a full TMalign calculation.
At this point, I'm hitting a wall with my outlining because I'm no longer able to follow how the input variables to these functions are affected within the function calls.
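The branching in the outline so far can be restated as a short Python sketch. This is only a restatement of the steps as understood above, not a translation of the USalign source: the function name `soialign_flow` and the step labels are made up for illustration, the `se_opt` branch is left out, and the `closeK_opt=5` value in the example is an arbitrary number that satisfies `closeK_opt >= 3`.

```python
def soialign_flow(mm_opt, closeK_opt):
    """Return the ordered steps of the outline for one mobile/target pair."""
    if mm_opt not in (5, 6):
        raise ValueError("SOIalign is only reached for mm_opt 5 (fNS) or 6 (sNS)")

    # Per-structure preparation.
    steps = ["read_PDB_lines / read_PDB", "make_sec"]
    if closeK_opt >= 3:
        steps.append("compute spatial nearest neighbors")
    else:
        steps.append("use sequence-adjacent residues as neighbors")
    if mm_opt == 6:
        steps.append("fill sec_bond")

    # Per-pair alignment, as far as the outline currently goes.
    steps += [
        "SOIalign_main: parameter_set4search",
        "SOIalign_main: CPalign_main (initial alignment)",
    ]
    return steps


for step in soialign_flow(mm_opt=5, closeK_opt=5):
    print(step)
```

Calling `soialign_flow(mm_opt=6, closeK_opt=0)` instead shows the sNS path, where the sequence-adjacent neighbors are used and the `sec_bond` step appears.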
I appreciate any insights you can provide for me!