Breakdown of the sNS and fNS (SOI) alignments

rbdavid commented 3 months ago

Hi, I am currently trying to parse the steps for the sNS and fNS alignment calculations. Could you help outline the process?

What I've figured out:

else if (mm_opt == 5 || mm_opt == 6), call SOIalign. https://github.com/pylelab/USalign/blob/d9635b5b12bc4e39d59389ac6e8e26f1532f3326/USalign.cpp#L3419-L3425
SOIalign is called, within which a set of nested for-loops iterates over the structures and chains in the target and mobile lists. https://github.com/pylelab/USalign/blob/d9635b5b12bc4e39d59389ac6e8e26f1532f3326/USalign.cpp#L2320-L2597
For each chain: (1) its atom lines are pulled from the structure file (read_PDB_lines and read_PDB functions). (2) make_sec is called to assign each residue a secondary structure type. (3) If closeK_opt >=3 (only happens for mm_opt==5, fNS alignment), calculate the nearest neighboring residues, otherwise, just assume the nearest neighbor residues are the two before and after in the linear sequence. (4) Also, if mm_opt==6 (sNS alignment), then a sec_bond variable is filled with some information. I'm not sure what this information represents in the grand scheme of things. Why is this only happening for the sNS alignment calculation and not the fNS calculation?
Then for each mobile and target pair: (1) I'm going to ignore the [if (se_opt)] block (lines https://github.com/pylelab/USalign/blob/d9635b5b12bc4e39d59389ac6e8e26f1532f3326/USalign.cpp#L2476-L2506), since the se_opt variable is used to avoid doing any alignment search and instead just evaluates the original structures' alignment (as I understand it). (2) SOIalign_main is called (https://github.com/pylelab/USalign/blob/c6d2dc95e7701ca05ccdc379ad5639bcc68788ac/SOIalign.h#L549-L959).
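
To make step (3) concrete, here is a minimal Python sketch of a "K closest residues" lookup. It is not USalign's implementation: the function name closest_k, the brute-force distance sort, and the choice to count a residue as its own nearest neighbor (distance zero) are all assumptions made for illustration.

```python
import math

def closest_k(coords, k):
    """For each residue, return the indices of the k residues nearest in space.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    Assumption: the residue itself is included, because its distance is 0.
    """
    neighbors = []
    for i, ci in enumerate(coords):
        # Rank every residue by its distance to residue i (ties broken by index).
        order = sorted(range(len(coords)),
                       key=lambda j: (math.dist(ci, coords[j]), j))
        neighbors.append(order[:k])
    return neighbors

# Toy chain: residue 3 sits close to residue 1 in space, not in sequence.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.0, 0.0, 0.0), (3.8, 1.0, 0.0)]
for i, nb in enumerate(closest_k(coords, 3)):
    print(i, nb)
```

In the toy chain, residue 1's neighbor list contains residue 3 ahead of residue 0, which is the kind of spatial (rather than sequential) neighborhood the closeK_opt >= 3 branch appears to build.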

Inside of SOIalign_main:

Initial scoring parameters are assigned by parameter_set4search.
CPalign_main is called, which is commented as "initial alignment with sequence order dependent alignment". This seems counterintuitive, since fNS alignments are not supposed to use sequence-order-dependent methods. Maybe I'm not recognizing how mm_opt == 5 feeds different parameter values into the CPalign_main function. Within the CPalign_main function: (1) two fTM-align runs are made via TMalign_main calls, with an optional third round. (2) a final TMalign_main call is made that does a full TMalign calculation.

At this point, I'm hitting a wall with my outlining because I'm no longer able to follow how the input variables to these functions are affected within the function calls.

I appreciate any insights you can provide for me!

rbdavid commented 3 months ago

The intent of my outlining is to dumb down* the C++ code associated with the fNS and/or sNS alignment methods so that I can use those methods in a Python workflow. To do so, I'd like to use the Python ctypes module to call the USalign functions (really SOIalign_main or equivalent) within my Python script rather than calling USalign via a subprocess call.

In this workflow, ideally, I don't need USalign to parse input PDB files or write the alignment results to storage or standard out. Instead, the input to the SOIalign_main call would be pre-parsed data already stored in Python variables, and the results could be passed directly into Python variables within the script. As mentioned above, the current approach of calling USalign within a subprocess call results in a lot of storage IO. Also, using subprocess to run bash commands isn't ideal compared to running the C++ code as functions within the Python script.

*By saying "dumb down", I mean remove the extra features that have been developed for the full release of USalign code, kinda in the vein of "Worse is Better".
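
As a rough sketch of what the ctypes side of such a workflow could look like: the snippet below only shows how pre-parsed coordinates might be packed into a contiguous C array and how argument types could be declared. The function soi_align_wrapper, its signature, and the library name libsoialign.so are hypothetical; USalign does not ship such an extern "C" entry point, so one would have to be written and compiled separately.

```python
import ctypes

def pack_coords(coords):
    """Copy an N x 3 sequence of numbers into a contiguous C array of doubles."""
    flat = [float(v) for xyz in coords for v in xyz]
    return (ctypes.c_double * len(flat))(*flat)

def declare_wrapper(lib):
    """Attach types to a HYPOTHETICAL extern "C" wrapper around SOIalign_main.

    Assumed (not real) signature:
      int soi_align_wrapper(const double *mobile, int n_mobile,
                            const double *target, int n_target,
                            int mm_opt, double *tm_score_out);
    """
    fn = lib.soi_align_wrapper
    fn.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int,
                   ctypes.POINTER(ctypes.c_double), ctypes.c_int,
                   ctypes.c_int, ctypes.POINTER(ctypes.c_double)]
    fn.restype = ctypes.c_int
    return fn

# Packing works without any shared library being present.
mobile = pack_coords([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)])
print(len(mobile), list(mobile))

# With a compiled wrapper available, usage might look like this (hypothetical):
#   lib = ctypes.CDLL("./libsoialign.so")
#   soi = declare_wrapper(lib)
#   tm = ctypes.c_double()
#   status = soi(mobile, 2, target, n_target, 5, ctypes.byref(tm))
```

Keeping the boundary to flat double arrays and plain ints avoids exposing C++ types such as std::vector or std::string through ctypes, which is one reason a thin extern "C" wrapper is usually easier than binding SOIalign_main directly.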

kad-ecoli commented 3 months ago

Here are answers to some of your questions:

"(3) If closeK_opt >=3 (only happens for mm_opt==5, fNS alignment), calculate the nearest neighboring residues, otherwise, just assume the nearest neighbor residues are the two before and after in the linear sequence." If closeK_opt<=2, no neighboring residue is used for initial alignment construction.
"(4) Also, if mm_opt==6 (sNS alignment), then a sec_bond variable is filled with some information. I'm not sure what this information represents in the grand scheme of things. Why is this only happening for the sNS alignment calculation and not the fNS calculation?" sNS respects the sequential order within a secondary structure element while fNS does not. Therefore, the latter does not need to store sec_bond.
"CPalign_main is called, which is commented as 'initial alignment with sequence order dependent alignment'. This seems counterintuitive, since fNS alignments are supposed to not use sequence-order-dependent methods." Yes, CPalign_main is sequential. Whether we are performing a sequential or a non-sequential alignment, we need an initial alignment/superimposition, after which we can iteratively refine the alignment/superimposition. CPalign_main is one of several approaches to achieve a reasonable initial alignment. The final fNS result does not follow sequential residue order, but the initial alignment may use sequential information.
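
The ordering distinction described above can be illustrated with a small Python check. This is only a sketch of the idea, not how USalign enforces it: the function names, the representation of an alignment as (mobile index, target index) pairs, and the per-residue SSE labels standing in for whatever sec_bond stores are all assumptions.

```python
def is_sequential(pairs):
    """True if the aligned (mobile, target) indices increase together."""
    ordered = sorted(pairs)  # sort by mobile index
    return all(t1 < t2 for (_, t1), (_, t2) in zip(ordered, ordered[1:]))

def is_sequential_within_sse(pairs, sse_of_mobile):
    """True if order is preserved among pairs sharing a mobile-side SSE label.

    sse_of_mobile[i] is a hypothetical label for the secondary structure
    element that mobile residue i belongs to.
    """
    groups = {}
    for m, t in pairs:
        groups.setdefault(sse_of_mobile[m], []).append((m, t))
    return all(is_sequential(group) for group in groups.values())

# Two elements whose order is swapped between mobile and target.
pairs = [(0, 5), (1, 6), (2, 7), (3, 0), (4, 1)]
sse_of_mobile = [0, 0, 0, 1, 1]

print(is_sequential(pairs))                            # globally out of order
print(is_sequential_within_sse(pairs, sse_of_mobile))  # ordered inside each SSE
```

Under these assumptions, the example alignment would be rejected by a fully sequential method, is compatible with an sNS-style constraint, and would also be allowed by fNS, which imposes no ordering at all.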

rbdavid commented 3 months ago

@kad-ecoli Is there a function flow diagram for the USalign code?

rbdavid commented 3 months ago

Or is the algorithm for sNS and fNS methods detailed in a paper?

kad-ecoli commented 3 months ago

A unified approach to sequential and non-sequential structure alignment of proteins, RNAs, and DNAs

Chengxin Zhang, Anna Marie Pyle. iScience 25 (10), 2022.

rbdavid commented 3 months ago

Ah, thank you! I had the pdf of this paper in my ref manager but the methods are detailed only in the online version.

In that paper, the fNS alignment methods section explicitly says "US-align2 fNS alignment starts with an initial alignment that is non-sequential." Based on your answer to my initial question, the CPalign_main call is used only to get a reasonable starting alignment; it's just a zeroth alignment step that centers and rotates the mobile structure to roughly align with the target structure. In this regard, how sensitive is the final fNS/sNS alignment to the initial alignment performed by CPalign_main? If a single fTM-align call were made instead of the full CPalign_main call, would that have a large effect on the results?

Also, is there a reference for the CPalign implementation?

rbdavid commented 2 months ago

@kad-ecoli I'm not sure if you saw my questions in the above message. I also want to share a thread that I've started on my fork of USalign that maps out the functional flow, as I understand it, at https://github.com/rbdavid/USalign/issues/1. Any insights you can provide are greatly appreciated! I appreciate the time you've already given me :)
