prody / ProDy

A Python Package for Protein Dynamics Analysis
http://prody.csb.pitt.edu

Pairwise multiple structure alignment #1511

Open jlefebre opened 2 years ago

jlefebre commented 2 years ago

Is it planned to include a selection of methods for pairwise multiple structural alignment to align a larger number of structures to each other, as e.g. matchmaker in ChimeraX?

jamesmkrieger commented 2 years ago

We already have combinatorial extension (CE) and Dali. There's also the option of using Biopython's pairwise sequence alignment for structural alignment. It's unlikely that we'll add many more. Which methods did you have in mind?

jlefebre commented 2 years ago

The available methods tend to be rather slow, at least in my experience with many rather heterogeneous structures. What about MMLigner, the MDAnalysis aligner, or Theseus?

jamesmkrieger commented 2 years ago

I was also thinking CE alignment is very slow and I made a pull request to speed it up by only taking Calpha atoms (#1480). It's not merged yet as we need to make sure it actually works well.

I'm not aware of any of those other methods. A quick look suggests the MDAnalysis one doesn't actually do anything sophisticated enough, but I guess we need to look properly and test what it can do. MMLigner and Theseus could be options via OpenCADD (https://opencadd.readthedocs.io/en/latest/tutorials/mmligner.html and https://opencadd.readthedocs.io/en/latest/tutorials/theseus.html).

The caveat though is that someone needs the time etc. to do these things.

jamesmkrieger commented 2 years ago

There are also the functions buildMSA (and its wrapper alignSequencesByChain), which were developed with Clustal programs in mind but could be used with any external program with some small tweaks. Line 836 looks like it may be problematic and needs some modification anyway, as it calls a variable clustalw that isn't defined instead of alignTool. It would probably also be necessary to provide a way to pass options to the programs.

At any rate, any input MSA or mapping can be used for structural alignment in ProDy, so as long as you can get one of those, you can use any method outside ProDy anyway.

jlefebre commented 2 years ago

I just tried the OpenCADD wrappers. The only caveat here is that everything is based on MDAnalysis Universe objects, meaning AtomGroups defined in ProDy cannot be fed directly into the alignment without first being exported and reloaded as a Universe object. Additionally, this seems to be possible only with the mmtf format, as pdb files loaded as a Universe object do not contain models for some reason (?). The alignment tools will throw an error if models are not present, even if there is just one model (as for every X-ray structure). It also seems that they have no option for exporting the aligned file.

For my specific problem, it would be nice to work with ProDy objects throughout, from parsing to processing to alignment, as I am using ProDy applications downstream anyway. Right now I load structures (about 200) that contain chains of related protein domains and split up the chains, and now I want to align them to get pairwise RMSDs for each pair. If there is time and a smart way to approach this, that would be great. However, I will work on some workarounds, and it's not urgent.

jamesmkrieger commented 2 years ago

I meant that we'd integrate ProDy with the OpenCADD wrappers perhaps. It's definitely a nice idea and hopefully someone can do something about it soon, but it's definitely not urgent.

We do have lots of tools to help out with this though. For example, can you not use buildPDBEnsemble and PDBEnsemble.getRMSDs(pairwise=True)?
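A minimal sketch of the buildPDBEnsemble route mentioned above, for readers who want to try it. The helper names pairwise_rmsd_matrix and list_pairs, and the default chain ID "A", are hypothetical choices for illustration and not part of ProDy. The sketch assumes that buildPDBEnsemble takes the first structure as the reference and superposes the others onto it by default, so the pairwise values would reflect that common superposition and not an optimal fit for each pair.

```python
from itertools import combinations


def pairwise_rmsd_matrix(pdb_ids, chain_id="A"):
    """Build a ProDy ensemble from one chain per structure and return
    (labels, matrix of pairwise RMSDs)."""
    # Imported lazily so this module can be loaded without ProDy installed.
    from prody import parsePDB, buildPDBEnsemble

    # One C-alpha-only chain per structure; parsePDB returns None when the
    # requested chain is absent, so those entries are dropped here.
    parsed = [(pid, parsePDB(pid, chain=chain_id, subset="ca")) for pid in pdb_ids]
    kept = [(pid, atoms) for pid, atoms in parsed if atoms is not None]

    # Assumption: with no explicit reference, the first structure is used
    # and every other chain is mapped onto it.
    ensemble = buildPDBEnsemble([atoms for _, atoms in kept],
                                labels=[pid for pid, _ in kept])

    # Structures that could not be mapped may be skipped, so take the labels
    # from the ensemble itself instead of from the input list.
    return ensemble.getLabels(), ensemble.getRMSDs(pairwise=True)


def list_pairs(labels, matrix):
    """Flatten the upper triangle of a square matrix into (a, b, value) rows."""
    return [(labels[i], labels[j], float(matrix[i][j]))
            for i, j in combinations(range(len(labels)), 2)]
```

Calling list_pairs(*pairwise_rmsd_matrix(my_ids)), where my_ids is your own list of PDB identifiers, would give one row per pair of structures.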

jlefebre commented 2 years ago

I see! Thanks a lot for your help.

I am just having trouble understanding the different purposes of the alignment functions in the Structure Comparison module. But I think in principle those could also do the job. E.g. just using alignChains iteratively could work, but it might take very long because cealign is slow.

jamesmkrieger commented 2 years ago

It depends what you're trying to do. If you really just want pairs of structures and you want to handle multiple chains at the same time then alignChains is the way to go. If you only have one chain from each structure then you could go for mapChainOntoChain instead and skip the combineAtomMaps step.
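For the single-chain case, a rough sketch of what a pair-by-pair loop around mapChainOntoChain might look like. The helpers pair_rmsd and pairwise_table are hypothetical names. The code assumes that mapChainOntoChain returns either None or a tuple of (mobile AtomMap, target AtomMap, sequence identity, overlap), and that AtomMap.getFlags("mapped") marks the positions that are real atoms and not dummy placeholders; both points are worth checking against the installed ProDy version.

```python
from itertools import combinations


def pair_rmsd(mobile_chain, target_chain, **map_kwargs):
    """Map one chain onto another, superpose the mapped atoms, and return
    the RMSD, or None if no mapping was found."""
    # Imported lazily so this module can be loaded without ProDy installed.
    from prody import mapChainOntoChain, calcTransformation, calcRMSD

    # Assumption: returns (mobile AtomMap, target AtomMap, seqid, overlap)
    # or None when the chains cannot be mapped under the given thresholds.
    mapping = mapChainOntoChain(mobile_chain, target_chain, **map_kwargs)
    if not mapping:
        return None
    mobile_map, target_map = mapping[0], mapping[1]

    # Keep only positions that correspond to real atoms in both maps.
    mask = mobile_map.getFlags("mapped") & target_map.getFlags("mapped")
    mobile_xyz = mobile_map.getCoords()[mask]
    target_xyz = target_map.getCoords()[mask]

    # Optimal rigid-body fit of mobile onto target, then RMSD after the fit.
    fit = calcTransformation(mobile_xyz, target_xyz)
    return float(calcRMSD(fit.apply(mobile_xyz), target_xyz))


def pairwise_table(items, pair_func):
    """Apply pair_func to every unordered pair; returns {(i, j): value}."""
    return {(i, j): pair_func(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}
```

With a list of chains, pairwise_table(chains, pair_rmsd) would then give one RMSD (or None) per pair. For about 200 chains that is roughly 20,000 mappings, so the speed of the mapping step matters a lot.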

buildPDBEnsemble is for when you have many structures that are related so yes, that may not be what you want.

Again, apologies that cealign is so slow. You could try taking my fix and testing it. To do that you'd do the following:

git clone https://github.com/jamesmkrieger/ProDy.git@cealign_non_ca
cd ProDy
pip install -e .

You may want to make sure you provide protein-only selections to it in case it doesn't work if you include other entities.
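One way to get protein-only chains is sketched below. The helper name protein_chains is hypothetical; it relies on ProDy's select returning None when nothing matches and on getHierView().iterChains() for splitting a selection into chains.

```python
def protein_chains(structure):
    """Return the protein-only chains of a parsed structure as a list."""
    # "protein" is a ProDy selection keyword; select gives None if no atoms match.
    protein = structure.select("protein")
    if protein is None:
        return []
    # The hierarchical view groups the selected atoms by chain.
    return list(protein.getHierView().iterChains())
```

Each returned chain then contains only protein atoms, so ligands, waters, and ions are not passed on to the alignment.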

jamesmkrieger commented 2 years ago

The cealign update is now merged. Also that git clone command wasn't quite right anyway. I tested something similar and it didn't like it. Anyhow, you can now just do a usual one like the following:

git clone https://github.com/prody/ProDy.git