julie-forman-kay-lab / IDPConformerGenerator

Build conformational representations of Intrinsically Disordered Proteins and Regions by a guided sampling of the protein torsion space
https://idpconformergenerator.readthedocs.io/
Apache License 2.0

Use different programs to calculate secondary structure #134

Open joaomcteixeira opened 3 years ago

joaomcteixeira commented 3 years ago

A user has asked:

I'm wondering if it is possible to incorporate a program like DSSP that also assigns polyproline type II helix? I've seen that there are some methods like DSSP-PPII, SEGNO, PROSS, etc. Perhaps there could be an option to switch between programs, for example DSSP or DSSP-PPII?

Yes it is, and the answer is straightforward.

idpconfgen requires a command to call an external application in order to predict secondary structure. That command has to be passed to the idpconfgen sscalc client. For example:

idpconfgen sscalc mkdssp ...

That uses mkdssp as the external program to calculate secondary structure. In principle, idpconfgen can use any other program as long as it runs in the terminal, receives a PDB/mmCIF file as input, and outputs a parseable result. However, to accommodate a new third-party program, we need to implement its execution command and its parsing routine. idpconfgen is modular, so this is easy.
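As a minimal sketch of what "runs in the terminal and outputs a parseable result" means in practice, the snippet below calls an external command with Python's subprocess module and captures its standard output. The helper name `run_external` is hypothetical and is not part of idpconfgen; the Python interpreter stands in for a real secondary-structure program so the example runs anywhere.

```python
import subprocess
import sys


def run_external(cmd, pdb_path):
    """Run `cmd` (a list of strings) on `pdb_path` and return its stdout as text.

    Hypothetical helper for illustration; not part of idpconfgen.
    """
    result = subprocess.run(
        [*cmd, str(pdb_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the program exits with an error
    )
    return result.stdout


# Stand-in for a real program: print the file name received as last argument.
fake_program = [sys.executable, "-c", "import sys; print('input:', sys.argv[-1])"]
output = run_external(fake_program, "example.pdb")
```

A real program would be called the same way, for example with `["mkdssp"]` plus whatever flags that program needs in place of `fake_program`.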

https://github.com/julie-forman-kay-lab/IDPConformerGenerator/blob/2da78306456f3a49477bdde801414359d6e98e28/src/idpconfgen/cli_sscalc.py#L161

This is the function that runs the mkdssp command. A developer needs to create a new function to accommodate the new program. If you follow the code, you'll see that this function executes mkdssp and captures its output. Offering additional programs as options will also require a dictionary of choices at the CLI level.
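One way such a dictionary of choices could look is sketched below with argparse. This is an assumption about a possible design, not how idpconfgen is currently structured: the names `SS_PROGRAMS`, `run_mkdssp`, and `run_dssp_ppii` are hypothetical, and the two functions are placeholders that only return a string.

```python
import argparse


def run_mkdssp(pdb_path):
    """Placeholder for the existing mkdssp-based routine."""
    return f"mkdssp on {pdb_path}"


def run_dssp_ppii(pdb_path):
    """Placeholder for a hypothetical DSSP-PPII routine."""
    return f"dssp-ppii on {pdb_path}"


# Maps the name given on the command line to the function that handles it.
SS_PROGRAMS = {
    "mkdssp": run_mkdssp,
    "dssp-ppii": run_dssp_ppii,
}

parser = argparse.ArgumentParser(prog="sscalc-sketch")
parser.add_argument("program", choices=sorted(SS_PROGRAMS))
parser.add_argument("pdb")

# Parse an explicit argument list instead of sys.argv so the sketch is reproducible.
args = parser.parse_args(["dssp-ppii", "example.pdb"])
message = SS_PROGRAMS[args.program](args.pdb)
```

With this layout, adding a program means writing one function and adding one dictionary entry; argparse rejects any name that is not a key.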

https://github.com/julie-forman-kay-lab/IDPConformerGenerator/blob/2da78306456f3a49477bdde801414359d6e98e28/src/idpconfgen/libs/libparse.py#L110

mkdssp_w_split can be further abstracted if needed.

Finally, this is the function that parses DSSP output:

https://github.com/julie-forman-kay-lab/IDPConformerGenerator/blob/2da78306456f3a49477bdde801414359d6e98e28/src/idpconfgen/libs/libparse.py#L140

In summary, to accommodate a new program, you need to develop a new function that controls the execution of that program via subprocess, captures the output, parses it, and returns a tuple of 3 elements.
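The parsing half of such a function might look like the sketch below. The three elements are not spelled out in this comment, so the tuple used here (secondary-structure codes, one-letter residues, residue numbers) is a placeholder choice, and the whitespace-separated output format is invented for the example; check the DSSP parser linked above for what idpconfgen actually expects.

```python
def parse_toy_output(text):
    """Parse lines of the form '<resnum> <residue> <ss code>'.

    Returns a tuple of 3 elements. Their meaning here is an assumption:
    (ss codes, one-letter residues, residue numbers).
    """
    ss, residues, numbers = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip headers, blank lines, anything unexpected
        number, residue, code = fields
        numbers.append(int(number))
        residues.append(residue)
        ss.append(code)
    return "".join(ss), "".join(residues), numbers


# Invented output of a hypothetical third-party program.
toy_output = """\
# resnum residue ss
1 M L
2 K H
3 V H
"""
parsed = parse_toy_output(toy_output)
```

The text passed to the parser would come from the captured standard output of the external program.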

It is important to note that I use DSSP to decide where the PDBs need to be split. We need to split PDBs at breaks in the backbone, of which there are many in the database, to avoid artifacts in the torsion angles. That is why, after sscalc, PDBs have the suffix seg#, where # is an integer.
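The splitting idea can be sketched as follows. DSSP marks a backbone break with `!` in its residue column; the sketch assumes the residue and secondary-structure strings are already aligned and contain that marker. The function name `split_at_breaks` and the exact key format are illustrative, not taken from idpconfgen.

```python
def split_at_breaks(name, residues, ss, break_char="!"):
    """Split aligned residue and ss strings wherever `break_char` appears.

    Returns a dict mapping '<name>_seg<N>' to a (residues, ss) pair.
    """
    if len(residues) != len(ss):
        raise ValueError("residues and ss must have the same length")

    segments = {}
    start = 0
    # Append a sentinel break so the last segment is flushed too.
    for i, char in enumerate(residues + break_char):
        if char == break_char:
            if i > start:  # ignore empty segments from consecutive breaks
                key = f"{name}_seg{len(segments)}"
                segments[key] = (residues[start:i], ss[start:i])
            start = i + 1
    return segments


# Toy input: one break after the third residue.
segments = split_at_breaks("example", "MKV!LLG", "LHH!EEL")
```

Each continuous stretch becomes its own entry, so torsion angles are never computed across a gap in the backbone.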