Possibility to retrieve protein sequence directly

rdk / p2rank

P2Rank: Protein-ligand binding site prediction tool based on machine learning. Stand-alone command line program / Java library for predicting ligand binding pockets from protein structure.

https://rdk.github.io/p2rank/

MIT License

Possibility to retrieve protein sequence directly #6

Closed lorenzoFabbri closed 5 years ago

lorenzoFabbri commented 5 years ago

Hi,

I'm trying to use P2Rank inside my pipeline to retrieve the protein sequence of the putative binding site. Is it possible to get the protein sequence in addition to the list of residue IDs? I tried BioPython, but the sequence I get from it is shorter than the one in the PDB file. Right now I'm retrieving the sequence from the PDB file with some code I wrote, but I think it should be an output of the program. Thanks.

Lorenzo

rdk commented 5 years ago

Hi Lorenzo, thank you for your interest in P2Rank. Version 2.1-dev.1 (https://github.com/rdk/p2rank/releases/tag/2.1-dev.1) produces a residues.csv file that lists all residues. Have a look and see whether you can extract protein sequences from it in the format you need. If this doesn't help, let me know.
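
For anyone landing here with the same question, below is a minimal sketch of how a per-chain one-letter sequence could be pulled out of such a file using only the Python standard library. It assumes the file has a header row with columns named chain, residue_name and pocket (possibly padded with spaces); that layout is an assumption, so check the header of your own residues.csv first. The function name sequences_from_residues_csv and the sample rows are made up for illustration and are not part of P2Rank.

```python
import csv
import io
from collections import defaultdict

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequences_from_residues_csv(handle, pocket=None):
    """Return {chain: one-letter sequence} from a residues.csv-like file.

    Assumption: the header contains at least 'chain', 'residue_name' and
    'pocket'. If `pocket` is given, only residues assigned to that pocket
    number are kept; residues stay in file order.
    """
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    chains = defaultdict(list)
    for row in reader:
        if not row:  # skip blank lines
            continue
        record = dict(zip(header, (value.strip() for value in row)))
        if pocket is not None and record.get("pocket") != str(pocket):
            continue
        # Non-standard residues are written as 'X'.
        letter = THREE_TO_ONE.get(record["residue_name"].upper(), "X")
        chains[record["chain"]].append(letter)
    return {chain: "".join(letters) for chain, letters in chains.items()}


# Invented sample data in the assumed layout (not real P2Rank output).
SAMPLE = (
    "chain, residue_label, residue_name, score, zscore, probability, pocket\n"
    "A, 1, MET, 0.01, -0.5, 0.001, 0\n"
    "A, 2, LYS, 0.90, 2.1, 0.800, 1\n"
    "A, 3, TRP, 0.85, 1.9, 0.750, 1\n"
    "B, 1, GLY, 0.02, -0.4, 0.002, 0\n"
)

print(sequences_from_residues_csv(io.StringIO(SAMPLE)))            # {'A': 'MKW', 'B': 'G'}
print(sequences_from_residues_csv(io.StringIO(SAMPLE), pocket=1))  # {'A': 'KW'}
```

To run it on a real file, pass an open file handle instead of the io.StringIO object. Note that a pocket-filtered result is just the pocket residues concatenated in file order, not a contiguous stretch of the chain, and that only residues present in the file are included.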