tseemann / any2fasta

Convert various sequence formats to FASTA
GNU General Public License v3.0

Feature request: any2fasta file.pdb #20

Open avilella opened 3 years ago

avilella commented 3 years ago

Hi,

It would be great if any2fasta could work on PDB files of protein structures, e.g. those predicted by AlphaFold2.

Initial simple request (pseudocode using csvtk, of which you are also a fan!):

cat ranked_0.pdb | csvtk space2tab | csvtk cut -H -t -f 4,6 | csvtk uniq -H -t -f 2 | turn-3-letter-code-to-single-letter-code | stitch to single line of AAs

If more than one chain:

cat ranked_0.pdb | csvtk space2tab | csvtk cut -H -t -f 4,5,6 | csvtk uniq -H -t -f 2,3 | foreach chain; do turn-3-letter-code-to-single-letter-code | stitch to single line of AAs
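The two pipelines above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not how any2fasta does or will implement it: the function names `pdb_to_chains` and `to_fasta` are made up for this example, only `ATOM` records of the first model are read, and any residue outside the 20 standard amino acids is written as `X`. It reads the fixed-width PDB columns instead of splitting on whitespace, because adjacent fields (for example the chain ID and a four-digit residue number) can run together with no space between them.

```python
# Hypothetical sketch: extract one amino-acid sequence per chain from PDB ATOM records.

# Standard 20 amino acids; anything else becomes X (an assumption of this sketch).
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_chains(lines):
    """Return {chain_id: sequence} built from the ATOM records of the first model."""
    chains = {}   # chain ID -> list of one-letter codes, in file order
    seen = set()  # (chain, residue number + insertion code) already emitted
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # only the first model is used
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        res_name = line[17:20].strip()    # columns 18-20: residue name
        chain = line[21].strip() or "_"   # column 22: chain ID ("_" if blank)
        res_id = line[22:27]              # columns 23-27: residue number + insertion code
        if (chain, res_id) in seen:
            continue  # further atoms of a residue already counted
        seen.add((chain, res_id))
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(res_name, "X"))
    return {chain: "".join(codes) for chain, codes in chains.items()}


def to_fasta(chains, name="ranked_0"):
    """Format the chains as FASTA, one record per chain, sequence on a single line."""
    return "".join(f">{name}_{chain}\n{seq}\n" for chain, seq in chains.items())
```

With a file such as `ranked_0.pdb`, calling `to_fasta(pdb_to_chains(open("ranked_0.pdb")))` would return one FASTA record per chain. Note that the sequence comes from the residues that have coordinates, so residues missing from the structure are silently absent.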

I hope this is clear enough. Having this in any2fasta would make one more conversion (PDB to FASTA) available from a single repo.

Thanks in advance

tseemann commented 1 year ago

@avilella I will add this in the next release.