marc-harary closed this issue 4 years ago.
The following line is incorrect:

```shell
mol=$(echo $mol | cut -d "-" -f 1)
```

It should be:

```shell
mol=$(echo "$mol" | cut -d "-" -f 1,3 | sed 's/-//g')
```
In other words, `1h3e-1-B` corresponds to `mol=1h3eB`. The chain ID `B` cannot be omitted here, because a single PDB entry can include multiple chains. For example, `1h3e` includes two chains: chain A is a protein, while chain B is an RNA.
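If the IDs always have exactly three hyphen-separated fields (PDB ID, model number, chain ID), the same mapping can also be written with shell parameter expansion instead of `cut` and `sed`. This is a minimal sketch under that assumption. `pdb_chain_id` is a hypothetical helper name, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an ID of the form <pdb>-<model>-<chain>
# (e.g. 1h3e-1-B) into <pdb><chain> (e.g. 1h3eB).
# Assumes exactly three hyphen-separated fields.
pdb_chain_id() {
  local id="$1"
  local pdb="${id%%-*}"    # text before the first "-"  -> 1h3e
  local chain="${id##*-}"  # text after the last "-"    -> B
  printf '%s%s\n' "$pdb" "$chain"
}

pdb_chain_id 1h3e-1-B   # prints 1h3eB
```

Unlike `cut -f 1,3`, this keeps the text before the first hyphen and after the last one, so the two approaches only agree when the ID has exactly three fields.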
Some of the cleaned .pdb files are blank except for the `TER` keyword, which produces blank FASTA files as well. One such example is molecule 1h3e. Is this just because all of the pairs in the original file are non-canonical? Here's the script I wrote for reference.
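To find the affected files in the first place, one rough check is to list cleaned files that contain no `ATOM` or `HETATM` coordinate records. This is only a sketch: `find_empty_pdbs` and the directory name are hypothetical, and it assumes the cleaned files keep standard PDB record names at the start of each line. It identifies which files are blank, but does not by itself explain why.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every .pdb file in a directory that has
# no ATOM or HETATM records (e.g. a file containing only TER/END).
find_empty_pdbs() {
  local dir="$1"
  local f
  for f in "$dir"/*.pdb; do
    [ -e "$f" ] || continue   # glob matched nothing
    if ! grep -qE '^(ATOM|HETATM)' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Example (placeholder directory name):
# find_empty_pdbs ./cleaned_pdbs
```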