protocaller / ProtoCaller

Full automation of relative protein-ligand binding free energy calculations in GROMACS
http://protocaller.readthedocs.io
GNU General Public License v3.0

ValueError: The number of FASTA sequences does not match the number of chains #18

Closed kexul closed 3 years ago

kexul commented 3 years ago

Hi, I tried to prepare the system for protein '4E4N' and encountered this error: ValueError: The number of FASTA sequences does not match the number of chains.

Here is the minimal code to reproduce it:

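# import paths as used in the ProtoCaller documentation examples
from ProtoCaller.Utils.fileio import Dir
from ProtoCaller.Ensemble import Protein

with Dir('SYSTEM', overwrite=True):
    # create a protein from its PDB code and the residue number of the ligand
    # we are going to use for mapping
    protein_proto = Protein('4e4n', ligand_ref='1201')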

Any help is appreciated!

kexul commented 3 years ago

It seems that the protein 4E4N is a dimer. I checked the ProtoCaller paper's supplementary files, which use the dimer '6DAV' as an example, but the code used there to prepare the system shows that it removes the chain after downloading the protein from the PDB. So I think the error is related to the updated version of ProtoCaller.

protein_B = Protein("6DAV", ligand_ref=ligands["reference"], workdir="6DAV_B")

for chain in protein_B.pdb_obj:
    for residue in chain:
        atoms_to_purge = [x for x in residue if x.altLoc == "A"]
        residue.purgeAtoms(atoms_to_purge, "discard")
protein_B.pdb_obj.writePDB()

msuruzhon commented 3 years ago

The problem was that the FASTA file contained only one sequence for the two chains. I wrote a thin wrapper that detects when this is the case and copies the sequences. 4E4N works now in my tests. Feel free to reopen if there are problems with some other proteins, since I am not sure how general this solution is.
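
For readers wondering what "detects and copies" might look like, here is a minimal sketch. It is not the actual ProtoCaller wrapper. It assumes the FASTA header follows the RCSB layout `>ID|Chains A, B|description|organism`, where identical chains share one record; the function names `parse_fasta` and `expand_per_chain` and the sequence in the usage example are made up for illustration.

```python
import re


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def expand_per_chain(records):
    """Map every chain ID to a sequence, copying records shared by chains.

    Assumes the second '|'-separated header field looks like
    'Chain A' or 'Chains A, B' (an assumption about the input format).
    """
    per_chain = {}
    for header, sequence in records:
        fields = header.split("|")
        match = None
        if len(fields) > 1:
            match = re.match(r"Chains?\s+(.*)", fields[1].strip())
        if match is None:
            raise ValueError("Cannot find chain IDs in header: " + header)
        for chain_id in match.group(1).split(","):
            per_chain[chain_id.strip()] = sequence
    return per_chain


# Hypothetical one-record FASTA for a two-chain structure.
example = ">4E4N_1|Chains A, B|Example protein|Homo sapiens (9606)\nMKV\nLAA\n"
sequences = expand_per_chain(parse_fasta(example))
```

With this input, `sequences` has one entry per chain, so the number of sequences matches the number of chains even though the file holds a single record.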

kexul commented 3 years ago

Hi @msuruzhon, another protein, '2QJR', raises the same error.
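
A quick way to check whether a given structure is likely to hit this mismatch is to compare the number of FASTA records with the number of chains in the PDB file. The sketch below is a standalone diagnostic, not part of ProtoCaller. It assumes PDB-format text, where the chain ID sits in column 22 of `ATOM` records; all function names are hypothetical.

```python
def count_fasta_records(fasta_text):
    """Count records in FASTA text by counting '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def pdb_chain_ids(pdb_text):
    """Return the distinct chain IDs of ATOM records, in order of appearance."""
    chains = []
    for line in pdb_text.splitlines():
        # Column 22 (index 21) holds the chain ID in the PDB format.
        if line.startswith("ATOM") and len(line) > 21:
            chain_id = line[21]
            if chain_id not in chains:
                chains.append(chain_id)
    return chains


def has_mismatch(fasta_text, pdb_text):
    """True if the FASTA record count differs from the chain count."""
    return count_fasta_records(fasta_text) != len(pdb_chain_ids(pdb_text))


# Made-up two-chain fragment with a single shared FASTA record.
pdb_example = (
    "ATOM      1  N   MET A   1\n"
    "ATOM      2  N   MET B   1\n"
)
fasta_example = ">XXXX_1|Chains A, B|Example protein|Example organism\nM\n"
mismatch = has_mismatch(fasta_example, pdb_example)
```

If `mismatch` is true for a downloaded structure, the FASTA file probably groups identical chains into one record, which is the situation described above.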