Select chain from polymer

kexul commented 3 years ago

I would like to keep the chain with ligand bound to it, I have check the pdb file and know that's chain A, so I filtered everything else.

This is the code I use:

with Dir('test_case'):
        protein = Protein('6vn6', ligand_ref='601')
        for chain in protein.pdb_obj:
            for residual in chain:
                atoms_to_purge = [x for x in residual if x.altLoc != 'A']
                residual.purgeAtoms(atoms_to_purge, 'discard')
        protein.pdb_obj.writePDB()

which raised an error:

  File "/root/repo/prepare/teth_validate.py", line 184, in clear
    protein.pdb_obj.writePDB()
  File "/data/FEP/prepare/ProtoCaller/IO/PDB/__init__.py", line 131, in writePDB
    total_residue_list = self.totalResidueList()
  File "/data/FEP/prepare/ProtoCaller/IO/PDB/__init__.py", line 371, in totalResidueList
    total_res = self.missing_residues + self.filter("type in ['amino_acid', 'amino_acid_modified']")
  File "/data/FEP/prepare/ProtoCaller/IO/PDB/__init__.py", line 251, in filter
    add(residue)
  File "/data/FEP/prepare/ProtoCaller/IO/PDB/__init__.py", line 234, in add
    if eval("elem" + "." + condition, globals(), vars()):
  File "<string>", line 1, in <module>
  File "/data/FEP/prepare/ProtoCaller/IO/PDB/Residue.py", line 41, in __getattr__
    raise AttributeError("Invalid attribute: {}. Needs to be one of: {}".format(item, self._common_properties))
AttributeError: Invalid attribute: type. Needs to be one of: ['chainID', 'resName', 'resSeq', 'iCode']

When I use

atoms_to_purge = [x for x in residual if x.altLoc == 'B']

everything works fine.

In my case, I don't know which chains should be dropped but which chain should be kept.

If I got a tetramer, how to keep the specific chain which bounded by ligand?

Thanks !

kexul commented 3 years ago

By the way, in the example of DHFR, in the protein.filter() function, the chains parameter is always A for chain A and chain B, is that a typo?

print("Generating input files for: 6DAV with altLoc A")
# here we supply a custom ligand orientation as a reference
protein_A = Protein("6DAV", ligand_ref=ligands["reference"],
                    workdir="6DAV_A")
# delete any atoms with altLoc B
for chain in protein_A.pdb_obj:
    for residue in chain:
        atoms_to_purge = [x for x in residue if x.altLoc == "B"]
        residue.purgeAtoms(atoms_to_purge, "discard")
protein_A.pdb_obj.writePDB()

system_A = Ensemble(protein=protein_A, morphs=morphs,
                    box_length_complex=8, workdir="6DAV_A")
# here we only keep chain A and the crystal waters that belong to it
system_A.protein.filter(ligands=None, simple_anions=None,
                        simple_cations=None, waters="chains",
                        chains=["A"])
system_A.protein.prepare()
system_A.prepareComplexes()

# repeat but for altLoc B
print("Generating input files for: 6DAV with altLoc B")
protein_B = Protein("6DAV", ligand_ref=ligands["reference"],
                    workdir="6DAV_B")
for chain in protein_B.pdb_obj:
    for residue in chain:
        atoms_to_purge = [x for x in residue if x.altLoc == "A"]
        residue.purgeAtoms(atoms_to_purge, "discard")
protein_B.pdb_obj.writePDB()

system_B = Ensemble(protein=protein_B, morphs=morphs,
                    box_length_complex=8, workdir="6DAV_B")
system_B.protein.filter(ligands=None, simple_anions=None,
                        simple_cations=None, waters="chains",
                        chains=["A"])
system_B.protein.prepare()
system_B.prepareComplexes()

msuruzhon commented 3 years ago

it seems to me that you are confusing altLoc's with chainID's. AltLoc's are unresolved parts of the structure with several alternative sets of coordinates and they are specific to each PDB file; they might be also present or absent. In your PDB file you actually don't seem to have any non-standard altLoc's so when you attempt to remove anything that's not "A", you remove all Atoms and you end up creating empty residues, which crashes the software. What you want is simply done by this command here, which is a variation of the command you see in the example:

protein.filter(ligands="chain", waters="chain", chains=["A"])

I do have a typo in the example here, it should be "chain" and not "chains", I'll fix that.

In any case, your system appears to have a zinc ion, so you can't really use ProtoCaller to model it, as there is no support for metals. You'll have to do the parametrisation yourself, and I haven't tested how ProtoCaller deals with pre-parametrised protein systems that have metals in them, so I can't guarantee that it will be of any help.

kexul commented 3 years ago

Wow! Thanks for your explanation, I did confuse altLoc with chain ID. And thanks for your suggestions about metals, I'll keep that in mind.

kexul commented 3 years ago

Hi @msuruzhon , for metal ion, could we simply drop it using protein.filter() function if it's not a cofactor? i.e. has no effect on ligand binding.

There are two parameters in protein.filter() function about cations: simple_cations and complex_cations, I could not find the documentation about the difference of these two parameters, could you please explain it? Thank you!

msuruzhon commented 3 years ago

Hi @kexul, the residue names that are classified as simple_cations and complex_cations can be seen under the global variables ProtoCaller.SIMPLECATIONNAMES and ProtoCaller.COMPLEXCATIONNAMES, and you can also modify these. I just saw that I have forgotten to add zinc to the complex cation list and now it's classified as a ligand, so I should probably change this. In any case, it is very simple to replace the ligand/cofactor lists as well simply by editing protein.ligands and protein.cofactors, respectively, which are just lists of Ligand objects.

protocaller / ProtoCaller

Select chain from polymer #33