Closed amorehead closed 6 months ago
Can you give us a pdb id to check this? Is it definitely a pdb file or could it be an mmcif?
there’s an option unite_chains for parsing mmcif that you could try and see if that helps
It's definitely a PDB file. This complex.pdb
file corresponds to the protein-ligand-nucleic acid complex structure which is denoted as target H1171v1 in CASP15 (and corresponds to PDB ID: 7PBL (https://www.rcsb.org/structure/7pbl)).
ok, thanks
Note that, using BioPython, I was able to parse all of the six ligand chains successfully. With ProDy, it's possible that chain M
having a different residue name from all the other ligand chain's residue names is somehow throwing off ProDy.
Can you provide the file please? I don't get a chain M when parsing with the pdb id and displaying the pdb file on the rcsb pdb doesn't show one either.
I also don't get a chain M when using Biopython from the file downloaded from the pdb by prody:
In [15]: ag = parsePDB('7PBL', compressed=False)
In [16]: struct = PDBParser().get_structure('7pbl', file='7pbl.pdb')
/home/jkrieger/software/miniconda/envs/scipion3/lib/python3.8/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 32028.
warnings.warn(
/home/jkrieger/software/miniconda/envs/scipion3/lib/python3.8/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 32074.
warnings.warn(
/home/jkrieger/software/miniconda/envs/scipion3/lib/python3.8/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 32120.
warnings.warn(
/home/jkrieger/software/miniconda/envs/scipion3/lib/python3.8/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain D is discontinuous at line 32166.
warnings.warn(
/home/jkrieger/software/miniconda/envs/scipion3/lib/python3.8/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain E is discontinuous at line 32205.
warnings.warn(
/home/jkrieger/software/miniconda/envs/scipion3/lib/python3.8/site-packages/Bio/PDB/StructureBuilder.py:89: PDBConstructionWarning: WARNING: Chain F is discontinuous at line 32244.
warnings.warn(
In [17]: struct.get_chains()
Out[17]: <generator object Structure.get_chains at 0x7efcc6133350>
In [18]: list(struct.get_chains())
Out[18]:
[<Chain id=A>,
<Chain id=B>,
<Chain id=C>,
<Chain id=D>,
<Chain id=E>,
<Chain id=F>,
<Chain id=G>,
<Chain id=U>,
<Chain id=V>]
Apologies, I should clarify, the file was uploaded in my initial message (i.e., here: https://github.com/prody/ProDy/files/14579373/Complex.pdb.txt). You can just rename it to Complex.pdb
.
Thank you. Sorry I missed it
ok, I think the problem is that the ligand in chain M is ADP and prody is treating that as nucleic
. You can select nucleoside
separately to get it back:
In [2]: pdb = parsePDB('complex.pdb')
In [3]: ligand = pdb.select("not (protein or nucleic or water) or nucleoside")
In [4]: list(ligand.getHierView())
Out[4]:
[<Chain: I from Segment H from complex (1 residues, 45 atoms)>,
<Chain: J from Segment I from complex (1 residues, 45 atoms)>,
<Chain: K from Segment J from complex (1 residues, 45 atoms)>,
<Chain: L from Segment K from complex (1 residues, 45 atoms)>,
<Chain: M from Segment L from complex (1 residues, 39 atoms)>,
<Chain: N from Segment M from complex (1 residues, 45 atoms)>,
<Chain: O from Segment N from complex (1 residues, 1 atoms)>,
<Chain: O from Segment O from complex (1 residues, 1 atoms)>,
<Chain: O from Segment P from complex (1 residues, 1 atoms)>,
<Chain: O from Segment Q from complex (1 residues, 1 atoms)>,
<Chain: O from Segment R from complex (1 residues, 1 atoms)>]
I'll make a pull request adding a new nucleic_poly flag to make this easier. I also noticed we were missing a few nucleosides so I'll add them too.
Oh, it also looks like ProDy is completely wrong/confusing about what nucleosides and nucleotides are. It uses nucleoside
for "nucleoside derivatives", which should then be nucleotides. Then it uses nucleotide
for those incorporated into DNA or RNA.
In that case, you can use not nucleotide
instead of not nucleic
to get the behaviour you want:
In [1]: from prody import *
In [2]: pdb = parsePDB('complex.pdb')
In [3]: ligand = pdb.select("not (protein or nucleotide or water)")
In [4]: list(ligand.getHierView())
Out[4]:
[<Chain: I from Segment H from complex (1 residues, 45 atoms)>,
<Chain: J from Segment I from complex (1 residues, 45 atoms)>,
<Chain: K from Segment J from complex (1 residues, 45 atoms)>,
<Chain: L from Segment K from complex (1 residues, 45 atoms)>,
<Chain: M from Segment L from complex (1 residues, 39 atoms)>,
<Chain: N from Segment M from complex (1 residues, 45 atoms)>,
<Chain: O from Segment N from complex (1 residues, 1 atoms)>,
<Chain: O from Segment O from complex (1 residues, 1 atoms)>,
<Chain: O from Segment P from complex (1 residues, 1 atoms)>,
<Chain: O from Segment Q from complex (1 residues, 1 atoms)>,
<Chain: O from Segment R from complex (1 residues, 1 atoms)>]
In [5]: set(ligand.getResnames())
Out[5]: {'ADP', 'AGS', 'MG'}
It's also worth noting that protein points to any amino acid and not just those incorporated in polymers, and thus causes problems with ligand selection too.
Thanks @jamesmkrieger for figuring this out! Is there anything else you need from me on my end to resolve this issue?
You’re welcome. I don’t think so.
Hello.
It seems that with Prody version 2.4.1 (same behavior with the latest changes in
master
) not all unique ligand chains can be parsed for the attached complex (protein-ligand-nucleic acid) PDB file using the following code.In particular, chain
M
amongst the ligand chains is ignored by this parsing and selection logic. Would you happen to know what might be causing this parsing failure? I've tried viewingligand.getChids()
as well aslist(ligand.getHierView().iterChains())
, but chainM
always seems to be missing. I also tried callingligand.setSegnames(ligand.getChids())
, but this didn't solve the issue either. Might this be related to some kind of misalignment between segment IDs and chain IDs (even though I can't spot such a misalignment)?Complex.pdb.txt