Closed Lyueyang2020 closed 1 year ago
Could you please provide the code you used to help us solve this. Also, which version of prody are you using?
many thanks
from prody import *
from pylab import *
ion()
pdbids = ['1AO6','1BJ5','1BKE','1BM0','1E78','1E7A','1E7B','1E7C','1E7E',
'1E7F','1E7G','1E7H','1E7I','1GNI','1GNJ','1H9Z','1HA2','1HK1'...]
pdbfiles = fetchPDB(*pdbids, compressed=False)
ref_structure = parsePDB('1ao6.pdb')
ref_selection = ref_structure.select('resnum 5 to 582')
ref_chain = ref_selection.getHierView().getChain('A')
repr(ref_chain)
startLogfile('hsa_pca')
ensemble = PDBEnsemble('hsa X-ray')
ensemble.setAtoms(ref_chain)
ensemble.setCoords(ref_chain)
for pdbfile in pdbfiles:
structure = parsePDB(pdbfile, subset='calpha')
mappings = mapOntoChain(structure, ref_chain)
atommap = mappings[0][0]
ensemble.addCoordset(atommap, weights=atommap.getFlags('mapped'))
repr(ensemble)
len(ensemble) == len(pdbfiles)
ensemble.iterpose()
closeLogfile('hsa_pca')
writePDB('hsa_xray_ensemble.pdb', ensemble)
pca = PCA('hsa xray')
pca.buildCovariance(ensemble)
pca.calcModes()
Thank you very much for your reply. The above is the content learned from the tutorial.
Does your ensemble pdb look right?
You are right. The ensemble pdb file is abnormal. I tried the web tutorial and it worked (add chain= 'A', but this has limitations, not all target protein structures are chain A). But I still don't know what the problem is. Thanks for your reply!
#1.1
from prody import *
from pylab import *
ion()
pdbids = ['1AO6','1BJ5','1BKE','1BM0','1E78','1E7A','1E7B','1E7C','1E7E','1E7F',
'1E7G','1E7H','1E7I','1GNI','1GNJ','1H9Z','1HA2','1HK1','1HK2','1HK3',...]
structures = parsePDB(pdbids, **chain='A'**,subset='ca', compressed=False)
#1.2 Set reference chain
ref_structure = parsePDB('1ao6.pdb')
ref_selection = ref_structure.select('resnum 5 to 582')
ref_chain = ref_selection.getHierView().getChain('A')
repr(ref_chain)
#1.3 Prepare ensemble
ensemble = buildPDBEnsemble(structures, title='hsa X-ray')
Ensemble
ensemble.iterpose()
writePDB('hsa_xray_ensemble.pdb', ensemble)
#PCA
pca = PCA('hsa xray')
pca.buildCovariance(ensemble)
pca.calcModes()
You may want to try changing the mapping function, seqid, overlap or rmsd_reject.
sorry we haven’t had a chance to try running it and see what happens ourselves
You can also try using scipion-em-prody to run all these things with a graphical user interface in the scipion workflow engine. It can be a bit of a pain to install but it’s not so bad if you just want it for this.
You may also just want to try using the development version of ProDy from GitHub rather than 2.4.0 from pip as we have fixed some things
ProDy 2.4.0
Python 3.7.9 matplotlib 3.5.3
from prody import * from pylab import * ion() pdbids = ['1AO6','1BJ5','1BKE','1BM0','1E78','1E7A','1E7B','1E7C','1E7E', '1E7F','1E7G','1E7H','1E7I','1GNI','1GNJ','1H9Z','1HA2','1HK1'...] pdbfiles = fetchPDB(*pdbids, compressed=False) ref_structure = parsePDB('1ao6.pdb') ref_selection = ref_structure.select('resnum 5 to 582') ref_chain = ref_selection.getHierView().getChain('A') repr(ref_chain) startLogfile('hsa_pca') ensemble = PDBEnsemble('hsa X-ray') ensemble.setAtoms(ref_chain) ensemble.setCoords(ref_chain) for pdbfile in pdbfiles: structure = parsePDB(pdbfile, subset='calpha') mappings = mapOntoChain(structure, ref_chain) atommap = mappings[0][0] ensemble.addCoordset(atommap, weights=atommap.getFlags('mapped')) repr(ensemble) len(ensemble) == len(pdbfiles) ensemble.iterpose() closeLogfile('hsa_pca') writePDB('hsa_xray_ensemble.pdb', ensemble) pca = PCA('hsa xray') pca.buildCovariance(ensemble) pca.calcModes()
Thank you very much for your reply. The above is the content learned from the tutorial.
Removing the ... from the end of the pdb list and the ** around chain A, this worked fine for me
Removing the chain A, it says that the ensemble must have more than 2 coordinate sets for PCA. I'm getting warnings in the ensemble building that the structures have fewer chains than 1AO6_ca. I think one problem may be that you're not including your ref_chain as a reference
Using the following code, I get the error about the covariance matrix containing NaN:
In [14]: from prody import *
...: from pylab import *
...: ion()
...: pdbids = ['1AO6','1BJ5','1BKE','1BM0','1E78','1E7A','1E7B','1E7C','1E7E','1E7F',
...: '1E7G','1E7H','1E7I','1GNI','1GNJ','1H9Z','1HA2','1HK1','1HK2','1HK3']
...: structures = parsePDB(pdbids, subset='ca', compressed=False)
...: #1.2 Set reference chain
...: ref_structure = parsePDB('1ao6.pdb')
...: ref_selection = ref_structure.select('resnum 5 to 582')
...: ref_chain = ref_selection.getHierView().getChain('A')
...: repr(ref_chain)
...: #1.3 Prepare ensemble
...: ensemble = buildPDBEnsemble(structures, title='hsa X-ray', ref=ref_chain)
...: Ensemble
...: ensemble.iterpose()
...: writePDB('hsa_xray_ensemble.pdb', ensemble)
...: #PCA
...: pca = PCA('hsa xray')
...: pca.buildCovariance(ensemble)
...: pca.calcModes()
The ensemble has many atoms besides the CA atoms that most of the structures have and that's what the problem.
Using this code with subset ca for the reference structure too fixes it:
In [22]: from prody import *
...: from pylab import *
...: ion()
...: pdbids = ['1AO6','1BJ5','1BKE','1BM0','1E78','1E7A','1E7B','1E7C','1E7E','1E7F',
...: '1E7G','1E7H','1E7I','1GNI','1GNJ','1H9Z','1HA2','1HK1','1HK2','1HK3']
...: structures = parsePDB(pdbids, subset='ca', compressed=False)
...: #1.2 Set reference chain
...: ref_structure = parsePDB('1ao6.pdb', subset='ca')
...: ref_selection = ref_structure.select('resnum 5 to 582')
...: ref_chain = ref_selection.getHierView().getChain('A')
...: repr(ref_chain)
...: #1.3 Prepare ensemble
...: ensemble = buildPDBEnsemble(structures, title='hsa X-ray', ref=ref_chain)
...: Ensemble
...: ensemble.iterpose()
...: writePDB('hsa_xray_ensemble.pdb', ensemble)
...: #PCA
...: pca = PCA('hsa xray')
...: pca.buildCovariance(ensemble)
...: pca.calcModes()
To reduce the risk of mistakes and to avoid having the reference structure included in the ensemble twice, you could also use the reference structure that has already been parsed rather than parsing it again.
You can also use a .copy() on the ref_chain to not include a selection for the reference because that can make problems with using setAtoms() later.
In [39]: from prody import *
...: from pylab import *
...: ion()
...: pdbids = ['1AO6','1BJ5','1BKE','1BM0','1E78','1E7A','1E7B','1E7C','1E7E','1E7F',
...: '1E7G','1E7H','1E7I','1GNI','1GNJ','1H9Z','1HA2','1HK1','1HK2','1HK3']
...: structures = parsePDB(pdbids, subset='ca', compressed=False)
...: #1.2 Set reference chain
...: ref_structure = structures[0]
...: ref_selection = ref_structure.select('resnum 5 to 582')
...: ref_chain = ref_selection.getHierView().getChain('A').copy()
...: repr(ref_chain)
...: #1.3 Prepare ensemble
...: ensemble = buildPDBEnsemble(structures, title='hsa X-ray', ref=ref_chain)
...: Ensemble
...: ensemble.iterpose()
...: writePDB('hsa_xray_ensemble.pdb', ensemble)
...: #PCA
...: pca = PCA('hsa xray')
...: pca.buildCovariance(ensemble)
...: pca.calcModes()
I'd say this issue is solved then too. Please feel free to reopen if you disagree and/or need more help.
Hello, I am new to using ProDy so I apologize if this is a trivial question. When I use this module [Ensemble Analysis -Heterogeneous X-ray Structures-PCA calculations], the following problems occurred:
ValueError: array must not contain infs or NaNs
Detailed error is:
Very sorry I don’t have a programming background, so I don’t have the ability to solve this problem. Thank you very much for your answer.