Closed Lyueyang2020 closed 11 months ago
How did you solve it?
Sorry, I didn't solve it. I clicked the close function by mistake and don't know how to open it. Therfore, do you have a good solution? Thank you very much!!
Ok, I’ve reopened it
I’d suggest using alignChains to get AtomMap objects linking your two input structures/ensembles and then using ensemble.select and sliceAtoms and sliceModes. I can hopefully try and make a more detailed code example over the next few days
Ok, here's a simple example with just a few atomic objects.
We have 4 chains from a crystal structure that are missing different disordered loop segments, leading to a difference in the number of Calpha atoms, e.g. 374 vs 365 for chains A and B.
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.3.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from prody import *
In [2]: chainA = parsePDB('prody/tests/datafiles/pdb3o21.pdb', chain='A', subset='ca')
@> 374 atoms and 1 coordinate set(s) were parsed in 0.01s.
@> Secondary structures were assigned to 235 residues.
In [3]: chainB = parsePDB('prody/tests/datafiles/pdb3o21.pdb', chain='B', subset='ca')
@> 365 atoms and 1 coordinate set(s) were parsed in 0.01s.
@> Secondary structures were assigned to 240 residues.
In [4]: chainC = parsePDB('prody/tests/datafiles/pdb3o21.pdb', chain='C', subset='ca')
@> 375 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> Secondary structures were assigned to 245 residues.
In [5]: chainD = parsePDB('prody/tests/datafiles/pdb3o21.pdb', chain='D', subset='ca')
@> 375 atoms and 1 coordinate set(s) were parsed in 0.01s.
@> Secondary structures were assigned to 240 residues.
If we calculate ANM modes from the first 2 of them, the two sets of ANM vectors (used as rows and columns) have different numbers of x, y and z values, so we get the same error when trying with showOverlapTable.
In [6]: anm1, ca1 = calcANM(chainA)
@> Hessian was built in 0.13s.
@> 20 modes were calculated in 0.11s.
In [7]: anm1, ca1
Out[7]:
(<ANM: 3o21A_ca (20 modes; 374 nodes)>,
<Selection: 'calpha' from 3o21A_ca (374 atoms)>)
In [8]: anm2, ca2 = calcANM(chainB)
@> Hessian was built in 0.13s.
@> 20 modes were calculated in 0.10s.
In [9]: anm2, ca2
Out[9]:
(<ANM: 3o21B_ca (20 modes; 365 nodes)>,
<Selection: 'calpha' from 3o21B_ca (365 atoms)>)
In [10]: import matplotlib.pyplot as plt; plt.ion()
Out[10]: <matplotlib.pyplot._IonContext at 0x7fe3ccaef9a0>
In [11]: showOverlapTable(anm1, anm2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 showOverlapTable(anm1, anm2)
File ~/software/scipion3/software/em/ProDy/prody/dynamics/plotting.py:591, in showOverlapTable(modes_x, modes_y, **kwargs)
588 else:
589 num_modes_y = modes_y.numModes()
--> 591 overlap = calcOverlap(modes_y, modes_x)
592 take_abs = kwargs.pop('abs', True)
593 if take_abs:
File ~/software/scipion3/software/em/ProDy/prody/dynamics/compare.py:51, in calcOverlap(rows, cols, diag)
48 num_cols = cols.numEntries()
50 if num_rows != num_cols:
---> 51 raise ValueError('the length of vectors in rows and '
52 'cols must be the same')
54 if not isinstance(rows, np.ndarray):
55 rows = rows.getArray()
ValueError: the length of vectors in rows and cols must be the same
The same could happen if we did this with PCA vs ANM depending which structure was the reference.
In [12]: ens = buildPDBEnsemble([chainA, chainB, chainC, chainD])
@> Starting iterative superposition:
@> Step #1: RMSD difference = 6.9068e-01
@> Step #2: RMSD difference = 1.6467e-03
@> Step #3: RMSD difference = 3.3043e-05
@> Iterative superposition completed in 0.00s.
@> Final superposition to calculate transformations.
@> Superposition completed in 0.00 seconds.
@> Ensemble (4 conformations) were built in 0.07s.
In [13]: pca = PCA()
In [14]: pca.buildCovariance(ens)
@> Covariance is calculated using 4 coordinate sets.
@> Covariance matrix calculated in 0.015260s.
In [15]: pca.calcModes()
@> 3 modes were calculated in 0.15s.
In [16]: showOverlapTable(pca, anm1)
Out[16]:
(<matplotlib.image.AxesImage at 0x7fe33a2e3af0>,
[],
<matplotlib.colorbar.Colorbar at 0x7fe3397d8730>)
In [17]: showOverlapTable(pca, anm2)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 showOverlapTable(pca, anm2)
File ~/software/scipion3/software/em/ProDy/prody/dynamics/plotting.py:591, in showOverlapTable(modes_x, modes_y, **kwargs)
588 else:
589 num_modes_y = modes_y.numModes()
--> 591 overlap = calcOverlap(modes_y, modes_x)
592 take_abs = kwargs.pop('abs', True)
593 if take_abs:
File ~/software/scipion3/software/em/ProDy/prody/dynamics/compare.py:51, in calcOverlap(rows, cols, diag)
48 num_cols = cols.numEntries()
50 if num_rows != num_cols:
---> 51 raise ValueError('the length of vectors in rows and '
52 'cols must be the same')
54 if not isinstance(rows, np.ndarray):
55 rows = rows.getArray()
ValueError: the length of vectors in rows and cols must be the same
To fix this, we need to slice the models to match based on alignment of the structures. This then makes all the modes have nodes corresponding to the shared set of atoms.
In [18]: amap1 = alignChains(chainA, chainB)[0]
@> Trying to map atoms based on residue numbers and identities:
@> Comparing Chain A from 3o21A_ca (len=374) with Chain B from 3o21B_ca:
@> Mapped: 364 residues match with 100% sequence identity and 100% overlap.
@> Finding the atommaps based on their coverages...
@> Identified that there exists 1 atommap(s) potentially.
In [19]: amap1
Out[19]: <AtomMap: (Chain A from 3o21A_ca -> Chain B from 3o21B_ca) from 3o21A_ca (365 atoms, 364 mapped, 1 dummy)>
In [20]: amap2 = alignChains(chainB, chainA)[0]
@> Trying to map atoms based on residue numbers and identities:
@> Comparing Chain B from 3o21B_ca (len=365) with Chain A from 3o21A_ca:
@> Mapped: 364 residues match with 100% sequence identity and 97% overlap.
@> Finding the atommaps based on their coverages...
@> Identified that there exists 1 atommap(s) potentially.
In [21]: amap2
Out[21]: <AtomMap: (Chain B from 3o21B_ca -> Chain A from 3o21A_ca) from 3o21B_ca (374 atoms, 364 mapped, 10 dummy)>
In [22]: slc_anm1, slc_ca1 = sliceModel(anm1, ca1, amap1.getSelstr(), norm=True)
In [23]: slc_anm2, slc_ca2 = sliceModel(anm2, ca2, amap2.getSelstr(), norm=True)
In [24]: showOverlapTable(slc_anm1, slc_anm2)
Out[24]:
(<matplotlib.image.AxesImage at 0x7fe339635b80>,
[],
<matplotlib.colorbar.Colorbar at 0x7fe339653d30>)
In [25]: slc_anm1, slc_ca1
Out[25]:
(<ANM: 3o21A_ca sliced (20 modes; 364 nodes)>,
<Selection: '(index 0 to 30 ...3) and (calpha)' from 3o21A_ca (364 atoms)>)
In [26]: slc_anm2, slc_ca2
Out[26]:
(<ANM: 3o21B_ca sliced (20 modes; 364 nodes)>,
<Selection: '(index 0 to 295...4) and (calpha)' from 3o21B_ca (364 atoms)>)
In [27]: slc_pca, slc_ca1 = sliceModel(pca, ca1, amap1.getSelstr(), norm=True)
In [28]: showOverlapTable(slc_pca, slc_anm1)
Out[28]:
(<matplotlib.image.AxesImage at 0x7fe3394fee20>,
[],
<matplotlib.colorbar.Colorbar at 0x7fe3394abb50>)
In [29]: slc_pca, slc_ca1
Out[29]:
(<PCA: Unknown sliced (3 modes; 364 atoms)>,
<Selection: '(index 0 to 30 ...3) and (calpha)' from 3o21A_ca (364 atoms)>)
In my case, we get the calculation to run but the comparisons don't make any sense still as the structures and modes aren't in the same coordinate frame.
Thus, it could also be useful to include a superpose
step to ensure that the ANM and PCA are calculated on structures that are spatially aligned so that you can make a meaningful comparison.
In [14]: slc_ca2_alg, T_2 = superpose(slc_ca2, slc_ca1)
In [15]: calcRMSD(slc_ca2, slc_ca1)
Out[15]: 1.1547285801298661
In [16]: anm2, ca2 = calcANM(chainB)
@> Hessian was built in 0.12s.
@> 20 modes were calculated in 0.11s.
In [17]: slc_anm2, slc_ca2 = sliceModel(anm2, ca2, amap2.getSelstr(), norm=True)
In [18]: showOverlapTable(slc_anm1, slc_anm2)
Out[18]:
(<matplotlib.image.AxesImage at 0x7f6ae64e31f0>,
[],
<matplotlib.colorbar.Colorbar at 0x7f6ae0474160>)
Whether you need this depends on which stage of the process the structures used for ANM and PCA come from.
As I haven't had a reply, I'll assume you no longer have this issue and close it. Please feel free to open it again if you still need help with this
Thank you very much for your reply. I will try again later
You’re welcome. Please let me know how it goes
Hello, I am new to using ProDy so I apologize if this is a trivial question. When I use this module [Ensemble Analysis -Heterogeneous X-ray Structures-PCA calculations], the following problems occurred: (1) ValueError: the length of vectors in rows and cols must be the same.
ValueError: the length of vectors in rows and cols must be the same
(2)ValueError: number of atoms are not the same
Sorry, I don’t have a programming background, so I don’t have the ability to solve this problem. I would also be grateful if you can reply.