prody / ProDy

A Python Package for Protein Dynamics Analysis
http://prody.csb.pitt.edu
Other
433 stars 157 forks source link

[Ensemble Analysis -Heterogeneous X-ray Structures-PCA calculations] ValueError: the length of vectors in rows and cols must be the same.ValueError: number of atoms are not the same. #1739

Closed Lyueyang2020 closed 11 months ago

Lyueyang2020 commented 1 year ago

Hello, I am new to using ProDy so I apologize if this is a trivial question. When I use this module [Ensemble Analysis -Heterogeneous X-ray Structures-PCA calculations], the following problems occurred: (1) ValueError: the length of vectors in rows and cols must be the same.

from prody import from pylab import ion() pca = loadModel('bamA_xray.pca.npz') anm = loadModel('7ri5.anm.npz') ensemble = loadEnsemble('bamA_X-ray.ens.npz') ref_chain = parsePDB('bamA_ref_selection.pdb') showOverlapTable(pca[:3], anm[:3]);


ValueError Traceback (most recent call last) ~\AppData\Local\Temp\ipykernel_22272\685033881.py in 1 #PCA - ANM ----> 2 showOverlapTable(pca[:3], anm[:3]); 3 #title('PCA - ANM Overlap Table');

c:\Users\liu\AppData\Local\Programs\Python\Python37\lib\site-packages\prody\dynamics\plotting.py in showOverlapTable(modes_x, modes_y, **kwargs) 538 num_modes_y = modes_y.numModes() 539 --> 540 overlap = calcOverlap(modes_y, modes_x) 541 take_abs = kwargs.pop('abs', True) 542 if take_abs:

c:\Users\liu\AppData\Local\Programs\Python\Python37\lib\site-packages\prody\dynamics\compare.py in calcOverlap(rows, cols, diag) 49 50 if num_rows != num_cols: ---> 51 raise ValueError('the length of vectors in rows and ' 52 'cols must be the same') 53

ValueError: the length of vectors in rows and cols must be the same


(2)ValueError: number of atoms are not the same

showProjection(ensemble, pca[:2]); axis([-4, 4, -4, 4]);


ValueError Traceback (most recent call last) ~\AppData\Local\Temp\ipykernel_22272\950884423.py in ----> 1 showProjection(ensemble, pca[:2]); 2 axis([-4, 4, -4, 4]);

c:\Users\liu\AppData\Local\Programs\Python\Python37\lib\site-packages\prody\dynamics\plotting.py in showProjection(ensemble, modes, *args, **kwargs) 244 projection = calcProjection(ensemble, modes, 245 kwargs.pop('rmsd', True), --> 246 kwargs.pop('norm', False)) 247 248 if projection.ndim == 1 or projection.shape[1] == 1:

c:\Users\liu\AppData\Local\Programs\Python\Python37\lib\site-packages\prody\dynamics\analysis.py in calcProjection(ensemble, modes, rmsd, norm) 178 n_atoms = ensemble.numSelected() 179 if n_atoms != modes.numAtoms(): --> 180 raise ValueError('number of atoms are not the same') 181 if isinstance(ensemble, Vector): 182 if not ensemble.is3d():

ValueError: number of atoms are not the same

Sorry, I don’t have a programming background, so I don’t have the ability to solve this problem. I would also be grateful if you can reply.

jamesmkrieger commented 1 year ago

How did you solve it?

Lyueyang2020 commented 1 year ago

Sorry, I didn't solve it. I clicked the close function by mistake and don't know how to open it. Therfore, do you have a good solution? Thank you very much!!

jamesmkrieger commented 1 year ago

Ok, I’ve reopened it

jamesmkrieger commented 1 year ago

I’d suggest using alignChains to get AtomMap objects linking your two input structures/ensembles and then using ensemble.select and sliceAtoms and sliceModes. I can hopefully try and make a more detailed code example over the next few days

jamesmkrieger commented 1 year ago

Ok, here's a simple example with just a few atomic objects.

We have 4 chains from a crystal structure that are missing different disordered loop segments, leading to a difference in the number of Calpha atoms, e.g. 374 vs 365 for chains A and B.

Python 3.8.13 (default, Mar 28 2022, 11:38:47) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.3.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from prody import *

In [2]: chainA = parsePDB('prody/tests/datafiles/pdb3o21.pdb', chain='A', subset='ca')
@> 374 atoms and 1 coordinate set(s) were parsed in 0.01s.
@> Secondary structures were assigned to 235 residues.

In [3]: chainB = parsePDB('prody/tests/datafiles/pdb3o21.pdb', chain='B', subset='ca')
@> 365 atoms and 1 coordinate set(s) were parsed in 0.01s.
@> Secondary structures were assigned to 240 residues.

In [4]: chainC = parsePDB('prody/tests/datafiles/pdb3o21.pdb', chain='C', subset='ca')
@> 375 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> Secondary structures were assigned to 245 residues.

In [5]: chainD = parsePDB('prody/tests/datafiles/pdb3o21.pdb', chain='D', subset='ca')
@> 375 atoms and 1 coordinate set(s) were parsed in 0.01s.
@> Secondary structures were assigned to 240 residues.

If we calculate ANM modes from the first 2 of them, the two sets of ANM vectors (used as rows and columns) have different numbers of x, y and z values, so we get the same error when trying with showOverlapTable.

In [6]: anm1, ca1 = calcANM(chainA)
@> Hessian was built in 0.13s.
@> 20 modes were calculated in 0.11s.

In [7]: anm1, ca1
Out[7]: 
(<ANM: 3o21A_ca (20 modes; 374 nodes)>,
 <Selection: 'calpha' from 3o21A_ca (374 atoms)>)

In [8]: anm2, ca2 = calcANM(chainB)
@> Hessian was built in 0.13s.
@> 20 modes were calculated in 0.10s.

In [9]: anm2, ca2
Out[9]: 
(<ANM: 3o21B_ca (20 modes; 365 nodes)>,
 <Selection: 'calpha' from 3o21B_ca (365 atoms)>)

In [10]: import matplotlib.pyplot as plt; plt.ion()
Out[10]: <matplotlib.pyplot._IonContext at 0x7fe3ccaef9a0>

In [11]: showOverlapTable(anm1, anm2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 showOverlapTable(anm1, anm2)

File ~/software/scipion3/software/em/ProDy/prody/dynamics/plotting.py:591, in showOverlapTable(modes_x, modes_y, **kwargs)
    588 else:
    589     num_modes_y = modes_y.numModes()
--> 591 overlap = calcOverlap(modes_y, modes_x)
    592 take_abs = kwargs.pop('abs', True)
    593 if take_abs:

File ~/software/scipion3/software/em/ProDy/prody/dynamics/compare.py:51, in calcOverlap(rows, cols, diag)
     48     num_cols = cols.numEntries()
     50 if num_rows != num_cols:
---> 51     raise ValueError('the length of vectors in rows and '
     52                      'cols must be the same')
     54 if not isinstance(rows, np.ndarray):
     55     rows = rows.getArray()

ValueError: the length of vectors in rows and cols must be the same

The same could happen if we did this with PCA vs ANM depending which structure was the reference.

In [12]: ens = buildPDBEnsemble([chainA, chainB, chainC, chainD])
@> Starting iterative superposition:             
@> Step #1: RMSD difference = 6.9068e-01
@> Step #2: RMSD difference = 1.6467e-03
@> Step #3: RMSD difference = 3.3043e-05
@> Iterative superposition completed in 0.00s.
@> Final superposition to calculate transformations.
@> Superposition completed in 0.00 seconds.
@> Ensemble (4 conformations) were built in 0.07s.

In [13]: pca = PCA()

In [14]: pca.buildCovariance(ens)
@> Covariance is calculated using 4 coordinate sets.
@> Covariance matrix calculated in 0.015260s.

In [15]: pca.calcModes()
@> 3 modes were calculated in 0.15s.

In [16]: showOverlapTable(pca, anm1)
Out[16]: 
(<matplotlib.image.AxesImage at 0x7fe33a2e3af0>,
 [],
 <matplotlib.colorbar.Colorbar at 0x7fe3397d8730>)

In [17]: showOverlapTable(pca, anm2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 showOverlapTable(pca, anm2)

File ~/software/scipion3/software/em/ProDy/prody/dynamics/plotting.py:591, in showOverlapTable(modes_x, modes_y, **kwargs)
    588 else:
    589     num_modes_y = modes_y.numModes()
--> 591 overlap = calcOverlap(modes_y, modes_x)
    592 take_abs = kwargs.pop('abs', True)
    593 if take_abs:

File ~/software/scipion3/software/em/ProDy/prody/dynamics/compare.py:51, in calcOverlap(rows, cols, diag)
     48     num_cols = cols.numEntries()
     50 if num_rows != num_cols:
---> 51     raise ValueError('the length of vectors in rows and '
     52                      'cols must be the same')
     54 if not isinstance(rows, np.ndarray):
     55     rows = rows.getArray()

ValueError: the length of vectors in rows and cols must be the same

To fix this, we need to slice the models to match based on alignment of the structures. This then makes all the modes have nodes corresponding to the shared set of atoms.

In [18]: amap1 = alignChains(chainA, chainB)[0]
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain A from 3o21A_ca (len=374) with Chain B from 3o21B_ca:
@>      Mapped: 364 residues match with 100% sequence identity and 100% overlap.
@> Finding the atommaps based on their coverages...
@> Identified that there exists 1 atommap(s) potentially.

In [19]: amap1
Out[19]: <AtomMap: (Chain A from 3o21A_ca -> Chain B from 3o21B_ca) from 3o21A_ca (365 atoms, 364 mapped, 1 dummy)>

In [20]: amap2 = alignChains(chainB, chainA)[0]
@> Trying to map atoms based on residue numbers and identities:
@>   Comparing Chain B from 3o21B_ca (len=365) with Chain A from 3o21A_ca:
@>      Mapped: 364 residues match with 100% sequence identity and 97% overlap.
@> Finding the atommaps based on their coverages...
@> Identified that there exists 1 atommap(s) potentially.

In [21]: amap2
Out[21]: <AtomMap: (Chain B from 3o21B_ca -> Chain A from 3o21A_ca) from 3o21B_ca (374 atoms, 364 mapped, 10 dummy)>

In [22]: slc_anm1, slc_ca1 = sliceModel(anm1, ca1, amap1.getSelstr(), norm=True)

In [23]: slc_anm2, slc_ca2 = sliceModel(anm2, ca2, amap2.getSelstr(), norm=True)

In [24]: showOverlapTable(slc_anm1, slc_anm2)
Out[24]: 
(<matplotlib.image.AxesImage at 0x7fe339635b80>,
 [],
 <matplotlib.colorbar.Colorbar at 0x7fe339653d30>)

In [25]: slc_anm1, slc_ca1
Out[25]: 
(<ANM: 3o21A_ca sliced (20 modes; 364 nodes)>,
 <Selection: '(index 0 to 30 ...3) and (calpha)' from 3o21A_ca (364 atoms)>)

In [26]: slc_anm2, slc_ca2
Out[26]: 
(<ANM: 3o21B_ca sliced (20 modes; 364 nodes)>,
 <Selection: '(index 0 to 295...4) and (calpha)' from 3o21B_ca (364 atoms)>)

In [27]: slc_pca, slc_ca1 = sliceModel(pca, ca1, amap1.getSelstr(), norm=True)

In [28]: showOverlapTable(slc_pca, slc_anm1)
Out[28]: 
(<matplotlib.image.AxesImage at 0x7fe3394fee20>,
 [],
 <matplotlib.colorbar.Colorbar at 0x7fe3394abb50>)

In [29]: slc_pca, slc_ca1
Out[29]: 
(<PCA: Unknown sliced (3 modes; 364 atoms)>,
 <Selection: '(index 0 to 30 ...3) and (calpha)' from 3o21A_ca (364 atoms)>)

In my case, we get the calculation to run but the comparisons don't make any sense still as the structures and modes aren't in the same coordinate frame.

Thus, it could also be useful to include a superpose step to ensure that the ANM and PCA are calculated on structures that are spatially aligned so that you can make a meaningful comparison.

In [14]: slc_ca2_alg, T_2 = superpose(slc_ca2, slc_ca1)

In [15]: calcRMSD(slc_ca2, slc_ca1)
Out[15]: 1.1547285801298661

In [16]: anm2, ca2 = calcANM(chainB)
@> Hessian was built in 0.12s.
@> 20 modes were calculated in 0.11s.

In [17]: slc_anm2, slc_ca2 = sliceModel(anm2, ca2, amap2.getSelstr(), norm=True)

In [18]: showOverlapTable(slc_anm1, slc_anm2)
Out[18]: 
(<matplotlib.image.AxesImage at 0x7f6ae64e31f0>,
 [],
 <matplotlib.colorbar.Colorbar at 0x7f6ae0474160>)

Whether you need this depends on which stage of the process the structures used for ANM and PCA come from.

jamesmkrieger commented 11 months ago

As I haven't had a reply, I'll assume you no longer have this issue and close it. Please feel free to open it again if you still need help with this

Lyueyang2020 commented 11 months ago

Thank you very much for your reply. I will try again later

jamesmkrieger commented 11 months ago

You’re welcome. Please let me know how it goes