theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
414 stars 102 forks source link

Trying to set attribute `.layers` of view, making a copy. #122

Closed vkorobeynyk closed 4 years ago

vkorobeynyk commented 4 years ago

Dear scVelo community. Great progress that you made in the RNA Velocity field. I have been having a problem when I try to plot.

If I run:

scv.pl.velocity(adata, var_names= "Snupn" , colorbar = True )

I get a Message: "Trying to set attribute .layers of view, making a copy." .

It makes copies until I get out of RAM and then crashes. Have you any idea what can it be?


KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-95-c455241f9bc7> in <module>
----> 1 scv.pl.velocity(adata, var_names= "Snupn" , basis = "umap", colorbar = True )

/usr/local/lib/python3.6/dist-packages/scvelo/plotting/velocity.py in velocity(adata, var_names, basis, vkey, mode, fits, layers, color, color_map, colorbar, perc, alpha, size, groupby, groups, use_raw, fontsize, figsize, dpi, show, save, ax, ncols, **kwargs)
    109         scatter(adata, basis=var, color=color, colorbar=colorbar, frameon=True, title=var, size=size, use_raw=use_raw,
    110                 alpha=alpha, fontsize=fontsize, xlabel='spliced', ylabel='unspliced', show=False, ax=ax, save=False,
--> 111                 legend_loc=None if v < len(var_names)-1 else 'lower right', **kwargs)
    112 
    113         # velocity and expression plots

/usr/local/lib/python3.6/dist-packages/scvelo/plotting/scatter.py in scatter(adata, x, y, basis, vkey, color, use_raw, layer, color_map, colorbar, palette, size, alpha, linewidth, perc, sort_order, groups, components, projection, legend_loc, legend_fontsize, legend_fontweight, right_margin, left_margin, xlabel, ylabel, title, fontsize, figsize, xlim, ylim, show_density, show_assignments, show_linear_fit, show_polyfit, rug, n_convolve, smooth, rescale_color, dpi, frameon, show, save, ax, zorder, ncols, **kwargs)
    216                         lines.append(line)
    217                     if 'fit_alpha' in adata.var.keys() and (vkey is None or 'dynamics' in vkey):
--> 218                         line, fit = show_full_dynamics(adata, basis, 'fit', use_raw, linewidth, show_assignments=show_assignments, ax=ax)
    219                         fits.append(fit)
    220                         lines.append(line)

/usr/local/lib/python3.6/dist-packages/scvelo/plotting/simulation.py in show_full_dynamics(adata, basis, key, use_raw, linewidth, show_assignments, ax)
     64 
     65     if show_assignments is not 'only':
---> 66         _, ut, st = compute_dynamics(adata, basis, key, extrapolate=True, t=show_assignments)
     67         line, = ax.plot(st, ut, color=color, linewidth=linewidth, label=label)
     68 

/usr/local/lib/python3.6/dist-packages/scvelo/plotting/simulation.py in compute_dynamics(adata, basis, key, extrapolate, sort, t_, t)
     38         u0_ = unspliced(t_, 0, alpha, beta)
     39         tmax = np.max(t) if True else tau_inv(u0_ * 1e-4, u0=u0_, alpha=0, beta=beta)
---> 40         t = np.concatenate([np.linspace(0, t_, num=500), np.linspace(t_, tmax, num=500)])
     41 
     42     tau, alpha, u0, s0 = vectorize(np.sort(t) if sort else t, t_, alpha, beta, gamma)

<__array_function__ internals> in linspace(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/core/function_base.py in linspace(start, stop, num, endpoint, retstep, dtype, axis)
    169 
    170     if endpoint and num > 1:
--> 171         y[-1] = stop
    172 
    173     if axis != 0:

/usr/local/lib/python3.6/dist-packages/anndata/core/views.py in __setitem__(self, idx, value)
     27             attr = getattr(new, attr_name)
     28             container = reduce(lambda d, k: d[k], keys, attr)
---> 29             container[idx] = value
     30             adata_view._init_as_actual(new)
     31 

/usr/local/lib/python3.6/dist-packages/anndata/core/views.py in __setitem__(self, idx, value)
     27             attr = getattr(new, attr_name)
     28             container = reduce(lambda d, k: d[k], keys, attr)
---> 29             container[idx] = value
     30             adata_view._init_as_actual(new)
     31 

/usr/local/lib/python3.6/dist-packages/anndata/core/views.py in __setitem__(self, idx, value)
     27             attr = getattr(new, attr_name)
     28             container = reduce(lambda d, k: d[k], keys, attr)
---> 29             container[idx] = value
     30             adata_view._init_as_actual(new)
     31 

/usr/local/lib/python3.6/dist-packages/anndata/core/views.py in __setitem__(self, idx, value)
     24             logger.warning(
     25                 'Trying to set attribute `.{}` of view, making a copy.'.format(attr_name))
---> 26             new = adata_view.copy()
     27             attr = getattr(new, attr_name)
     28             container = reduce(lambda d, k: d[k], keys, attr)

/usr/local/lib/python3.6/dist-packages/anndata/core/anndata.py in copy(self, filename)
   1557                            self._obsm.copy(), self._varm.copy(),
   1558                            raw=None if self._raw is None else self._raw.copy(),
-> 1559                            layers=dict(self.layers),
   1560                            dtype=self._X.dtype.name if self._X is not None else 'float32')
   1561         else:

/usr/local/lib/python3.6/dist-packages/anndata/core/alignedmapping.py in __getitem__(self, key)
    121                 ViewArgs(self.parent, self.attrname, (key,))
    122             ),
--> 123             self.subset_idx
    124         )
    125 

/usr/lib/python3.6/functools.py in wrapper(*args, **kw)
    805                             '1 positional argument')
    806 
--> 807         return dispatch(args[0].__class__)(*args, **kw)
    808 
    809     funcname = getattr(func, '__name__', 'singledispatch function')

/usr/local/lib/python3.6/dist-packages/anndata/core/alignedmapping.py in _subset(mat, subset_idx)
     15 @singledispatch
     16 def _subset(mat, subset_idx):
---> 17     return mat[subset_idx]
     18 
     19 @_subset.register(pd.DataFrame)

/usr/local/lib/python3.6/dist-packages/anndata/core/views.py in __array_finalize__(self, obj)
     60         return arr
     61 
---> 62     def __array_finalize__(self, obj: Optional[np.ndarray]):
     63         if obj is not None:
     64             self._view_args = getattr(obj, '_view_args', None)```
stefanpeidli commented 4 years ago

Thanks for telling us. First thing to check would be your versions; you can print them using scv.logging.print_versions() and post the results here. Also, your adata might be a view (which means it is somewhat only a reference to a different, actual annData object). You can check this by running print(adata.isview).

vkorobeynyk commented 4 years ago

The adata is not a view. I had an older version of loompy. It works now. Thanks

MichaelPeibo commented 4 years ago

Hi, I got same problem and did not got through by updating loompy. @stefanpeidli @vkorobeynyk The adata is not a view.

Running scvelo 0.1.25.dev51+ge35c425 (python 3.7.3) on 2019-12-19 18:33.

loompy=3.0.6
scanpy==1.4.4.post1 anndata==0.6.22.post1 umap==0.3.10 numpy==1.17.0 scipy==1.3.0 pandas==0.23.4 scikit-learn==0.21.3 statsmodels==0.10.1 python-igraph==0.7.1 louvain==0.6.1

Any suggestion?

stefanpeidli commented 4 years ago

Hi, so you get the exact same error when running scv.pl.velocity? What does the 'post1' suffix mean in your scanpy and anndata versions? I've never seen that before.

MichaelPeibo commented 4 years ago

@stefanpeidli Hi, I just worked through this step. I had non-sparse matrix of splicedand unsplicedlayers, I converted them to sparse, steps below worked.

scv.pp.filter_and_normalize(bdata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(bdata, n_pcs=30, n_neighbors=30)

> Filtered out 10304 genes that are detected in less than 20 counts (shared).
> Normalized count data: spliced, unspliced.
> WARNING: Did not modify X as it looks preprocessed already.
> WARNING: You seem to have duplicate cells in your data. Consider removing these via pp.remove_duplicate_cells.
> computing neighbors
>     finished (0:00:20) --> added to `.uns['neighbors']`
>     'distances', weighted adjacency matrix
>     'connectivities', weighted adjacency matrix
> computing moments based on connectivities
>     finished (0:00:09) --> added 
>     'Ms' and 'Mu', moments of spliced/unspliced abundances (adata.layers)

scv.tl.velocity(bdata)

> computing velocities
>     finished (0:00:20) --> added 
>     'velocity', velocity vectors for each individual cell (adata.layers)
> 

But I got followin error, scv.tl.velocity_graph(bdata)

computing velocity graph
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-1c25c273634a> in <module>
----> 1 scv.tl.velocity_graph(bdata)

~/anaconda3/envs/myenv/lib/python3.7/site-packages/scvelo/tools/velocity_graph.py in velocity_graph(data, vkey, xkey, tkey, basis, n_neighbors, n_recurse_neighbors, random_neighbors_at_max, sqrt_transform, variance_stabilization, gene_subset, approx, copy)
    196 
    197     logg.info('computing velocity graph', r=True)
--> 198     vgraph.compute_cosines()
    199 
    200     adata.uns[vkey+'_graph'] = vgraph.graph

~/anaconda3/envs/myenv/lib/python3.7/site-packages/scvelo/tools/velocity_graph.py in compute_cosines(self)
    103         progress = logg.ProgressReporter(n_obs)
    104         for i in range(n_obs):
--> 105             if self.V[i].max() != 0 or self.V[i].min() != 0:
    106                 neighs_idx = get_iterative_indices(self.indices, i, self.n_recurse_neighbors, self.max_neighs)
    107 

~/.local/lib/python3.7/site-packages/numpy-1.17.0-py3.7-linux-x86_64.egg/numpy/core/_methods.py in _amax(a, axis, out, keepdims, initial, where)
     28 def _amax(a, axis=None, out=None, keepdims=False,
     29           initial=_NoValue, where=True):
---> 30     return umr_maximum(a, axis, None, out, keepdims, initial, where)
     31 
     32 def _amin(a, axis=None, out=None, keepdims=False,

ValueError: zero-size array to reduction operation maximum which has no identity
MichaelPeibo commented 4 years ago

Hi, so you get the exact same error when running scv.pl.velocity? What does the 'post1' suffix mean in your scanpy and anndata versions? I've never seen that before.

probably dev version?

stefanpeidli commented 4 years ago

Seems to me that your bdata might have a lot of NaNs or barely any data left. You could check how the bdata looks like (e.g. how many zeros, NaNs and which size) after filtering.

MichaelPeibo commented 4 years ago

I checked the nan and zeros in bdata. np.isnan(bdata.X.data).any()

False

(bdata.X.data==0).any()

False

For clarity, here is full code I used:

## bdata is my preprocessed anndata
emat=pd.read_csv('/home/xupb/emat.da.merge.csv',index_col=0).T
bdata.layers["spliced"]=csr_matrix(emat)
nmat=pd.read_csv('/home/xupb/nmat.da.merge.csv',index_col=0).T
bdata.layers["unspliced"]=csr_matrix(nmat)

## adata is used to pass adata.X to bdata.X of my original anndata
adata=adata[bdata.obs.index]
adata=adata[:,bdata.var.index]
bdata.X=adata.X

scv.pp.filter_and_normalize(bdata, min_shared_counts=20, n_top_genes=2000)
scv.pp.moments(bdata, n_pcs=30, n_neighbors=30)
scv.tl.velocity(bdata)
scv.tl.velocity_graph(bdata)### Got errors here

Could it be due to subsetting ?

stefanpeidli commented 4 years ago

The error ocurres because you overwrite the X of your bdata, which generally should not be done. This makes X a np.ndarray.view object instead of a regular numpy array which scvelo refuses to work with (for good reason, since a view tells us that there are mutliple variables sharing the same memory). Out of curiosity, why do you want to overwrite X in the first place?

The better strategy would be to not overwrite X but instead copy the layers and annotations from your bdata object into the adata one.

MichaelPeibo commented 4 years ago
  1. I thought something wrong with bdata.X, which is normalized values:
    
    array([[-0.4160007 , -0.17436647, -0.15166533, ..., -0.01293505,
         0.01771855, -0.04358341],
       [-0.5849805 , -0.31617087, -0.18913265, ..., -0.02730939,
         0.01081311, -0.03910762],
       [-0.29471755, -0.15501873, -0.11666121, ...,  0.00473892,
         0.02537772, -0.02185296],
       ...,
       [-0.3796029 , -0.2003614 , -0.16623697, ..., -0.04402434,
        -0.03586256, -0.05287427],
       [-0.1688789 , -0.16106547, -0.02051124, ..., -0.00923129,
        -0.00515935,  0.01816866],
       [ 4.1280065 , -0.17396046, -0.08692747, ..., -0.02821098,
        -0.01924141, -0.01748079]], dtype=float32)
I thought overwirting X may help with it.

2. I tried to create a new anndata with  :

import anndata new_adata = anndata.AnnData( X=bdata.X, var=bdata.var, obs=bdata.obs, layers=bdata.layers ) new_adata

still same error.

3.Also I tried run 

scv.tl.velocity(bdata) scv.tl.velocity_graph(bdata)

 directly without 

scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000) scv.pp.moments(adata, n_pcs=30, n_neighbors=30)



It went through but got errors at `scv.pl.velocity_embedding_stream(bdata, basis='umap')`

> computing velocity embedding
>     finished (0:00:09) --> added
>     'velocity_umap', embedded velocity vectors (adata.obsm)
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-56-948e59a039de> in <module>
> ----> 1 scv.pl.velocity_embedding_stream(bdata, basis='umap')
> 
> ~/anaconda3/envs/myenv/lib/python3.7/site-packages/scvelo/plotting/velocity_embedding_stream.py in velocity_embedding_stream(adata, basis, vkey, density, smooth, min_mass, cutoff_perc, arrow_color, linewidth, n_neighbors, recompute, color, use_raw, layer, color_map, colorbar, palette, size, alpha, perc, X, V, X_grid, V_grid, sort_order, groups, components, legend_loc, legend_fontsize, legend_fontweight, xlabel, ylabel, title, fontsize, figsize, dpi, frameon, show, save, ax, ncols, **kwargs)
>      70         X_grid, V_grid = compute_velocity_on_grid(X_emb=X_emb, V_emb=V_emb, density=1, smooth=smooth, min_mass=min_mass,
>      71                                                   n_neighbors=n_neighbors, autoscale=False, adjust_for_stream=True,
> ---> 72                                                   cutoff_perc=cutoff_perc)
>      73         lengths = np.sqrt((V_grid ** 2).sum(0))
>      74         linewidth = 1 if linewidth is None else linewidth
> 
> ~/anaconda3/envs/myenv/lib/python3.7/site-packages/scvelo/plotting/velocity_embedding_grid.py in compute_velocity_on_grid(X_emb, V_emb, density, smooth, n_neighbors, min_mass, autoscale, adjust_for_stream, cutoff_perc)
>      27     grs = []
>      28     for dim_i in range(n_dim):
> ---> 29         m, M = np.min(X_emb[:, dim_i]), np.max(X_emb[:, dim_i])
>      30         m = m - .01 * np.abs(M - m)
>      31         M = M + .01 * np.abs(M - m)
> 
> <__array_function__ internals> in amin(*args, **kwargs)
> 
> ~/.local/lib/python3.7/site-packages/numpy-1.17.0-py3.7-linux-x86_64.egg/numpy/core/fromnumeric.py in amin(a, axis, out, keepdims, initial, where)
>    2744     """
>    2745     return _wrapreduction(a, np.minimum, 'min', axis, None, out,
> -> 2746                           keepdims=keepdims, initial=initial, where=where)
>    2747 
>    2748 
> 
> ~/.local/lib/python3.7/site-packages/numpy-1.17.0-py3.7-linux-x86_64.egg/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
>      88                 return reduction(axis=axis, out=out, **passkwargs)
>      89 
> ---> 90     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
>      91 
>      92 
> 
> ValueError: zero-size array to reduction operation minimum which has no identity
> 
stefanpeidli commented 4 years ago
  1. Negative values in bdata.X should not be a problem, though negative values in layers should definetly not be present. Could you check that?
  2. This worked for me actually. What error did you have with this method?
  3. The error message indicates that you bdata object is messed up in that case. In any case filtering and normalization is always advised.
MichaelPeibo commented 4 years ago

Hi @stefanpeidli Thanks for your quick replies! Finally, I worked it out by calculating all cells' velocity and subsetting by index, rather than subsetting at the beginning. It is really annoying with anndata's subsetting......

A analysis question: why my terminal cell got some reverse arrows(right part of figure), I have to admit that my terminal population can be a little different with it's previous population(change of culture condition), does it look OK to you ?

image

stefanpeidli commented 4 years ago

That is really hard to judge purely from looking at the velocity stream plots. In case you find reversed velocities, you might want to have a look at Issue #112. We are currently working on a solution to that.

MichaelPeibo commented 4 years ago

Thanks @stefanpeidli I will check on it.