theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
412 stars 102 forks source link

scv.pp.moment error #1

Closed juugii closed 6 years ago

juugii commented 6 years ago

Hello,

First try with this testing release. I cloned the git repo, and installed with pip install . on python 3.6.5 I loaded an annotated dataframe I am using to work on (created/pp with scanpy), then:

>>> scv.pp.moments(adata)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/scvelo/scvelo/preprocessing/moments.py", line 42, in moments
    adata.layers['Ms'] = csr_matrix.dot(connectivities, adata.layers['spliced']).toarray()
  File "/path/to/local/python/python-dev/lib/python3.6/site-packages/anndata/layers.py", line 41, in __getitem__
    return self._layers[key]
KeyError: 'spliced'

Many thanks for your work, Best.

juugii commented 6 years ago

I probably did something wrong when providing the spliced/unspliced informations. What is the recommended way to create an anndata containing it? Should I first read and pp two matrices (unspliced/spliced) independently?

VolkerBergen commented 6 years ago

Hi juugii, seems like it didn't load spliced and unspliced count matrices into the AnnData object. sc.read(filename) would do that automatically, when it is contained in the loom file. These are then provided in adata.layers.

In terms of pp, the filtering automatically affects the layers. For pp.normalize_per_cell(adata, layers='all') you would have to set layers='all' to indicate that it should also normalize the spliced and unspliced count matrices.

juugii commented 6 years ago

Ok I was trying to build an anndata on my own. I did try with a loom object, but I am not able to load it.

adata=sc.read('sample01.loom', cache=True)

Traceback (most recent call last): File "", line 1, in File "/path/to/local/python/python-dev/lib/python3.6/site-packages/scanpy/readwrite.py", line 85, in read .format(filekey, filename, avail_exts)) ValueError: Reading with filekey "sample01.loom" failed, the inferred filename "./write/sample01.loom.h5ad" does not exist. If you intended to provide a filename, either use a filename ending on one of the available extensions {'h5ad', 'mtx', 'xlsx', 'anndata', 'txt', 'soft.gz', 'csv', 'data', 'tsv', 'tab', 'h5'} or pass the parameter ext.

scanpy doesn't seems to recognize loom extensions.

I am running: scanpy==1.2.2 anndata==0.6.9 numpy==1.14.3 scipy==1.1.0 pandas==0.23.0 scikit-learn==0.19.1 statsmodels==0.9.0

My scanpy version fits the scvelo requirements from the github repo.

VolkerBergen commented 6 years ago

Sorry for the confusion.

We had to modify scanpy accordingly such as handling loom files. The next scanpy release will be within the next few days. For now, just install the latest Github version of scanpy. It should work then.

Thanks for mentioning. I will fix the requirements.

juugii commented 6 years ago

No problem, I understand all of this is in active development. Updating from scanpy repo did work indeed.

Regarding to my OP, I'll try to add layers to the annData object from spliced/unspliced separated informations, then perform normalization in a second time. So for now the recommended way to use scvelo is only loading loom files?

Also, please note that velocity_embedding_grid seems to work correctly, though I had to print the plot "manually", while I haven't been able to use velocity_embedding:

scv.pl.velocity_embedding_grid(adata, basis='umap') <matplotlib.axes._subplots.AxesSubplot object at 0x7fb58f2d9f28> plt.show()

scv.pl.velocity_embedding(adata, basis='umap') Traceback (most recent call last): File "", line 1, in File "/path/to/local/python/python-dev/lib/python3.6/site-packages/scvelo/plotting/velocity_embedding.py", line 53, in velocity_embedding X, V, C = X_emb[ix_choice], V_emb[ix_choice], interpret_colorkey(adata, color, layer, perc=[2, 98])[ix_choice] TypeError: only integer scalar arrays can be converted to a scalar index

(not sure if it requires posting in a separated issue)

VolkerBergen commented 6 years ago

Glad to hear that.

I mainly focused on reading loom files as perhaps the most common way to get spliced/unspliced counts right now is to generate a loom file via velocyto.

When doing it manually, you would just generate the AnnData object with something along these lines:

adata = sc.AnnData(S) adata.layers['spliced'] = S.tocsr() adata.layers['unspliced'] = U.tocsr()

for S and U representing the spliced/unspliced count matrices.

I just updated an exemplary notebook, which might be worth looking into: https://github.com/theislab/scvelo_notebooks/blob/master/DentateGyrus.ipynb

Considering the plotting, I'll get into it.

falexwolf commented 6 years ago

Wouldn't here, you rather do:

adata = sc.AnnData(S.tocsr()+U.tocsr())  # X stores the added counts
adata.layers['spliced'] = S.tocsr()
adata.layers['unspliced'] = U.tocsr()
VolkerBergen commented 6 years ago

For downstream analysis you would consider mature mRNA (S) with coding-regions only which is translated to build the protein. That is also what you do for any basic single cell analysis. Newly transcribed mRNA (U) is only relevant for velocity estimation. Hence, adata.X would just be normalized logarithmized spliced abundances.

falexwolf commented 6 years ago

Thank you for the clarification! I see that it is less verbose, but shouldn't one then think about using .X for the spliced RNA? Instead of .layers['spliced']? It's a lot of memory that is used by duplicating S, isn't it? Another solution: set .X=None, which is valid.

VolkerBergen commented 6 years ago

.X contains the logarithmized normalized spliced counts used e.g. for computing neighbor graphs and embeddings, while .layers['spliced'] contains the non-logarithmized normalized spliced counts (keeping it comparable to unspliced counts). Indeed, we might be better off just applying np.log1p whenever needed instead of storing another count table. However, since batch correction and other preprocessing procedures should only affect .X as well, it is probably best to keep it as is.

The initially raised plotting issue (seemed to have been lost in the course of conversation) is still on my to-do list.

VolkerBergen commented 6 years ago

@juugii I would greatly appreciate to hear whether the plotting issue is solved now.

juugii commented 6 years ago

Hi @VolkerBergen, Sorry for my late answer. I have been able to use the plotting functions, yet I have to use plt.show() as explained. But it is functional, and gives expected results. I have been able to work with loom files after upgrading to the latest scanpy. I'm also building on creating an AnnData according to your recommendations, but it is outside the scope of this OP, I can close it. Thanks for your help.