juugii closed this issue 6 years ago
I probably did something wrong when providing the spliced/unspliced information. What is the recommended way to create an AnnData containing it? Should I first read and preprocess the two matrices (spliced/unspliced) independently?
Hi juugii,

It seems like it didn't load the spliced and unspliced count matrices into the AnnData object.

```python
sc.read(filename)
```

would do that automatically when they are contained in the loom file. They are then provided in `adata.layers`.

In terms of preprocessing, the filtering automatically affects the layers. For normalization, you have to set `layers='all'` in `pp.normalize_per_cell(adata, layers='all')` to indicate that it should also normalize the spliced and unspliced count matrices.
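Conceptually, normalizing all layers just means rescaling each cell (row) by its total counts in every matrix, not only in `.X`. A minimal NumPy sketch of the idea (schematic only, not scanpy's actual implementation; the hypothetical `target_sum` of 1e4 is an illustrative choice):

```python
import numpy as np

def normalize_per_cell(counts, target_sum=1e4):
    """Scale each cell (row) so its counts sum to target_sum."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1  # avoid division by zero for empty cells
    return counts / totals * target_sum

# toy cells-by-genes matrices for spliced and unspliced counts
S = np.array([[2.0, 8.0], [5.0, 5.0]])
U = np.array([[1.0, 3.0], [0.0, 2.0]])

# layers='all' applies the same per-cell scaling to each layer
S_norm = normalize_per_cell(S)
U_norm = normalize_per_cell(U)
```

After this, every cell sums to the same target in both layers, which keeps spliced and unspliced abundances comparable.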
Ok, I was trying to build an AnnData on my own. I did try with a loom object, but I am not able to load it:
```python
adata = sc.read('sample01.loom', cache=True)
```

```
Traceback (most recent call last):
  File "ext…
```
scanpy doesn't seem to recognize the loom extension.
I am running: scanpy==1.2.2 anndata==0.6.9 numpy==1.14.3 scipy==1.1.0 pandas==0.23.0 scikit-learn==0.19.1 statsmodels==0.9.0
My scanpy version fits the scvelo requirements from the github repo.
Sorry for the confusion.
We had to modify scanpy accordingly, e.g. for handling loom files. The next scanpy release will be out within the next few days. For now, just install the latest GitHub version of scanpy; it should work then.
Thanks for mentioning. I will fix the requirements.
No problem, I understand all of this is in active development. Updating from the scanpy repo did indeed work.
Regarding my OP, I'll try to add layers to the AnnData object from the separate spliced/unspliced matrices, then perform normalization afterwards. So for now, is loading loom files the only recommended way to use scvelo?
Also, please note that velocity_embedding_grid seems to work correctly, though I had to show the plot "manually" with `plt.show()`, while I haven't been able to use velocity_embedding:
```python
>>> scv.pl.velocity_embedding_grid(adata, basis='umap')
<matplotlib.axes._subplots.AxesSubplot object at 0x7fb58f2d9f28>
>>> plt.show()
```

```python
>>> scv.pl.velocity_embedding(adata, basis='umap')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/path/to/local/python/python-dev/lib/python3.6/site-packages/scvelo/plotting/velocity_embedding.py", line 53, in velocity_embedding
    X, V, C = X_emb[ix_choice], V_emb[ix_choice], interpret_colorkey(adata, color, layer, perc=[2, 98])[ix_choice]
TypeError: only integer scalar arrays can be converted to a scalar index
```
(not sure if this requires posting in a separate issue)
Glad to hear that.
I mainly focused on reading loom files as perhaps the most common way to get spliced/unspliced counts right now is to generate a loom file via velocyto.
When doing it manually, you would just generate the AnnData object with something along these lines:
```python
adata = sc.AnnData(S)
adata.layers['spliced'] = S.tocsr()
adata.layers['unspliced'] = U.tocsr()
```

with `S` and `U` representing the spliced/unspliced count matrices.
I just updated an exemplary notebook, which might be worth looking into: https://github.com/theislab/scvelo_notebooks/blob/master/DentateGyrus.ipynb
Regarding the plotting issue, I'll look into it.
Wouldn't you rather do here:

```python
adata = sc.AnnData(S.tocsr() + U.tocsr())  # X stores the summed counts
adata.layers['spliced'] = S.tocsr()
adata.layers['unspliced'] = U.tocsr()
```
For downstream analysis you would consider mature mRNA (S) with coding regions only, which is translated to build the protein. That is also what you do in any basic single-cell analysis. Newly transcribed mRNA (U) is only relevant for velocity estimation. Hence, `adata.X` would just be the normalized, logarithmized spliced abundances.
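In other words, the by-hand construction would keep the raw spliced/unspliced counts in the layers and derive `.X` from the spliced matrix only. A toy sketch with plain scipy sparse matrices (omitting `sc.AnnData` and scanpy's normalization, which would run in between):

```python
import numpy as np
import scipy.sparse as sp

# toy spliced/unspliced count matrices (cells x genes)
S = sp.csr_matrix(np.array([[2.0, 8.0], [5.0, 5.0]]))
U = sp.csr_matrix(np.array([[1.0, 3.0], [0.0, 2.0]]))

# the layers keep the (non-logarithmized) counts of both species
layers = {'spliced': S, 'unspliced': U}

# .X would hold logarithmized spliced abundances; sparse matrices
# support log1p directly, and zeros stay zero
X = S.log1p()
```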
Thank you for the clarification! I see that it is less verbose, but shouldn't one then think about using `.X` for the spliced RNA instead of `.layers['spliced']`? A lot of memory is used by duplicating `S`, isn't it? Another solution: set `.X = None`, which is valid.
`.X` contains the logarithmized, normalized spliced counts used e.g. for computing neighbor graphs and embeddings, while `.layers['spliced']` contains the non-logarithmized, normalized spliced counts (keeping them comparable to the unspliced counts). Indeed, we might be better off just applying `np.log1p` whenever needed instead of storing another count table. However, since batch correction and other preprocessing procedures should only affect `.X` as well, it is probably best to keep it as is.
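The trade-off described above (storing a logarithmized copy in `.X` vs. applying `np.log1p` on demand) can be sketched with plain scipy sparse matrices; since `log1p` maps zero to zero, sparsity is preserved either way (a schematic illustration, not scanpy/scvelo internals):

```python
import numpy as np
import scipy.sparse as sp

# normalized (non-logarithmized) spliced counts, as kept in the layer
spliced = sp.random(100, 50, density=0.1, format='csr', random_state=0)

# option 1: store a second, logarithmized table for .X (extra memory,
# but preprocessing like batch correction can modify it independently)
X_stored = spliced.log1p()

# option 2: apply log1p on the fly whenever logarithmized values are needed
def logged():
    return spliced.log1p()
```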
The initially raised plotting issue (seemed to have been lost in the course of conversation) is still on my to-do list.
@juugii I would greatly appreciate hearing whether the plotting issue is solved now.
Hi @VolkerBergen, sorry for my late answer. I have been able to use the plotting functions, though I still have to call `plt.show()` as explained. But it is functional and gives the expected results. I have been able to work with loom files after upgrading to the latest scanpy. I'm also working on creating an AnnData according to your recommendations, but that is outside the scope of this OP, so I can close it. Thanks for your help.
Hello,
First try with this testing release: I cloned the git repo and installed with `pip install .` on Python 3.6.5. I loaded an annotated dataframe I am using for my work (created/preprocessed with scanpy), then:
Many thanks for your work, Best.