Closed sophietr closed 1 year ago
>>> import numpy as np
>>> from anndata import AnnData
>>> adata1 = AnnData(np.array([[1, 2, 3], [4, 5, 6]]),
...                  {'smp_names': ['s1', 's2'],
...                   'anno1': ['c1', 'c2']},
...                  {'var_names': ['a', 'b', 'c']})
>>> adata2 = AnnData(np.array([[1, 2, 3], [4, 5, 6]]),
...                  {'smp_names': ['s3', 's4'],
...                   'anno1': ['c3', 'c4']},
...                  {'var_names': ['b', 'c', 'd']})
>>> adata3 = AnnData(np.array([[1, 2, 3], [4, 5, 6]]),
...                  {'smp_names': ['s5', 's6'],
...                   'anno2': ['d3', 'd4']},
...                  {'var_names': ['b', 'c', 'd']})
>>>
>>> adata = adata1.concatenate([adata2, adata3])
>>> adata.X
[[ 2.  3.]
 [ 5.  6.]
 [ 1.  2.]
 [ 4.  5.]
 [ 1.  2.]
 [ 4.  5.]]
>>> adata.smp
   anno1 anno2 batch
s1    c1   NaN     0
s2    c2   NaN     0
s3    c3   NaN     1
s4    c4   NaN     1
s5   NaN    d3     2
s6   NaN    d4     2
hi, i converted the list of requests to a checkmark list.
now we can check off finished items
Hi all!
I just wanted to jump in with @sophietr and say that implementing a cell cycle classification function like Seurat's CellCycleScoring function would be a nice addition to the preprocessing options. Would be valuable to keep an eye on in downstream exploration and could then be easily regressed out if needed.
Also, do you guys have any opinions about the inclusion of imputation/smoothing strategies? I've been messing around with including it in analysis pipelines, but still haven't really settled on when to include them. If there's interest, MAGIC seems like a great option and is currently implemented in Python.
@falexwolf where would you expect a new scoring function? Under tools (sc.tl) or preprocessing (sc.pp)? I may contribute to this cell cycle thing if you haven't already
@dawe a cell cycle scoring function would be great! everything that's a bit more extensive and non-standard should go into sc.tl, everything that's really just simple preprocessing and stats with a few lines can go to sc.pp. usually, there should be a plotting function in sc.pl that presents a canonical visualization of the annotation added in with the tool... writing a test for your function would also be great ;)
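For readers following the cell cycle discussion, here is a minimal sketch of what such a scoring step could look like. It is not the function from @dawe's branch and not Seurat's exact procedure: the score here is simply the mean expression of a gene set minus the mean of all remaining genes, and the phase rule (G1 when neither score is positive, otherwise the higher of the two) is a simplification. The names `score_gene_set` and `assign_phase` and the toy gene names are hypothetical.

```python
import numpy as np


def score_gene_set(X, var_names, gene_set):
    """Per-cell mean expression of `gene_set` minus the mean of all other genes."""
    var_names = np.asarray(var_names)
    in_set = np.isin(var_names, list(gene_set))
    if not in_set.any() or in_set.all():
        raise ValueError("gene_set must match some, but not all, of var_names")
    return X[:, in_set].mean(axis=1) - X[:, ~in_set].mean(axis=1)


def assign_phase(s_score, g2m_score):
    """G1 if neither score is positive, otherwise the phase with the higher score."""
    phase = np.where(s_score > g2m_score, "S", "G2M")
    phase[(s_score <= 0) & (g2m_score <= 0)] = "G1"
    return phase


# Toy data: 3 cells x 4 genes. Gene names are placeholders, not a curated marker list.
X = np.array([
    [5.0, 0.0, 1.0, 1.0],  # high S-phase marker
    [0.0, 6.0, 1.0, 1.0],  # high G2/M marker
    [0.0, 0.0, 2.0, 2.0],  # neither marker expressed
])
var_names = ["s_gene", "g2m_gene", "other1", "other2"]

s_score = score_gene_set(X, var_names, {"s_gene"})
g2m_score = score_gene_set(X, var_names, {"g2m_gene"})
phase = assign_phase(s_score, g2m_score)
print(phase)
```

In an AnnData-based tool, the two scores and the phase label would presumably be stored as per-cell annotations so that they can be plotted or regressed out later.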
@dpcook @flying-sheep regarding imputation: my personal view is that nothing is settled there - I can't judge what's a good method and what's not, or whether one should use it at all. so for now, we will not include any imputation method in scanpy. but it's easy to just apply any package you like to the data matrix in an AnnData object, adata.X
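As an illustration of the "apply any package to the data matrix" point, the sketch below runs a toy smoothing function on a plain cells-by-genes NumPy array. The function `knn_smooth` is a hypothetical stand-in written for this example; it is not MAGIC, scImpute, or any other published method, and it assumes a dense matrix.

```python
import numpy as np


def knn_smooth(X, k=2):
    """Replace each cell (row) by the mean of itself and its k nearest cells."""
    # Pairwise squared Euclidean distances between cells.
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    # Indices of the k + 1 closest cells; a cell is at distance 0 from itself.
    neighbors = np.argsort(dist, axis=1)[:, : k + 1]
    return X[neighbors].mean(axis=1)


# Toy matrix: 4 cells x 2 genes, forming two obvious groups of cells.
X = np.array([
    [1.0, 0.0],
    [3.0, 0.0],
    [0.0, 10.0],
    [0.0, 12.0],
])
X_smooth = knn_smooth(X, k=1)
print(X_smooth)
```

With an AnnData object, the same idea would amount to something like `adata.X = knn_smooth(adata.X)`, assuming `adata.X` is a dense array; a sparse matrix would need to be converted first or handled by the external package itself.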
...
@falexwolf I have the functions in my scanpy branch right now. They seem to be working properly (take a look, if you want to). I'll add the tests as soon as possible (now getting back to "ordinary work")
@flying-sheep can you cite a reference for scImpute and countae outperforming MAGIC? I'd be curious to learn which hyperparameter optimization methods and performance measures were used in the benchmark.
well, @gokceneraslan told me. Gökcen, is the preprint for countae online? It should contain what @hammer asked for, right?
Yes, it'll be out soon. It's very difficult to compare imputation/denoising methods, but we have some in the paper.
@flying-sheep @gokceneraslan great! I agree it's hard to compare these algorithms as the performance of an imputation strategy often depends on the downstream use case. I'm looking forward to checking out the countae preprint. I find the scVI benchmark of imputation methods to be useful for now.
I'm closing this because it's a very old catch-all issue. If anybody in this thread is still missing functionality in scanpy itself or in the broader scverse ecosystem, I'd encourage them to open new issues to discuss the specifics.
hi Alex, here is my list :) thanks a lot! and I might expand it....
'needed'
'nice to have'