LuckyMD opened this issue 5 years ago (status: open).
I think it makes sense to allow keeping a measure of magnitude, potentially implemented as an option, like `sc.pp.scale`'s `zero_center`.
I'd be interested to see how much highly variable gene selection differs on data transformed this way versus the batched approach we have now.
I wasn't aware that you could run `sc.pp.scale` without obtaining mean 0 at the end. Would that just scale the variance per gene then?
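To make the `zero_center` question concrete, here is a small NumPy sketch contrasting centre-and-scale with scale-only. It assumes, rather than quotes from scanpy's source, that `zero_center=False` simply skips subtracting the per-gene mean before dividing by the per-gene standard deviation; the toy matrix is made up.

```python
import numpy as np

# Toy expression matrix: 4 cells x 2 genes (values are illustrative only).
X = np.array([[1.0, 10.0],
              [2.0, 12.0],
              [3.0, 14.0],
              [4.0, 16.0]])

mean = X.mean(axis=0)
std = X.std(axis=0, ddof=1)  # per-gene sample standard deviation

centered_scaled = (X - mean) / std  # centre, then scale (zero_center=True)
scaled_only = X / std               # scale only (assumed zero_center=False behaviour)

print(centered_scaled.mean(axis=0))     # ~0 for every gene
print(scaled_only.mean(axis=0))         # mean/std per gene, not 0
print(scaled_only.std(axis=0, ddof=1))  # 1 for every gene
```

Under that assumption, each gene ends up with unit variance but keeps a mean of mean/std, so some information about magnitude survives.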
As for your question on HVG selection after `sc.pp.regress_out` vs. in batches: I think that's an interesting question, but I reckon the two scenarios are not closely related. I normally wouldn't use `sc.pp.regress_out` to remove batch effects, but rather to regress out continuous covariates like cell cycle scores. Batch effect removal is probably best done with methods that also account for the variance contribution of the batch effect, such as ComBat, or more complex data integration methods (Seurat, MNN, Scanorama). Either way, it would be an interesting comparison... just with a caveat ^^.
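For readers wondering what regressing out a continuous covariate does numerically, the sketch below fits an ordinary least-squares model per gene (with an intercept) on two per-cell scores and keeps only the residuals. It is plain NumPy, not scanpy's implementation; the names `s_score` and `g2m_score` only mimic cell cycle scores, the helper `regress_out` is hypothetical, and all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 3

# Simulated continuous covariates per cell (stand-ins for cell cycle scores).
s_score = rng.normal(size=n_cells)
g2m_score = rng.normal(size=n_cells)
covs = np.column_stack([s_score, g2m_score])

# Simulated expression: gene baseline + covariate-driven part + noise.
baseline = np.array([5.0, 1.0, 0.2])
effects = rng.normal(size=(2, n_genes))
X = baseline + covs @ effects + 0.1 * rng.normal(size=(n_cells, n_genes))


def regress_out(X, covariates):
    """Hypothetical helper: per-gene OLS residuals of X on covariates plus an intercept."""
    design = np.column_stack([np.ones(X.shape[0]), covariates])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)  # fits all genes at once
    return X - design @ coef


R = regress_out(X, covs)
# The normal equations make the residuals orthogonal to every covariate.
print(np.abs(covs.T @ R).max())  # close to 0 (floating-point error only)
```

In scanpy itself, scores computed by `sc.tl.score_genes_cell_cycle` are stored in `adata.obs` as `S_score` and `G2M_score`, and are commonly passed to `sc.pp.regress_out` by those keys.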
Hi @LuckyMD, any updates on this issue? I'm fairly new to scanpy and I'm working out the best way to apply `regress_out()` and find HVGs. I keep wondering whether I should regress out and scale before or after finding HVGs. Any tips or updates? Everything is welcome :)
Hi all,
I've been wondering about this for a while. As `sc.pp.regress_out` only leaves residuals, the resulting expression values have mean 0 for every gene. Thus, you can no longer use `sc.pp.highly_variable_genes` afterwards (it bins genes by their mean expression). This seems like a bad idea. An easy fix would be to also keep the intercept, not only the residuals, from `sc.pp.regress_out`. What do you guys think? If this sounds like a good idea to you, I will put it on my todo list for a pull request.
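The zero-mean effect and the proposed intercept-keeping fix can be checked on simulated data. This NumPy sketch is not scanpy's code; the `keep_intercept` flag is a hypothetical option mirroring the suggestion, and the Seurat-style dispersion (variance divided by mean) is shown only to illustrate why mean-based HVG binning breaks on pure residuals.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells = 500
covariate = rng.normal(size=n_cells)  # simulated per-cell score

# Two genes with very different expression levels but the same covariate effect.
gene_means = np.array([8.0, 0.5])
noise = rng.normal(scale=[1.0, 0.2], size=(n_cells, 2))
X = gene_means + np.outer(covariate, [0.3, 0.3]) + noise


def regress_out(X, covariates, keep_intercept=False):
    """Per-gene OLS residuals; optionally add back the fitted intercept (hypothetical flag)."""
    design = np.column_stack([np.ones(X.shape[0]), covariates])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    resid = X - design @ coef
    if keep_intercept:
        resid = resid + coef[0]  # coef[0] holds one intercept per gene
    return resid


R = regress_out(X, covariate)
R_kept = regress_out(X, covariate, keep_intercept=True)

print(R.mean(axis=0))       # ~0 for both genes: expression level is lost
print(R_kept.mean(axis=0))  # roughly [8, 0.5]: expression level is retained

# variance / mean is meaningless when the mean is ~0, but well defined on R_kept.
print(R_kept.var(axis=0) / R_kept.mean(axis=0))
```

One design choice to note: when the covariates are not centred, the fitted intercept is not the gene's mean, so an option that adds back `X.mean(axis=0)` instead (closer to the "measure of magnitude" idea) would give slightly different values. Either way the residual variance is unchanged.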