Zifeng-L opened this issue 4 years ago
Hey! Great to see this repo being used even before we've started outlining the community edition of the best practices :).
Opinions on scaling are split at the moment. Scanpy's initial tutorial followed Seurat's tutorial and therefore performed scaling; there was never a separate evaluation on the Scanpy side of whether it should be done. As I understand it, the arguments for and against scaling are:
- For: every gene contributes equally to PCA (or any other dimensionality reduction method).
- Against: a gene's expression level is indicative of its relative importance.
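To make the two arguments concrete, here is a toy NumPy sketch on synthetic data. The genes, means and variances are made up purely for illustration, and `pca_loadings` and `zscore` are hypothetical helpers, not Scanpy functions. Without scaling, PC1 is essentially the single high-variance gene; after z-scoring every gene to unit variance, PC1 instead picks up a weak program shared by two lowly expressed genes.

```python
import numpy as np


def pca_loadings(X, n_components=1):
    """Return (gene loadings, explained-variance ratio) from an SVD of the centred matrix."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, i.e. the loading of each gene on each PC
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2
    return Vt[:n_components], var[:n_components] / var.sum()


def zscore(X):
    """Scale each gene (column) to zero mean and unit variance."""
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant genes
    return (X - X.mean(axis=0)) / sd


rng = np.random.default_rng(0)
n_cells = 500
program = rng.normal(size=n_cells)  # hypothetical shared biological program
X = np.column_stack([
    10 + 3.0 * rng.normal(size=n_cells),                   # gene 0: highly expressed, large independent variance
    1 + 0.5 * program + 0.1 * rng.normal(size=n_cells),    # gene 1: lowly expressed, follows the program
    1 + 0.5 * program + 0.1 * rng.normal(size=n_cells),    # gene 2: lowly expressed, follows the program
    0.2 * rng.normal(size=n_cells),                        # gene 3: low-level pure noise
])

loadings_raw, evr_raw = pca_loadings(X)
loadings_scaled, evr_scaled = pca_loadings(zscore(X))
print("unscaled PC1 loadings:", np.round(loadings_raw[0], 2))
print("scaled PC1 loadings:  ", np.round(loadings_scaled[0], 2))
```

The same toy shows both sides: scaling lets the co-varying low-expression genes 1 and 2 drive PC1 (the "For" argument), but it also inflates the noise-only gene 3 to the same unit variance as gene 0 (the "Against" argument). Signs of the loadings are arbitrary.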
I'm not sure whether scaling improves the signal-to-noise ratio; as far as I'm aware, this has yet to be shown. I didn't scale in the tutorial because I felt that weighting all genes equally hides some biological signal. Other tutorials, such as Slingshot's, don't scale either.
In short, there is no best-practice recommendation on whether to scale, so it is optional for now. If you're keen, you could start testing this, and maybe we could come up with a recommendation?
Hi there, in your tutorial you normalize and batch-correct the data before PCA. However, most tutorials, such as Seurat's and Scanpy's, scale the data after finding HVGs and before PCA. As we all know, PCA and other analyses tend to be dominated by highly expressed genes with high variance, and scaling the data may improve the signal-to-noise ratio. I just want to know whether scaling is necessary or optional. Thanks!
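A minimal NumPy sketch of the step order used in the Seurat/Scanpy tutorials, with scaling exposed as an optional flag. Assumptions: `preprocess` is a hypothetical helper; HVG selection is simplified to "top genes by variance of log-expression" rather than the dispersion-based methods of Seurat's `FindVariableFeatures` or Scanpy's `sc.pp.highly_variable_genes`; batch correction is left out.

```python
import numpy as np


def preprocess(counts, n_top_genes=2000, n_comps=50, scale=False, target_sum=1e4):
    """Hypothetical helper: normalise -> log1p -> HVGs -> (optional) scale -> PCA.

    counts: cells x genes matrix of raw counts.
    Returns (cell embeddings on the PCs, indices of the selected genes).
    """
    counts = np.asarray(counts, dtype=float)
    # 1. Library-size normalisation: every non-empty cell sums to target_sum
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0  # leave empty cells at zero instead of dividing by zero
    # 2. log(1 + x) transform
    X = np.log1p(counts / lib * target_sum)
    # 3. Simplified HVG selection: genes with the largest variance of log-expression
    n_top = min(n_top_genes, X.shape[1])
    hvg = np.argsort(X.var(axis=0))[::-1][:n_top]
    X = X[:, hvg]
    # 4. Optional scaling: zero mean and unit variance per gene
    if scale:
        sd = X.std(axis=0)
        sd[sd == 0] = 1.0  # constant genes stay at zero after centring
        X = (X - X.mean(axis=0)) / sd
    # 5. PCA via SVD of the gene-centred matrix
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_comps, s.size)
    return U[:, :k] * s[:k], hvg


rng = np.random.default_rng(1)
counts = rng.poisson(2.0, size=(100, 300))  # synthetic counts, for illustration only
pcs_unscaled, hvg = preprocess(counts, n_top_genes=50, n_comps=10)
pcs_scaled, _ = preprocess(counts, n_top_genes=50, n_comps=10, scale=True)
```

In Scanpy terms, these steps roughly correspond to `sc.pp.normalize_total`, `sc.pp.log1p`, `sc.pp.highly_variable_genes`, `sc.pp.scale` and `sc.tl.pca`; dropping the scaling step amounts to calling the same pipeline with `scale=False`.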