vib-singlecell-nf / vsn-pipelines

A repository of pipelines for single-cell data in Nextflow DSL2
GNU General Public License v3.0

Using flavor='seurat_v3' for feature selection may give better results [SUGGESTION] #354

Open cbravo93 opened 3 years ago

cbravo93 commented 3 years ago

Is your feature request related to a problem? Please describe.

For the same data set, with the same filters and cells, I get much cleaner results with Seurat than with VSN. I think this is mostly due to the selection of variable features. I am not using anything fancy (e.g. sctransform), but Seurat uses vst by default (I am using the top 3000 features), while VSN still uses mean_dispersion. VST is implemented in scanpy (https://github.com/theislab/scanpy/issues/993). In sc_find_variable_genes.py (https://github.com/vib-singlecell-nf/vsn-pipelines/blob/master/src/scanpy/bin/feature_selection/sc_find_variable_genes.py) I can see a method parameter, but nothing apart from mean_disp is implemented.

I attach UMAPs for comparison and can add annotations if that would make it clearer. For VSN, the number of PCs was determined with pcacv to be 17. For Seurat I just used 30 for a quick check. Reducing Seurat to 17 PCs does not change the results much, and neither does increasing VSN to 30.

[Attached images: UMAP comparison of VSN and Seurat]
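To make the difference from mean_dispersion concrete, here is a minimal NumPy sketch of the idea behind Seurat's vst (which scanpy exposes as flavor='seurat_v3'): model each gene's expected variance from its mean, standardize the counts with that expected SD, cap the standardized values at sqrt(number of cells), and rank genes by the variance of the standardized values. This is not VSN's, Seurat's or scanpy's code. It assumes a dense cells x genes matrix of raw, non-negative counts, and it swaps the loess fit used by Seurat and scanpy for a quadratic polynomial in log10 space, so its numbers will not match either tool exactly.

```python
import numpy as np


def vst_rank(counts, n_top=3000, degree=2):
    """Rank genes by standardized variance, in the spirit of Seurat's "vst".

    counts: dense (cells x genes) matrix of raw, non-negative counts.
    Returns (indices of the top-ranked genes, standardized variance per gene).
    """
    x = np.asarray(counts, dtype=float)
    n_cells, n_genes = x.shape
    mean = x.mean(axis=0)
    var = x.var(axis=0, ddof=1)
    ok = var > 0  # constant genes keep a standardized variance of 0

    # Mean-variance trend: log10(variance) as a polynomial in log10(mean).
    # Seurat and scanpy fit a loess curve here; the polynomial is a
    # simplification made for this sketch.
    log_mean = np.log10(mean[ok])
    coefs = np.polyfit(log_mean, np.log10(var[ok]), deg=degree)
    expected_sd = np.sqrt(10 ** np.polyval(coefs, log_mean))

    # Standardize each gene with its expected SD, then cap values at
    # sqrt(n_cells) so a few outlier cells cannot dominate a gene's score.
    z = (x[:, ok] - mean[ok]) / expected_sd
    z = np.minimum(z, np.sqrt(n_cells))

    std_var = np.zeros(n_genes)
    std_var[ok] = z.var(axis=0, ddof=1)
    top = np.argsort(-std_var, kind="stable")[:n_top]
    return top, std_var
```

Genes whose spread simply tracks their mean (Poisson-like noise) end up with a standardized variance near 1, while genes that vary more than their mean predicts, such as markers that are on in one population and off in another, rise to the top regardless of expression level.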

Describe the solution you'd like

Would it be possible to add other methods? I think what I am looking for is flavor='seurat_v3' (https://scanpy.readthedocs.io/en/stable/generated/scanpy.pp.highly_variable_genes.html).
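A hypothetical sketch of how a method option, like the one in sc_find_variable_genes.py, could map onto scanpy.pp.highly_variable_genes keyword arguments. The function name hvg_kwargs, the method string "seurat_v3" and the counts_layer argument are illustrative assumptions, not vsn-pipelines' actual interface. The mean_disp branch assumes the existing method corresponds to scanpy's default flavor='seurat' with its documented default cutoffs.

```python
from typing import Any, Dict, Optional


def hvg_kwargs(method: str, n_top_genes: Optional[int] = None,
               counts_layer: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical: translate a pipeline "method" string into kwargs for
    scanpy.pp.highly_variable_genes."""
    if method == "mean_disp":
        # Dispersion-based selection (scanpy flavor "seurat"), which expects
        # log-normalized data. The cutoffs are scanpy's documented defaults.
        kwargs: Dict[str, Any] = {"flavor": "seurat", "min_mean": 0.0125,
                                  "max_mean": 3, "min_disp": 0.5}
        if n_top_genes is not None:
            # With n_top_genes set, scanpy ignores the cutoffs and keeps
            # the top genes by normalized dispersion.
            kwargs["n_top_genes"] = n_top_genes
        return kwargs
    if method == "seurat_v3":
        # Seurat v3 VST, which expects raw counts. This sketch insists on an
        # explicit number of genes (the issue above uses 3000).
        if n_top_genes is None:
            raise ValueError("method 'seurat_v3' needs n_top_genes")
        kwargs = {"flavor": "seurat_v3", "n_top_genes": n_top_genes}
        if counts_layer is not None:
            # Read raw counts from this AnnData layer instead of adata.X.
            kwargs["layer"] = counts_layer
        return kwargs
    raise ValueError(f"unknown method: {method!r}")
```

With an AnnData object adata that keeps raw counts in a layer (the name "counts" is an assumption), the call would be `sc.pp.highly_variable_genes(adata, **hvg_kwargs("seurat_v3", 3000, counts_layer="counts"))`. Note that flavor='seurat_v3' needs the scikit-misc package for its loess fit.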

Describe alternatives you've considered

I can just run Seurat instead, but I really like VSN (although now I am unsure whether this could be happening in other data sets too).

cbravo93 commented 3 years ago

Another example where they differ:

VSN:

[image]

Seurat (colored by VSN clusters):

[image]

I am not sure where the distinction between Mol A and Mol B in VSN comes from; Seurat also agrees with pycistopic.

dweemx commented 3 years ago

Hi @cbravo93, thanks for the comprehensive report. This feature will be available in the next release, i.e. v0.27.0, which will be released soon.