linnarsson-lab / BackSPIN

Biclustering algorithm
BSD 2-Clause "Simplified" License

Parameters adaptation for clustering #7

Open kokonech opened 7 years ago

kokonech commented 7 years ago

I am working with SMART-seq C1 scRNA-seq data (~200 cells). Clustering with PCA + k-means on filtered, CPM-normalized counts shows confident marker gene expression in specific clusters. However, running BackSPIN on the same data with 2000 selected genes gives a rather diffuse clustering with no marker gene confirmation. This is also visible in a t-SNE plot colored by cluster. Why could this happen? Are there specific BackSPIN parameters that can be adapted to improve precision, for example based on the number of cells?
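For reference, the CPM normalization mentioned above rescales each cell's counts so that they sum to one million. A minimal sketch follows. The function name `cpm_normalize`, the genes-by-cells list layout, and the toy numbers are assumptions for illustration only and are not part of BackSPIN:

```python
def cpm_normalize(counts):
    """Scale a genes-x-cells count matrix so each cell (column) sums to 1e6.

    `counts` is a list of rows (one per gene); each row is a list of
    per-cell counts. Cells with zero total counts are left as all zeros.
    """
    if not counts:
        return []
    n_cells = len(counts[0])
    # Total counts per cell (column sums).
    totals = [sum(row[j] for row in counts) for j in range(n_cells)]
    return [
        [row[j] * 1e6 / totals[j] if totals[j] > 0 else 0.0
         for j in range(n_cells)]
        for row in counts
    ]


# Toy example: 3 genes x 2 cells (hypothetical numbers).
toy = [[10, 0], [30, 5], [60, 15]]
cpm = cpm_normalize(toy)
```

Values normalized this way are relative abundances rather than absolute molecule counts, so the result depends on each cell's total.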

slinnarsson commented 7 years ago

Is your data from the standard C1 protocol, e.g. SMART-seq? If so, you will have reads (or normalized reads, e.g. RPKM), but BackSPIN was optimized to work on absolute count data, as you would get from a method that uses UMIs. We have never tested it on RPKM-type data, and it's possible it won't work very well. In particular, the gene selection will probably not work.

kokonech commented 7 years ago

Thanks for the reply! Yes, the protocol is standard SMART-seq without UMIs. I tried changing the count normalization (without/size factor/...), but did not see any major changes. I suppose the issue can be closed then. Also, it would be nice if the README or documentation mentioned that BackSPIN is optimized for UMI data.