niaid / dsb

Normalize CITEseq Data

choice of graph to use for clustering unclear? #40

Closed naila53 closed 1 year ago

naila53 commented 1 year ago

Hi, I noticed that you made different decisions about which graph to pass to the FindClusters function. In your tutorial, you use the kNN graph when clustering on the dsb values directly, but I know that by default the SNN graph goes into the clustering function. Is there a reason for this? [image] Why not "dsb_wnn"?

However, here you use the default SNN graph, as expected: [image]

MattPM commented 1 year ago

Hi, this is a bit outside the scope of this normalization package, but I'll try to answer your question. For clustering with a small number of proteins, say 30, I've noticed there is not much benefit to compressing into PCs first. Sometimes in the tutorials I just show the Seurat default for simplicity, but you can see the little hack I made for clustering on ADTs directly in the tutorial. I've not benchmarked this, but in early pilot analysis, compressing 30 proteins into 30 PCs just added noise for some cells that were clearly defined by a single protein. Data with hundreds of proteins might benefit more from compression into PCs before building the graph.
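For readers who want to try clustering on the protein values directly, here is a minimal sketch of the idea, not necessarily the exact code from the tutorial. The helper name `cluster_on_proteins`, the assay name "CITE", and the values of `k` and `resolution` are assumptions for illustration. It relies on Seurat's `FindNeighbors` building the graph from the assay data for the given `features` when `dims = NULL`, so no PCA step is involved.

```r
library(Seurat)

# Hypothetical helper (not part of dsb or Seurat): build the neighbor graph
# directly on normalized protein values instead of on principal components.
# `s` is assumed to be a Seurat object whose `assay` holds dsb-normalized ADTs.
cluster_on_proteins <- function(s, assay = "CITE", prots = NULL,
                                k = 30, resolution = 1) {
  # Default to every feature in the assay; in practice you would likely
  # pass a vector that excludes isotype controls.
  if (is.null(prots)) {
    prots <- rownames(s[[assay]])
  }

  # With dims = NULL, FindNeighbors uses the assay data for `features`
  # rather than a dimensional reduction.
  s <- FindNeighbors(
    object = s, assay = assay, features = prots,
    dims = NULL, k.param = k, verbose = FALSE
  )

  # Assumption: the graphs get Seurat's default names "<assay>_nn" and
  # "<assay>_snn"; check names(s@graphs) if your version differs.
  s <- FindClusters(
    object = s, graph.name = paste0(assay, "_snn"),
    resolution = resolution, verbose = FALSE
  )
  s
}
```

Calling `s <- cluster_on_proteins(s, prots = my_proteins)` (with `my_proteins` a hypothetical character vector of non-isotype protein names) would store the cluster labels in the object metadata. Swapping the `graph.name` argument lets you compare results from the kNN and SNN graphs on your own data.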