Spade clusters number - GitHub Issues

joanqcflow commented 7 years ago

Hello!

I am a new user of SPADE and I'm processing some classical flow cytometry data (13 parameters) with it. I am able to generate trees, but the choice of the number of clusters still confuses me. What is the right way to choose this number? Is there any risk in under- or overestimating it?

Thank you for your help and answers.

Joan

zbjornson commented 7 years ago

Hi Joan -

It's a bit of trial and error, and a compromise. A lot of people want exactly one cell type per cluster, which is basically impossible to achieve. With too few clusters, each cluster will contain a mix of different cell types, which is decidedly bad. With too many clusters, the tree is harder to look at, but the clusters are more likely to be "pure." So err on the side of too many clusters. The best number is specific to your dataset (e.g. a cell line versus depleted/enriched blood versus whole blood). The default in most SPADE applications is 200 clusters, which does pretty well for many datasets, but 50 to 150 may work better. Try a few...
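
To make the trade-off concrete, here is a minimal sketch in plain Python. It is not SPADE and does not use SPADE's algorithm or API: the data are synthetic (four made-up, well-separated "cell types" in two markers), the clustering is an ordinary k-means with deterministic farthest-point seeding as a stand-in, and every name in it (`make_cells`, `kmeans`, `purity`) is hypothetical. It only shows how one might sweep a few cluster counts and check how mixed the clusters are when the true cell types are known.

```python
import random
from collections import Counter


def make_cells(n_per_type=50, seed=0):
    """Synthetic 2-marker 'cells' from four hypothetical, well-separated cell types."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_type):
            # Bounded uniform noise keeps each cell type inside its own 2x2 box.
            points.append((cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)))
            labels.append(label)
    return points, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, n_iter=20):
    """Plain k-means with farthest-point seeding (a stand-in, NOT SPADE's method)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    assignment = []
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assignment


def purity(assignment, labels):
    """Fraction of cells that belong to the majority cell type of their cluster."""
    by_cluster = {}
    for cluster, label in zip(assignment, labels):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(labels)


points, labels = make_cells()
results = {k: purity(kmeans(points, k), labels) for k in (2, 4, 8)}
for k, score in results.items():
    print(f"k={k}: purity={score:.2f}")
```

With four equally sized cell types, two clusters cannot reach a purity above 0.5, because each cluster's majority type covers at most a quarter of all cells. Extra clusters beyond the number of cell types cost nothing in purity here; they only split a type across several nodes. Real data have no ground-truth labels, so in practice the equivalent check is looking at marker expression per node across a few cluster counts.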

Hope that helps.

joanqcflow commented 7 years ago

Hi Zach!

Thanks a lot! That really clarifies the issue. I'll try different combinations.

Cheers

nolanlab / spade

Spade clusters number #132