nolanlab / citrus

Citrus Development Code
GNU General Public License v3.0

Comparison not including Citrus #96

Closed SamGG closed 8 years ago

SamGG commented 8 years ago

Hi Robert, if you have time, this may be worth a read: "Comparison of Clustering Methods for High-Dimensional Single-Cell Flow and Mass Cytometry Data", Lukas M Weber, Mark D Robinson. doi: http://dx.doi.org/10.1101/047613

rbruggner commented 8 years ago

Oh, interesting. I'll take a look and will comment. Thank you for the ref.

(In reply by email to SamGG's comment of Apr 9, 2016, 12:24 PM.)

rbruggner commented 7 years ago

Just had a look, and added the following comment:

Very cool comparison guys! Nice to see this sort of rigor and systematic approach to this problem.

A couple of thoughts that probably don't change your conclusions substantially:

FWIW, the "Ward's" linkage method as implemented in Rclusterpp runs much faster and produces better clustering results (than the average / Euclidean combo) by my previous evaluations. I just clustered all events in the 2015 Levine Marrow 32 dataset on my 2013 MacBook (4 cores) in about 25 minutes.

The "standard" hierarchical clustering methods that Rclusterpp implements are not tuned or designed specifically for flow cytometry data and the population identification problem, whereas I believe many of the methods in your comparison are. In some sense, this makes for an "apples to oranges" comparison. At the very minimum though, it is a really nice demonstration of how much better domain-tuned algorithms perform relative to generalized clustering algorithms. In that respect, I think there are some useful adaptations of hierarchical clustering that make it better suited to flow cytometry work, but obviously I'm biased :)

And specifically, the adaptation that I'm referring to is to use all the clusters identified in the clustering hierarchy rather than cutting the tree at a fixed point and using the terminal clusters. But again, I'm biased because ultimately, I don't think that re-identifying human-gated populations should really be the end goal of these methods.
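To make the distinction concrete, here is a minimal pure-Python sketch of the two ways of reading a clustering hierarchy: cutting the tree into a fixed number of clusters, versus keeping every cluster that appears anywhere in the tree. It is an illustration only, not the Citrus or Rclusterpp implementation. The function names (`ward_merges`, `cut_tree`, `all_clusters`), the `min_size` filter, and the toy points are all assumptions made for this example, and the naive merge loop is far too slow for real cytometry data.

```python
from itertools import combinations

def ward_merges(points):
    """Naive agglomerative clustering using Ward's criterion.

    Returns the member list of each newly formed cluster, in merge order.
    (Hypothetical helper for illustration; roughly cubic time.)
    """
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    merges = []

    def centroid(members):
        dim = len(points[0])
        return [sum(points[m][d] for m in members) / len(members)
                for d in range(dim)]

    def ward_cost(a, b):
        # Increase in within-cluster sum of squares if a and b are merged.
        d2 = sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))
        return len(a) * len(b) / (len(a) + len(b)) * d2

    while len(clusters) > 1:
        i, j = min(combinations(clusters, 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters.pop(i) + clusters.pop(j))
        clusters[next_id] = merged
        merges.append(merged)
        next_id += 1
    return merges

def cut_tree(n_points, merges, k):
    """Cut the hierarchy into k flat, non-overlapping clusters."""
    clusters = [[i] for i in range(n_points)]
    for merged in merges[: n_points - k]:
        # Drop the two clusters that were joined, then add their union.
        clusters = [c for c in clusters if not set(c) <= set(merged)]
        clusters.append(merged)
    return sorted(clusters)

def all_clusters(merges, min_size=2):
    """Keep every cluster in the hierarchy with at least min_size members.

    min_size is an assumed filter for this sketch; clusters overlap by design.
    """
    return [m for m in merges if len(m) >= min_size]

# Toy data: three tight pairs, two of which sit close to each other.
points = [(0, 0), (0, 1), (5, 0), (5, 1.2), (20, 0), (20, 1.5)]
merges = ward_merges(points)
flat = cut_tree(len(points), merges, k=2)
nested = all_clusters(merges, min_size=2)

print("cut at k=2:  ", flat)
print("all clusters:", nested)
```

With the cut at `k=2`, the pairs `[0, 1]` and `[2, 3]` are no longer visible as separate clusters because they are absorbed into one terminal cluster. Keeping every node of the hierarchy retains them alongside their parent, at the cost of clusters that overlap.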

Thank you for pointing me to the paper!