tlnagy closed this issue 8 years ago
@martinkampmann What is a good range to test for the minimum number of cells per bin? I currently have 2x10^6 cells as the minimum with 500 genes and 5 guides per gene.
I would definitely simulate down to a pretty low coverage. The reason is that, that way, you're simulating not only the bottleneck from the bin size but also later bottlenecks (e.g. DNA lost during sample prep). How about 25,000 cells as the minimum (corresponding to 10-fold representation) and 2.5 million cells as the maximum (corresponding to 1,000-fold representation)?
My initial choice was really similar: I picked 2e4 to 2e6 as my range, but using 2.5e4 to 2.5e6 is better because it maps cleanly onto the number of guides (500 genes x 5 guides = 2,500).
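For reference, the representation arithmetic above can be sketched in a few lines of Python. The library size (500 genes, 5 guides per gene) comes from this thread; the function names and the three-point grid are hypothetical and only illustrate the conversion, they are not taken from the simulation code.

```python
# Library layout taken from the thread: 500 genes x 5 guides per gene.
N_GENES = 500
GUIDES_PER_GENE = 5
N_GUIDES = N_GENES * GUIDES_PER_GENE  # 2,500 guides


def fold_representation(cells_per_bin: float, n_guides: int = N_GUIDES) -> float:
    """Average number of cells per guide in a bin (hypothetical helper)."""
    return cells_per_bin / n_guides


def cells_for_representation(fold: float, n_guides: int = N_GUIDES) -> int:
    """Cells per bin needed to reach a given fold representation."""
    return round(fold * n_guides)


# Hypothetical grid spanning the proposed bounds (10-fold to 1,000-fold).
folds = [10, 100, 1000]
grid = [cells_for_representation(f) for f in folds]
print(grid)  # cells per bin for each fold representation

# The original 2e4 to 2e6 range does not land on round representations.
print(fold_representation(2e4), fold_representation(2e6))
```

With 2,500 guides, the 2.5e4 to 2.5e6 range corresponds to exactly 10-fold and 1,000-fold representation, whereas 2e4 to 2e6 gives 8-fold and 800-fold.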
Example heatmap with the mean AUROC computed across 10 runs.
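A minimal sketch of how one heatmap cell could be computed, assuming each run yields a score per gene and a true hit/non-hit label. The rank-based AUROC below is the standard Mann-Whitney formulation, not necessarily what the simulation code uses, and `auroc` and `mean_auroc` are hypothetical names.

```python
import numpy as np


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share their average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    # Replace the ranks of tied scores with their mean rank.
    for value in np.unique(scores):
        tie = scores == value
        if tie.sum() > 1:
            ranks[tie] = ranks[tie].mean()
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))


def mean_auroc(runs):
    """Average AUROC over an iterable of (scores, labels) pairs, one per run."""
    return float(np.mean([auroc(s, l) for s, l in runs]))


# Toy inputs for illustration only; these are not simulation results.
toy_runs = [
    ([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]),
    ([0.1, 0.2, 0.7, 0.9], [0, 0, 1, 1]),
]
print(mean_auroc(toy_runs))
```

Each cell of the heatmap would then hold `mean_auroc` over the 10 runs for one combination of parameters.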
@martinkampmann Here are the plots for the 3 different noise levels. The trend is straightforward: as the noise of the readout increases, the minimum number of cells per bin becomes more important, while the dependence of the AUROC on representation is roughly the same across noise levels.
This issue is superseded by #29
This issue will track some of the ideas we have for figures/analysis