Downsampling of cells after filtering

ktrns / scrnaseq

Workflow for single-cell RNA-seq analysis using Seurat

MIT License


Downsampling of cells after filtering #96

Closed kosankem closed 2 years ago

kosankem commented 3 years ago

We would like to integrate the downsampling into the script not only for testing purposes but as a permanent feature, so that we can match the number of cells per sample and then continue working with identical cell numbers. For this, it would make more sense to move the downsampling to after the filtering.

ktrns commented 3 years ago

We can discuss this. Intuitively, I wonder why you would want to do that? You'd be throwing away so much data...

kosankem commented 3 years ago

This came up as a request in a recent project: adjust the cell numbers so that two samples (control and treatment) contain the same number of cells. Basically, we downsampled the sample with the higher cell number to the level of the one with the lower cell number, for better comparability across all the visualisations.

ktrns commented 3 years ago

I understand. But I wouldn't do this for all projects, to be honest. It feels like "throwing money out of the window", to phrase it in Denglish. I have had projects where samples were quite unequal, on purpose, but I think that, especially for 10x and its high cell numbers, this is just fine statistically speaking. I am not a statistician though.

@fabianrost84 - Would you know this?

fbnrst commented 3 years ago

I'm definitely against this as standard behavior. If you have a rare subpopulation in one sample, you would lose it. Statistical tests for differential expression take cell numbers into account and would lose power if cells are thrown away. It could really only be useful for UMAP-like visualizations, so I would do it only for visualisation, if needed.
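
To make the rare-subpopulation point concrete, here is a small base R sketch. All numbers are made up for illustration and do not come from any project in this thread; it only shows how few rare cells can be expected to survive random downsampling.

```r
# Toy numbers, assumed purely for illustration
n_cells_total = 10000   # cells in the sample after filtering
n_cells_rare  = 20      # cells belonging to the rare subpopulation
n_cells_kept  = 2000    # cells kept after downsampling

# Expected number of rare cells that survive random downsampling
expected_rare = n_cells_rare * n_cells_kept / n_cells_total

# Probability that no rare cell at all is kept (hypergeometric distribution)
p_none_kept = dhyper(0, m=n_cells_rare, n=n_cells_total - n_cells_rare, k=n_cells_kept)

print(expected_rare)
print(p_none_kept)
```

With these assumed numbers, only 4 of the 20 rare cells are expected to remain, which may be too few for them to show up as a population of their own.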

ktrns commented 3 years ago

Yes, this was also my gut feeling. Thanks for your input. @kosankem is this ok for you? If so, I'd close the issue.

Oliver-D-B commented 3 years ago

Thanks for the information; we understand your reasoning. We have discussed this again. However, our idea was to implement this functionality as an option, not as a mandatory default. We have already had a substantial number of requests from customers asking for an exact adjustment of cell numbers among samples (some reviewers might also ask). Some visualization panels in the current report seem easier (and fairer) to interpret if cell numbers are identical. How about Maike defines an additional parameter and implements the downsampling after filtering, as described above? For users of the script who don't like this idea, or for projects where this adjustment seems inappropriate, nothing would change unless the new option is actively set to 'true'.

ktrns commented 2 years ago

Dear @Oliver-D-B and @kosankem, @fabianrost84 and @andpet0101,

We currently have the option to downsample before filtering:

# Downsample cells if requested
if (!is.null(param$downsample_cells_n)) {
  sc = purrr::map(sc, function(s) {
    cells = ScSampleCells(sc=s, n=param$downsample_cells_n, seed=1)
    return(subset(s, cells=cells))
  })
}

I understand your reasoning. Shall we add another parameter, e.g. downsample_cells_equally, that allows downsampling after filtering to the minimum number of cells in any of the remaining samples?
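
A minimal sketch of what this could look like, in base R only so that it runs without Seurat. The helper name DownsampleCellsEqually and the toy cell names are assumptions for illustration, not existing code in the workflow, and downsample_cells_equally is only the proposed parameter name.

```r
# Hypothetical helper (not part of the workflow): for every sample, pick a
# random subset of cell names whose size equals that of the smallest sample.
DownsampleCellsEqually = function(cell_names_per_sample, seed=1) {
  # Smallest number of cells in any sample after filtering
  n_min = min(lengths(cell_names_per_sample))
  set.seed(seed)
  lapply(cell_names_per_sample, function(cells) sample(cells, size=n_min))
}

# Toy input standing in for the cell names of two filtered samples
cells_after_filtering = list(
  control   = paste0("control_cell", 1:500),
  treatment = paste0("treatment_cell", 1:320)
)

kept = DownsampleCellsEqually(cells_after_filtering)
print(lengths(kept))
```

In the workflow itself, the cell names would presumably come from the filtered Seurat objects (e.g. via colnames(s)), and each object could then be reduced with subset(s, cells=...) as in the existing block, guarded by a check on the new parameter.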

kosankem commented 2 years ago

Yes, such a feature would be helpful for us in some situations. Thanks.