Quality control comments/suggestions

lazappi commented 8 months ago

Some comments/suggestions after looking at the quality control chapter:

[x] There is a section called "Filtering low quality reads" which I think is about filtering cells, not reads
[ ] The part where the mitochondrial genes are selected could be expanded
- Explain that this pattern is only one way to identify MT genes
- Explain what the columns added to adata contain
- I'm not sure why the ribosomal and hemoglobin sets are included here. They aren't mentioned in the text or used for filtering and I don't think this is standard practice.
[ ] The is_outlier() function removes both low and high outliers. I think in most cases you probably only want one of those (depending on the metric)
[ ] The filtering is done directly using this function without any checking of the selected thresholds. It would be great to show where they are on the plots to validate them. I like this data-guided way of choosing thresholds but it doesn't always work.
[ ] Outlier detection doesn't work well on the raw MT percentages. It's better to apply a logit transformation first (just as we run it on log-transformed values for the other metrics)
[ ] Ambient correction isn't a common/standard processing step, but that isn't stated in the text, which I think confuses new people. I would try to make this clearer and/or move this to a different chapter.
[ ] There are a few references to other chapters which could be linked (probably they didn't exist when this was written)
[ ] I noticed a few minor typos etc.
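
A minimal sketch of the one-sided outlier check and the logit transform suggested in the list above. It assumes a median-absolute-deviation (MAD) rule; the names `mad_outliers` and `logit_pct`, the `nmads` values, and the toy data are all hypothetical illustrations and not the chapter's actual `is_outlier()` code.

```python
import numpy as np


def mad_outliers(values, nmads=3.0, direction="both"):
    """Flag values more than `nmads` MADs from the median.

    direction: "lower", "higher", or "both" (hypothetical parameter).
    Returns the boolean mask and the (lower, upper) thresholds so they
    can be drawn on QC plots before any cells are removed.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    lower = median - nmads * mad
    upper = median + nmads * mad
    if direction == "lower":
        mask = values < lower
    elif direction == "higher":
        mask = values > upper
    elif direction == "both":
        mask = (values < lower) | (values > upper)
    else:
        raise ValueError("direction must be 'lower', 'higher' or 'both'")
    return mask, (lower, upper)


def logit_pct(pct, eps=1e-6):
    """Logit-transform percentages (0-100), clipping to avoid infinities."""
    p = np.clip(np.asarray(pct, dtype=float) / 100.0, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


# Toy per-cell metrics (made up, for illustration only)
rng = np.random.default_rng(0)
total_counts = rng.lognormal(mean=8.0, sigma=0.5, size=500)
pct_mt = rng.beta(2, 40, size=500) * 100

# Low total counts are the concern, so only flag the lower tail (log scale)
low_counts, (count_lo, _) = mad_outliers(
    np.log1p(total_counts), nmads=5, direction="lower"
)

# High MT percentages are the concern, so only flag the upper tail (logit scale)
high_mt, (_, mt_hi) = mad_outliers(logit_pct(pct_mt), nmads=3, direction="higher")

# Back-transform the thresholds to the original scales for plotting/inspection
count_threshold = np.expm1(count_lo)
mt_threshold = 100.0 / (1.0 + np.exp(-mt_hi))

print(f"total_counts threshold: {count_threshold:.1f}, flagged: {low_counts.sum()}")
print(f"pct_mt threshold: {mt_threshold:.2f}, flagged: {high_mt.sum()}")
```

Returning the thresholds alongside the mask makes it easy to overlay them on the QC histograms or scatter plots and check them before filtering.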

Zethson commented 8 months ago

Thank you @lazappi !

Zethson commented 8 months ago

Converted to a TODO list and fixed the read issue. Generally, we don't distinguish between reads and cells well enough here, although they are two very different concepts. I'll look at this again.

theislab / single-cell-best-practices

Quality control comments/suggestions #248