Some comments/suggestions after looking at the quality control chapter:
[x] There is a section called "Filtering low quality reads" which I think is about filtering cells, not reads
[ ] The part where the mitochondrial genes are selected could be expanded:
  - Explain that this pattern is only one way to identify MT genes
  - Explain what the columns added to adata contain
  - I'm not sure why the ribosomal and hemoglobin sets are included here. They aren't mentioned in the text or used for filtering, and I don't think this is standard practice.
[ ] The is_outlier() function removes both low and high outliers. In most cases you probably only want one of those, depending on the metric.
[ ] The filtering is done directly using this function without any checking of the selected thresholds. It would be great to show where they are on the plots to validate them. I like this data-guided way of choosing thresholds but it doesn't always work.
[ ] Outlier detection doesn't work well on the raw MT percentages. It's better to apply a logit transformation first, just as the other metrics are log-transformed before outlier detection.
[ ] Ambient correction isn't a common/standard processing step, but the text doesn't say so, which I think confuses newcomers. I would try to make this clearer and/or move it to a different chapter.
[ ] There are a few references to other chapters which could be linked (probably they didn't exist when this was written)
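To make the points about one-sided filtering, the logit transformation, and checking thresholds a bit more concrete, here is a rough sketch of what I have in mind. It is not the chapter's is_outlier(); the names `mad_outliers` and `logit_pct`, the `side` and `transform` parameters, and the `eps` clipping value are all hypothetical, and it uses plain NumPy rather than adata so it can be run on its own.

```python
import numpy as np


def logit_pct(pct, eps=1e-6):
    """Logit-transform percentages (0-100). `eps` is an assumed clipping
    value that keeps 0% and 100% finite."""
    p = np.clip(np.asarray(pct, dtype=float) / 100.0, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))


def mad_outliers(values, nmads=5.0, side="both", transform=None):
    """Flag values more than `nmads` MADs from the median.

    side: "lower", "upper", or "both" (hypothetical parameter).
    transform: optional callable applied before computing median and MAD.
    Returns (mask, lower, upper); the thresholds are on the transformed
    scale, so they can be drawn on a plot of the transformed metric.
    """
    x = np.asarray(values, dtype=float)
    if transform is not None:
        x = transform(x)
    med = np.median(x)
    mad = np.median(np.abs(x - med))  # unscaled median absolute deviation
    lower, upper = med - nmads * mad, med + nmads * mad
    if side == "lower":
        mask = x < lower
    elif side == "upper":
        mask = x > upper
    elif side == "both":
        mask = (x < lower) | (x > upper)
    else:
        raise ValueError("side must be 'lower', 'upper', or 'both'")
    return mask, lower, upper


# Toy values standing in for per-cell QC metrics (made up for illustration).
total_counts = np.array([1000, 1200, 900, 1100, 1050, 20, 50000])
pct_mt = np.array([2, 3, 4, 5, 3, 4, 2, 60])

# Low total counts only, on the log1p scale.
low_counts, lo_thr, _ = mad_outliers(total_counts, side="lower", transform=np.log1p)

# High MT percentage only, on the logit scale.
high_mt, _, hi_thr = mad_outliers(pct_mt, nmads=3, side="upper", transform=logit_pct)
```

Returning the thresholds alongside the mask means they can be overlaid on the QC plots (for example as a vertical line on a histogram of the transformed metric) before any cells are actually removed.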
Converted to a TODO list and fixed the read issue. Generally, we don't distinguish reads and cells well enough here, although they are two very different concepts. I'll look at this again.