adamgayoso closed this 1 year ago
Agreed that 10k would be better. I recall a recent paper (from Lior?) arguing something like this as well. Either way, I'd prefer scran, but that's slow... Do you want to open a PR?
I think this happened because, for some reason, CPM was included in the dimensionality reduction task and then carried over to everything when we made these generic functions. It always seemed weird to me as well, so I'm happy to have it replaced with something more standard.
We have log_scran implemented in utils elsewhere as well... but that might make this task a lot slower. 10k is fine.
I'm fine with this change. Worth noting that the PR would have to change many text references to CPM in method names, function names and the like.
https://github.com/openproblems-bio/openproblems/blob/3d8964a6c02496c0c604f0b1ddadc40589ca43a8/openproblems/tools/normalize.py#L44-L49
It's much more standard to use CP10k (counts per 10,000) or counts per median library size. CPM might heavily distort zeros vs. non-zeros.
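To make the zeros vs. non-zeros point concrete, here is a minimal NumPy sketch. It is not the openproblems implementation linked above: the function name `normalize_total_sketch` and the toy count matrix are made up for illustration, and the median-library-size default is just one way to read "counts per median lib size".

```python
import numpy as np


def normalize_total_sketch(counts, target_sum=None):
    """Scale each cell (row) so its counts sum to `target_sum`.

    Hypothetical helper for illustration only. If `target_sum` is None,
    the median library size across cells is used instead.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    if target_sum is None:
        target_sum = np.median(lib_size)
    # Guard against division by zero for cells with no counts at all.
    safe_lib_size = np.where(lib_size == 0, 1.0, lib_size)
    return counts / safe_lib_size * target_sum


# Toy data: 2 cells x 4 genes, library sizes 2000 and 4000.
counts = np.array([[0, 1, 3, 1996],
                   [0, 2, 8, 3990]])

cpm = normalize_total_sketch(counts, target_sum=1e6)
cp10k = normalize_total_sketch(counts, target_sum=1e4)
cp_median = normalize_total_sketch(counts)  # median library size = 3000

# A single raw count in the first cell, after log1p:
# zeros stay at 0 under every scaling, so the larger this value,
# the wider the gap between "not detected" and "detected once".
print("CPM      :", np.log1p(cpm[0, 1]))
print("CP10k    :", np.log1p(cp10k[0, 1]))
print("CP median:", np.log1p(cp_median[0, 1]))
```

For a cell with 2000 total counts, one raw count becomes 500 under CPM but only 5 under CP10k, so after `log1p` the jump from zero to one count is several times larger with CPM. In practice, scanpy's `sc.pp.normalize_total(adata, target_sum=1e4)` does the same kind of per-cell scaling.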