openpipelines-bio / openpipeline

https://openpipelines.bio
MIT License

Add ability to override axis for clr normalization #752

Open ddemaeyer opened 1 month ago

ddemaeyer commented 1 month ago

https://github.com/openpipelines-bio/openpipeline/blob/b39293c5333cb2725902d9a17e0d7ada17a6d4fc/src/transform/clr/script.py#L25

After checking out the discussions at https://github.com/scverse/muon/pull/28 and https://github.com/satijalab/seurat/issues/3605:

Assuming that margin=1 means "perform CLR normalization for each feature independently", I am trying to follow the logic in your answer above:

It seems that you are saying that when you have small antibody panels you can sequence to saturation, so you get a more accurate “count”. So if you see a feature has a raw count of 10 in cell A but a count of 100 in cell B, it follows that cell B has higher levels of the feature than cell A. You can then use CLR with margin=1. A transformed value of 0 then means that a cell expresses the geometric mean of that feature. This helps to put all features on the same-ish scale.

But when you have a lot of different antibodies in one experiment, you can’t sequence to saturation and so that leads to more variation in ADT depth between cells. I think the logic here is that cell A might have a geometric mean of ADT counts of 10, whereas cell B might have a geometric mean of ADT counts of 10,000. If you see a feature has a raw count of 10 in cell A but a value of 100 in cell B, it doesn't necessarily follow that cell B has higher levels of the feature than cell A. It makes sense to me that you would want to first adjust each cell independently (margin=2) to control for the cell-specific geometric mean. If you tried to use margin=1, you would be ignoring the differences in ADT depth from cell to cell.

I hope I am interpreting this correctly. Thanks so much for your help.

Add an option that lets users override the axis used for CLR normalization.
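
To make the difference between the two axes concrete, here is a minimal NumPy sketch. It is not the code in `script.py`; the `clr` helper, its `axis` and `pseudocount` parameters, and the toy counts are all hypothetical. It assumes a cells x features matrix (AnnData-style layout), so "per feature" (the reading of margin=1 quoted above) means taking the geometric mean across cells (`axis=0`), and "per cell" (margin=2) means taking it across features (`axis=1`). It also uses a plain log-ratio form of CLR, which may differ from the variant a given library implements.

```python
import numpy as np


def clr(counts, axis, pseudocount=1.0):
    """Centered log-ratio transform along `axis` (hypothetical helper).

    axis=0: center each feature (column) on its geometric mean across cells.
    axis=1: center each cell (row) on its geometric mean across features.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtracting the mean of the logs is the same as dividing by the geometric mean.
    return logged - logged.mean(axis=axis, keepdims=True)


# Toy ADT counts, rows = cells, columns = features.
# Cell B has the same composition as cell A but 10x the ADT depth.
counts = np.array([
    [10.0, 20.0, 5.0],     # cell A
    [100.0, 200.0, 50.0],  # cell B
])

# pseudocount=0.0 is only safe here because the toy data contains no zeros.
per_feature = clr(counts, axis=0, pseudocount=0.0)
per_cell = clr(counts, axis=1, pseudocount=0.0)

# Per feature: cell B scores higher than cell A on every feature,
# because the depth difference is left in the data.
print(per_feature)

# Per cell: both rows are identical, because each cell is centered
# on its own geometric mean and the depth difference cancels out.
print(per_cell)
```

An override option would amount to exposing something like `axis` to the user instead of hard-coding it, so that small, saturated panels and large, depth-variable panels can each use the appropriate setting.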