Closed: andpet0101 closed this issue 3 years ago.
Re-read and decide whether or not we add this to the documentation.
`margin`: If performing CLR normalization, normalize across features (1) or cells (2)
I am not sure if I am getting it right yet.
The default is `margin=1`. In this case, a `method="CLR"` normalisation of HTO counts is done across HTOs and per cell. That is, for one cell, all HTO counts are centered and log-transformed. This is the relevant piece of code from `NormalizeData.default`:
```r
'CLR' = CustomNormalize(
  data = object,
  custom_function = function(x) {
    return(log1p(x = x / (exp(x = sum(log1p(x = x[x > 0]), na.rm = TRUE) / length(x = x)))))
  },
```
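A minimal sketch of what the quoted `custom_function` computes, in plain R. `clr_seurat` is a hypothetical name wrapping the function body from the excerpt; `clr_by_geomean` is an equivalent rewrite, assuming non-negative counts: because `log1p(0)` is 0, summing only the positive entries and dividing by `length(x)` gives the ordinary mean of `log1p(x)`, so the denominator is the geometric mean of `x + 1`.

```r
# clr_seurat(): hypothetical name; body copied from the NormalizeData.default excerpt.
clr_seurat <- function(x) {
  log1p(x = x / (exp(x = sum(log1p(x = x[x > 0]), na.rm = TRUE) / length(x = x))))
}

# Equivalent form for non-negative counts: log1p(0) is 0, so the sum over the
# positive entries divided by length(x) equals mean(log1p(x)), and exp() of
# that is the geometric mean of (x + 1).
clr_by_geomean <- function(x) {
  geo_mean_plus1 <- exp(mean(log1p(x)))
  log1p(x / geo_mean_plus1)
}

counts <- c(HTO_A = 120, HTO_B = 0, HTO_C = 15)  # toy counts, not real data
print(clr_seurat(counts))
print(all.equal(clr_seurat(counts), clr_by_geomean(counts)))
```

Unlike the textbook CLR, `log(x / g(x))`, this variant uses pseudocounts (`x + 1` inside the geometric mean and `log1p` on the ratio), so its output is non-negative and does not sum to zero; "centered" is only approximate here.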
In the linked issue, satijalab advised using `margin=1` if there is variation in how the HTOs perform (which is what they observed).
So what exactly is the difference from `method="LogNormalize"`? Is it that "CLR" calculates a geometric mean and log-transforms the data, while "LogNormalize" calculates a standard mean, scales and log-transforms the data?
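A rough side-by-side on one cell, as a sketch: `log_normalize` is a hypothetical helper following the LogNormalize description in the Seurat documentation (counts divided by the cell's total, multiplied by `scale.factor`, default 10000, then `log1p`), and `clr_seurat` repeats the CLR function body quoted from `NormalizeData.default`. On that reading, LogNormalize divides by a total rather than a mean, while CLR divides by a geometric mean (of counts + 1); both finish with `log1p`.

```r
# Hypothetical helpers for illustration only.
clr_seurat <- function(x) {
  # CLR body as quoted from NormalizeData.default
  log1p(x / exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}
log_normalize <- function(x, scale.factor = 10000) {
  # LogNormalize as described in the Seurat docs: divide by the cell total,
  # multiply by scale.factor, then log1p
  log1p(x / sum(x) * scale.factor)
}

cell <- c(HTO_A = 120, HTO_B = 0, HTO_C = 15)  # toy HTO counts for one cell
print(round(log_normalize(cell), 3))
print(round(clr_seurat(cell), 3))

# LogNormalize only sees proportions, so doubling the depth changes nothing
print(all.equal(log_normalize(cell), log_normalize(2 * cell)))
# The +1 pseudocount makes this CLR variant depth-dependent
print(isTRUE(all.equal(clr_seurat(cell), clr_seurat(2 * cell))))
```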
I assumed we normalise per cell as well (and we actually see this), but the code also has this snippet:
```r
if (normalization.method != 'CLR') {
  margin <- 2
}
```
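To make the `margin` question concrete, here is a sketch that assumes `margin` behaves like the `MARGIN` argument of base R's `apply()` on a features-by-cells matrix (HTOs in rows, cells in columns, the orientation Seurat uses for assay data). `clr_with_margin` is a hypothetical helper, not a Seurat function. Under that assumption, `margin = 1` would normalise each HTO (row) across all cells, and `margin = 2` each cell (column) across its HTOs; the snippet that forces `margin <- 2` for every method other than CLR would then mean that LogNormalize always works per cell.

```r
# Assumption: `margin` acts like the MARGIN argument of base R's apply() on a
# features x cells matrix. clr_with_margin() is a hypothetical helper.
clr_seurat <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}

clr_with_margin <- function(counts, margin) {
  out <- apply(counts, MARGIN = margin, FUN = clr_seurat)
  # apply() returns each row's result as a column, so transpose back for margin = 1
  if (margin == 1) out <- t(out)
  out
}

# Toy counts, not real data: rows are HTOs, columns are cells
counts <- matrix(c(120,  3,  5,
                     4, 80,  2,
                     6,  1, 40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("HTO_A", "HTO_B", "HTO_C"), paste0("cell", 1:3)))

by_tag  <- clr_with_margin(counts, margin = 1)  # each HTO normalised across cells
by_cell <- clr_with_margin(counts, margin = 2)  # each cell normalised across HTOs
print(round(by_tag, 3))
print(round(by_cell, 3))
```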
So, in summary, I'd like to understand whether `method="LogNormalize"` operates on `margin=1`. I think we can then write a sentence about each method in the comments of the HTO script. We can also mention that `margin=2` is an option worth looking at if the HTOs perform similarly.
If we can clear up the margin question, I can add the text to the comments in the script.
According to the documentation of `NormalizeData`, the `margin` option applies only if you use `method="CLR"`.
In this issue (https://github.com/satijalab/seurat/issues/2296), it is documented that CLR normalisation was used in the original CITE-seq paper. Therefore, I would opt to use `method="CLR"` from now on. Depending on how the HTOs perform, one could use either `margin=1` (across features, the default, for substantial variation in how the HTOs perform) or `margin=2` (across cells, when the HTOs perform similarly; still CLR). The `margin` option applies only to CLR.
Here are some more arguments in favour of CLR: https://github.com/satijalab/seurat/issues/3550
I have also finally found a dataset where it really makes a difference, likely because there is one good hashtag and two not-so-good ones. I can give you the ID...
I will change this now (directly on master, since it is quite minor). Thanks for the input.
Closed with merged PR #93.
Here is some advice on choosing a normalisation for HTOs. Can we include this in the script?
From: https://github.com/satijalab/seurat/issues/2954
Thanks, these are good questions.
We advise normalizing across tags particularly when there is substantial variation in how well each hash performs. We saw this a lot when we were doing our own conjugations (like the original hashing paper).
We advise normalizing across cells in most other cases.
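The advice above can be illustrated with a hypothetical toy matrix: one strong hashtag (`HTO_A`, with high background in every cell) and two weak ones, similar to the dataset with one good and two not-so-good hashtags mentioned earlier. The sketch assumes `margin` acts like the `MARGIN` argument of base R's `apply()` on a tags-by-cells matrix (1 = per tag, 2 = per cell), and it uses `which.max` per cell as a crude stand-in for calling a hashtag; this is not what Seurat's `HTODemux` does.

```r
# Hypothetical toy data: HTO_A is a strong tag with high background in every
# cell; HTO_B and HTO_C are weak tags. Rows are tags, columns are cells.
counts <- matrix(c(500, 200, 200, 200,
                     5,  60,   5,   5,
                     5,   5,  60,   5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("HTO_A", "HTO_B", "HTO_C"), paste0("cell", 1:4)))

clr_seurat <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}

# Assumption: margin acts like apply()'s MARGIN (1 = per tag/row, 2 = per cell/column)
clr_with_margin <- function(counts, margin) {
  out <- apply(counts, MARGIN = margin, FUN = clr_seurat)
  if (margin == 1) out <- t(out)  # put tags back in rows
  out
}

# Crude stand-in for hashtag calling: the highest-scoring tag in each cell
top_tag <- function(m) rownames(m)[apply(m, 2, which.max)]

print(top_tag(clr_with_margin(counts, margin = 1)))  # "HTO_A" "HTO_B" "HTO_C" "HTO_A"
print(top_tag(clr_with_margin(counts, margin = 2)))  # "HTO_A" "HTO_A" "HTO_A" "HTO_A"
```

Per-cell CLR divides every tag in a cell by the same constant before `log1p`, so it cannot change which tag is highest within a cell; here the strong tag's background wins everywhere. Per-tag CLR rescales each tag by its own geometric mean, so the weak tags come out on top in the cells they label.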