ktrns / scrnaseq

Workflow for single-cell RNA-seq analysis using Seurat

MIT License

HTO normalisation advice #56

Closed andpet0101 closed 3 years ago

andpet0101 commented 4 years ago

Here is some advice about choosing normalisation for HTOs. Can we include this in the script?

From: https://github.com/satijalab/seurat/issues/2954

Thanks, these are good questions

We advise normalizing across tags particularly when there is substantial variation in how well each hash performs. We saw this a lot when we were doing our own conjugations (like the original hashing paper).

We advise normalizing across cells in most other cases

ktrns commented 3 years ago

Re-read and decide whether or not we add this to the documentation.

ktrns commented 3 years ago

margin | If performing CLR normalization, normalize across features (1) or cells (2)

ktrns commented 3 years ago

I am not sure if I am getting this right yet.

The default is margin=1. As I read it, a method="CLR" normalisation of HTO counts is then done across HTOs and per cell. That is, for one cell, all HTO counts are centered and log-transformed. This is the relevant piece of code from NormalizeData.default:

     'CLR' = CustomNormalize(
        data = object,
        custom_function = function(x) {
          return(log1p(x = x / (exp(x = sum(log1p(x = x[x > 0]), na.rm = TRUE) / length(x = x)))))
        },

In the linked issue, the Satija lab advised using margin=1 if the HTOs vary in how well they perform (which is what they observed).
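Since it is easy to mix up which direction each margin refers to, a quick check in base R may help. The sketch below reuses the function from the excerpt above and assumes that CustomNormalize hands it to apply() with MARGIN = margin, which the excerpt does not show. The count matrix and its names are made up for illustration.

```r
# CLR-style function copied from the NormalizeData.default excerpt above
clr <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}

# Toy HTO-by-cell count matrix (made-up numbers): rows are HTOs, columns are cells.
# HTO3 stands in for a weakly performing hashtag.
counts <- matrix(
  c(500,  10,  5,
     20, 300,  8,
      2,   3, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("HTO1", "HTO2", "HTO3"), c("cellA", "cellB", "cellC"))
)

# MARGIN = 1: the function receives one row, i.e. one HTO across all cells.
# apply() returns the results column-wise, so transpose back to HTO x cell.
per_hto <- t(apply(counts, 1, clr))

# MARGIN = 2: the function receives one column, i.e. one cell across all HTOs.
per_cell <- apply(counts, 2, clr)

print(round(per_hto, 2))
print(round(per_cell, 2))
```

If that assumption holds, margin=1 passes one HTO's counts across all cells to the function, while margin=2 passes one cell's counts across all HTOs. Comparing the two printed matrices shows how differently the weak HTO3 is treated in each case.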

ktrns commented 3 years ago

So what is the exact difference to method="LogNormalize"? Is it that "CLR" calculates a geometric mean and log-transforms the data, while "LogNormalize" calculates a standard mean, scales and log-transforms the data?

I assumed we normalise per cell as well (and we actually see this), but the code also has this snippet:

    if (normalization.method != 'CLR') {
      margin <- 2
    }
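To make the comparison concrete, here is a small base R sketch of both transformations applied to the HTO counts of a single cell. The LogNormalize formula (counts divided by the cell total, multiplied by scale.factor, then log1p) follows the description in the NormalizeData documentation, and the CLR function is the one quoted from NormalizeData.default earlier in this thread. The count vector is made up, and the function names are only for illustration.

```r
# HTO counts of one cell (made-up numbers)
x <- c(HTO1 = 500, HTO2 = 20, HTO3 = 2)

# LogNormalize-style: divide by the arithmetic total, scale, then log1p.
# scale.factor = 10000 mirrors the NormalizeData default.
log_normalize <- function(x, scale.factor = 10000) {
  log1p(x / sum(x) * scale.factor)
}

# CLR-style, as in the quoted excerpt: divide by a geometric-mean-like
# term (built from log1p of the non-zero counts), then log1p.
clr <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}

print(round(log_normalize(x), 2))
print(round(clr(x), 2))
```

Both keep the ranking of the HTOs within the cell. They differ in the denominator: a scaled arithmetic total for LogNormalize, a geometric-mean-like term for CLR.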

ktrns commented 3 years ago

So in summary, I'd like to understand whether method="LogNormalize" works on margin=1, and I think we can add a sentence for each method to the comments in the HTO script. Also, we can mention that margin=2 is an option that might be worth looking at if the HTOs perform similarly.

If we can clear up the margin question, I can add the text to the comments in the script.

andpet0101 commented 3 years ago

According to the documentation of NormalizeData the option margin= applies only if you use method=CLR.

In this issue - https://github.com/satijalab/seurat/issues/2296 - it is documented that the CLR normalisation was used in the original CITE-seq paper. Therefore I would opt to use method="CLR" from now on. Depending on how the HTOs perform, one could use either margin=1 (across features, the default, applies only to CLR; advised when HTO performance varies substantially) or margin=2 (across cells, still CLR; advised in most other cases).
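In the script this could look roughly like the sketch below. It assumes the Seurat package is installed and builds a tiny random HTO assay only so that the calls can be tried out. The object name hto_obj and the counts are made up and are not taken from the workflow.

```r
library(Seurat)

# Toy HTO count matrix (random, for illustration only): 3 HTOs x 20 cells
set.seed(1)
counts <- matrix(
  rpois(3 * 20, lambda = 50),
  nrow = 3,
  dimnames = list(paste0("HTO-", 1:3), paste0("cell", 1:20))
)
hto_obj <- CreateSeuratObject(counts = counts, assay = "HTO")

# CLR with margin = 1: "across features" in the NormalizeData documentation
# (the default); advised in the linked issue when HTO performance varies.
hto_obj <- NormalizeData(hto_obj, assay = "HTO",
                         normalization.method = "CLR", margin = 1)

# Alternative, CLR with margin = 2: "across cells" in the documentation;
# advised in the linked issue in most other cases.
# hto_obj <- NormalizeData(hto_obj, assay = "HTO",
#                          normalization.method = "CLR", margin = 2)
```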

andpet0101 commented 3 years ago

Here are some more arguments in favour of CLR: https://github.com/satijalab/seurat/issues/3550

andpet0101 commented 3 years ago

And I have finally found a dataset where it really makes a difference, likely because there is one good hashtag and two not-so-good hashtags. I can give you the ID...

ktrns commented 3 years ago

I will change this now (on the master since it is quite minor), thanks for the input.

ktrns commented 3 years ago

Closed with PR Merge #93