ncborcherding / scRepertoire

A toolkit for single-cell immune profiling
https://www.borch.dev/uploads/screpertoire/
MIT License

shannon index #206

Closed Sa753 closed 1 year ago

Sa753 commented 1 year ago

Dear Nick,

There are several ways to calculate the Shannon index, and they affect how the outcome is interpreted. For example, with a score from 0 to 1, 0 would mean high diversity and 1 low diversity. Others use 1 - entropy, which gives the opposite interpretation. In your package, does a high Shannon index mean high diversity or low diversity?

I would also add that I think inverse Simpson is the opposite of Shannon, but in the package we get similar results from both. I am using version 1.7. Thanks

ncborcherding commented 1 year ago

Hey Sa753,

Good question.

The Shannon index in the package returns $H' = -\sum_i p_i \log_b p_i$, so the true Shannon-Weaver value. The higher the value, the greater the diversity. This is measuring the proportional abundance of species (here clonotypes) in the data.
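As a minimal sketch of that formula (not the package's internal code), the Shannon index can be computed by hand in base R. The counts below are hypothetical, and the natural log is assumed as the base $b$:

```r
# Hypothetical clonotype counts (cells per clonotype); not real data.
counts <- c(50, 30, 15, 5)

# Convert counts to proportions p_i.
p <- counts / sum(counts)

# Shannon index with natural log: H' = -sum(p_i * log(p_i)).
shannon <- -sum(p * log(p))

# A perfectly even repertoire with the same number of clonotypes
# reaches the maximum possible value, log(number of clonotypes).
even_p <- rep(1 / length(counts), length(counts))
shannon_even <- -sum(even_p * log(even_p))

print(shannon)
print(shannon_even)
```

The skewed repertoire scores lower than the even one, which matches the reading that a higher value means greater diversity.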

Inv Simpson returns $1/D$, where $D = \sum_i p_i^2$. This is a metric for the effective number of types (here clonotypes) in the data.
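A small base R sketch of the same formula, using hypothetical counts, may help show what "effective number of types" means. The helper name `inv_simpson` is made up for illustration and is not a package function:

```r
# Hypothetical helper: inverse Simpson index from clonotype counts.
inv_simpson <- function(counts) {
  p <- counts / sum(counts)  # proportions p_i
  1 / sum(p^2)               # 1 / D, with D = sum(p_i^2)
}

# Four equally abundant clonotypes behave like four effective types.
counts_even <- c(25, 25, 25, 25)

# One dominant clonotype behaves like roughly one effective type.
counts_skewed <- c(97, 1, 1, 1)

print(inv_simpson(counts_even))
print(inv_simpson(counts_skewed))
```

Both this index and the Shannon index drop when one clonotype dominates, so they tend to move in the same direction.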

They're not truly the opposite of one another; in fact they should show relatively similar trends (but likely not the same values).

If you are getting strange results, more than likely this is due to the built-in downsampling and bootstrapping in clonalDiversity(). Specifically, a small number of clonotypes in a sample/group can create issues. The function identifies the sample/group with the fewest unique clonotypes, randomly selects that number from every other group/sample, and calculates the diversity metrics. It does this repeatedly and takes the mean across all bootstraps.
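The downsample-and-average idea described above can be sketched roughly as follows. This is not the package's implementation: the data are simulated, the number of iterations (`n_boots`) is assumed, and drawing cells without replacement is an assumption as well.

```r
set.seed(42)  # for reproducibility of this sketch only

# Hypothetical per-cell clonotype labels for two samples of different depth.
sample_a <- sample(paste0("cloneA", 1:50), size = 200, replace = TRUE)
sample_b <- sample(paste0("cloneB", 1:500), size = 2000, replace = TRUE)
samples <- list(A = sample_a, B = sample_b)

# Shannon index from a vector of clonotype labels.
shannon <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  -sum(p * log(p))
}

# The smallest number of unique clonotypes across samples sets the draw size.
min_n <- min(sapply(samples, function(x) length(unique(x))))

n_boots <- 100  # assumed number of bootstrap iterations

# For each sample: draw min_n cells, compute Shannon, repeat, take the mean.
boot_shannon <- sapply(samples, function(x) {
  mean(replicate(n_boots, shannon(sample(x, size = min_n))))
})

print(boot_shannon)
```

Because every draw contains only `min_n` cells, no sample can score above `log(min_n)`, which is one way a sample with very few clonotypes can compress the values for every other sample.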

Hope that helps clarify and thanks for the question, Nick

Sa753 commented 1 year ago

Thank you so much for your prompt reply.

One last question: is the inverse Pielou also an index of clonal diversity, or what is it an index of?

I know Pielou evenness, so higher values mean more even clones, i.e. clonal expansion. Is that correct?

Thanks so much

ncborcherding commented 1 year ago

Hey Sa753,

Yeah, that is correct. Pielou is a modified version of Shannon representing the evenness of species in the system: the Shannon index divided by the maximum Shannon index, i.e. the value it would take if all species were equally abundant. Inverse Pielou was a specific request by a user and is generally inversely related to clonal expansion, but is not specifically measuring that.
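As a rough illustration of that definition, Pielou evenness can be written in a few lines of base R. The helper name `pielou` and the counts are hypothetical, and the sketch assumes at least two clonotypes with non-zero counts:

```r
# Hypothetical helper: Pielou evenness from clonotype counts.
# Assumes length(counts) > 1 and all counts > 0.
pielou <- function(counts) {
  p <- counts / sum(counts)
  shannon <- -sum(p * log(p))      # observed Shannon index
  shannon / log(length(counts))    # divided by the maximum possible Shannon
}

# Equal abundances give the maximum evenness of 1.
j_even <- pielou(c(25, 25, 25, 25))

# One dominant clonotype pushes evenness toward 0.
j_skewed <- pielou(c(97, 1, 1, 1))

print(j_even)
print(j_skewed)
```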

Nick

Sa753 commented 1 year ago

Hi Nick,

I am sorry to reopen this issue but there is something that doesn't make sense.

If the Shannon diversity index is high, this means there is clonal diversity, as you kindly highlighted above. If Pielou is an evenness index, then high Pielou means more clonal expansion, as you also highlighted. In your package it is Inv Pielou, so the lower the value, the more evenness and the more clonal expansion. So the same dataset can't show a high Shannon index and a low Inv Pielou, as those two are opposite to each other.

By the way, I am probably the user who requested an evenness or clonal expansion index, which led you to add Inv Pielou, unless you meant Pielou and not Inverse Pielou. In that case, it makes sense that a dataset can have high Shannon and low Pielou, not low INVERSE Pielou.

This contradiction is even present with the dataset in your vignette, so it is not a matter of ambiguous results from one dataset.

Please clarify, as this makes the data interpretation completely different.

Thanks

ncborcherding commented 1 year ago

Hey Sa753,

I think there is some overall confusion here - please correct the following if I am wrong:

As Pielou is a scaled version of Shannon diversity (the Shannon index divided by the maximum Shannon index assuming evenness), the inverse Pielou (1/Pielou) will be inversely associated with the Shannon index.

Nick

Sa753 commented 1 year ago

Hi Nick,

I understand that Shannon and Pielou are opposite indicators: Shannon is a diversity index but Pielou is an evenness index, and those are opposite to each other, so inverse Pielou (1/Pielou) should give similar results to Shannon, which is what I meant.

It is as if you are saying that a repertoire is both diverse (Shannon) and even (Pielou), which are opposite to each other.

So something is not exactly making sense.

Thanks

ncborcherding commented 1 year ago

Hey Sa753,

Shannon diversity and the Pielou index are not opposites of one another. Pielou's index is a derivation that uses Shannon diversity as the numerator. The confusion is that Pielou really measures evenness by scaling the Shannon index by log(number of species). Thus you can't actually predict the direction of Pielou or Inverse Pielou from the Shannon index, as the denominator can modify the direction. Please see below an example of exactly that (using 2 tumor samples derived from the same patient in the utility data set):

Loading the contig Data

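# List the per-sample folders that hold the contig annotations.
files <- list.files("./TCRS")
contig.list <- list()

# Read each sample's filtered contig annotations into the list.
for (i in seq_along(files)) {
  contig.list[[i]] <- read.csv(file.path("./TCRS", files[i], "filtered_contig_annotations.csv"))
}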

Combining the contigs by barcode

library(scRepertoire)
combined.contigs <- combineTCR(contig.list, 
                               samples = c("HT2.1", "HT2.2"))

Convert the clonotypes into tables

table.contigs <- lapply(combined.contigs, function(x) {
  y <- as.data.frame(table(x[,"CTaa"]))
  y
})

Calculating Metrics

library(vegan)

# Shannon index per sample.
shannon.values <- lapply(table.contigs, function(x) {
  diversity(x[,"Freq"], index = "shannon")
})

# Inverse Simpson index per sample.
invsimpson.values <- lapply(table.contigs, function(x) {
  diversity(x[,"Freq"], index = "invsimpson")
})

# Pielou evenness: Shannon index divided by log(number of unique clonotypes).
pielou.values <- lapply(table.contigs, function(x) {
  diversity(x[,"Freq"], index = "shannon") / log(length(x[,"Freq"]))
})

Outputs

unlist(shannon.values)

   HT2.1    HT2.2 
3.877613 6.538055 

unlist(invsimpson.values)

    HT2.1     HT2.2 
 29.08357 124.61371 

unlist(pielou.values)

    HT2.1     HT2.2 
0.8554942 0.8402891 
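The same pattern (higher Shannon but lower Pielou) can be reproduced with a toy example in base R. The counts below are made up purely for illustration and have nothing to do with the HT2 samples:

```r
# Shannon index from clonotype counts (natural log).
shannon <- function(counts) {
  p <- counts / sum(counts)
  -sum(p * log(p))
}

# Pielou evenness: Shannon scaled by log(number of clonotypes).
pielou <- function(counts) shannon(counts) / log(length(counts))

# Repertoire A: 4 clonotypes, perfectly even.
rep_a <- rep(1, 4)

# Repertoire B: 51 clonotypes, one of which is strongly expanded.
rep_b <- c(50, rep(1, 50))

h_a <- shannon(rep_a); j_a <- pielou(rep_a)
h_b <- shannon(rep_b); j_b <- pielou(rep_b)

print(c(shannon_A = h_a, shannon_B = h_b))
print(c(pielou_A = j_a, pielou_B = j_b))
```

Repertoire B has many more clonotypes, so its Shannon index is higher, but the expanded clone makes it less even, so its Pielou value is lower. The larger denominator, log(number of clonotypes), is what lets the two metrics move in different directions.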

Sa753 commented 1 year ago

Hi Nick,

Thanks for explaining. I am struggling to understand the meaning of the output more than the formula. Regarding the calculation, the clonal expansion index is usually 1 - entropy, where entropy is calculated with the Shannon diversity index. This was explained in the STARTRAC paper, for example, and in others. Does Inv Pielou or Pielou carry this meaning, please?

If we forget about Pielou altogether, all I need is an expansion index.

Thanks

ncborcherding commented 1 year ago

There is no real consensus for an expansion index for TCR/BCR (or really an ideal diversity index).

STARTRAC uses the Inverse Pielou as its expansion index (it is a little confusing, as the methods section refers to it as normalized Shannon entropy, which is not the ideal description).

If you just want an expansion index, Inverse Pielou as written in the package is probably fine. I would think about the implications of bootstrapping as discussed above, though.
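This thread mentions two candidate forms for an expansion index, "1 - entropy" and 1/Pielou. The sketch below computes both from hypothetical counts, taking "entropy" to mean the normalized Shannon entropy (Pielou). Which form a given tool actually implements is not confirmed here and should be checked against its source. The helper name is made up for illustration.

```r
# Hypothetical helper returning two candidate expansion indices.
# Assumes length(counts) > 1 and all counts > 0.
expansion_candidates <- function(counts) {
  p <- counts / sum(counts)
  j <- -sum(p * log(p)) / log(length(counts))  # Pielou evenness
  c(one_minus_pielou = 1 - j,  # "1 - normalized entropy" form
    inverse_pielou   = 1 / j)  # "1 / Pielou" form
}

# Even repertoire: no clonal expansion.
even <- expansion_candidates(c(25, 25, 25, 25))

# One dominant clonotype: strong clonal expansion.
expanded <- expansion_candidates(c(97, 1, 1, 1))

print(even)
print(expanded)
```

Both candidates increase as one clone comes to dominate, but they sit on different scales: `1 - J` is bounded between 0 and 1, while `1 / J` starts at 1 and is unbounded.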

Nick