thelovelab / tximeta

Transcript quantification import with automatic metadata detection
https://thelovelab.github.io/tximeta/

Tximeta, Salmon, and imported TPM #41

Closed FrAoJm closed 3 years ago

FrAoJm commented 3 years ago

Hi! I am using tximeta to import the abundances from Salmon quant files (usually quantified against the GENCODE human transcriptome), and I noticed that the SummarizedExperiment object contains counts that look very strange to me. I checked the TPM column in the quant files and, as expected for TPM, the values sum to 10^6. But:

> sum(assay(se)[,1])
[1] 14832791
> sum(assay(se)[,2])
[1] 12745303
> sum(assay(se)[,3])
[1] 11257534
> sum(assay(se)[,4])
[1] 15754908
> sum(assay(se)[,5])
[1] 24723069

What are these numbers? Am I doing something wrong? Shouldn't they also sum to a million, as the TPM values do?

Thanks,

mikelove commented 3 years ago

You aren’t specifying the assay name, so it’s providing you the estimated counts (NumReads).

These have priority in tximport / tximeta because these are used in statistical modeling with abundance and length used as an offset.
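A minimal sketch of the point above, using a toy SummarizedExperiment rather than real tximeta output. All values and the names `counts`, `len`, `rate`, and `abundance` are made up for illustration; only the assay order (counts, abundance, length) is meant to mirror what tximeta returns. It shows that `assay(se)` with no name returns the first assay (the counts), while asking for `"abundance"` by name returns the TPM values.

```r
library(SummarizedExperiment)

# Toy data (hypothetical values): 3 transcripts x 2 samples.
counts <- matrix(c(100, 300, 600,
                   50,  50, 900),
                 nrow = 3,
                 dimnames = list(paste0("tx", 1:3), c("s1", "s2")))
len <- matrix(c(1000, 1500, 2000,
                1000, 1500, 2000),
              nrow = 3, dimnames = dimnames(counts))

# TPM: reads per unit length, rescaled so each sample sums to 1e6.
rate <- counts / len
abundance <- sweep(rate, 2, colSums(rate), "/") * 1e6

# Same assay order as tximeta output: counts, abundance, length.
se <- SummarizedExperiment(assays = list(counts = counts,
                                         abundance = abundance,
                                         length = len))

assayNames(se)                   # "counts" "abundance" "length"
colSums(assay(se))               # no name given: sums of the first assay (counts)
colSums(assay(se, "abundance"))  # TPM: 1e6 per sample, up to rounding
```

On a real tximeta object the same two calls apply: `assay(se, "abundance")` gives the imported TPM, and `assay(se, "counts")` gives the estimated counts.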

mikelove commented 3 years ago

See this section:

https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html#summarizedexperiment_output

FrAoJm commented 3 years ago

Thank you, Mike, for the explanation. I am quite new to bioinformatics, so this is very helpful. I still have to understand (digest...) the meaning of the offset, but I will read more about it.

After this, I normalise with the following steps. Is this right?

# Summarise to gene level
gse <- summarizeToGene(se)

# And normalise
library(edgeR)
y <- makeDGEList(gse)                  # DGEList built from the tximeta object
keep <- filterByExpr(y)                # flag genes with enough counts to keep
y <- y[keep, , keep.lib.sizes=FALSE]   # drop filtered genes, recompute library sizes
y <- calcNormFactors(y)                # TMM normalisation factors
norm.counts.TMM <- as.data.frame(cpm(y))  # not sure whether log=TRUE or log=FALSE is better

I am more familiar with DESeq2, but I have no groups in my dataset and I couldn't find how to normalise without specifying groups. (If there is a way, I am happy to follow that lead... :) )

Thank you so much for your help and the quick response! I have another question about the use of TPM across samples, but that is probably for another issue XD

Kind regards,

mikelove commented 3 years ago

Yes, correct. For support-related questions I find it easier to use the Bioc support site: support.bioconductor.org

Most of the GH issues here are feature requests or bug reports.
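On the DESeq2 question above (normalising without groups), one option that is not confirmed anywhere in this thread is an intercept-only design, `~ 1`, which needs no grouping variable. The sketch below uses simulated toy counts; `cts`, `coldata`, and `norm.counts` are hypothetical names. With a gene-level tximeta object, the corresponding constructor would presumably be `DESeqDataSet(gse, design = ~ 1)` instead of `DESeqDataSetFromMatrix`.

```r
library(DESeq2)

set.seed(1)
# Toy gene-level counts (simulated): 200 genes x 4 samples, no groups.
cts <- matrix(rpois(800, lambda = 50), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("s", 1:4)))
coldata <- data.frame(sample = colnames(cts), row.names = colnames(cts))

# design = ~ 1 is an intercept-only design: no grouping variable is required.
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ 1)

# Median-of-ratios size factors, then counts divided by those factors.
dds <- estimateSizeFactors(dds)
norm.counts <- counts(dds, normalized = TRUE)
head(norm.counts)
```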

FrAoJm commented 3 years ago

Thank you, Mike. I will use the Bioc Support site next time, but I really appreciate the quick answer!

Kind regards,