How to output the scaled counts for each sample

biozzq commented 1 year ago

Dear @mikelove

Thank you for your package. This is my first time to run Swish ( following the vignette, https://bioconductor.org/packages/devel/bioc/vignettes/fishpond/inst/doc/swish.html) using inferential replicates computed by salmon.

First, when running salmon, I set --validateMappings --seqBias --gcBias --numBootstraps 100 to generate bootstrap samples. Here, you typically recommend 20-30 inferential replicates, but I set --numBootstraps 100, would this have an effect on the quantification results?

Second, after running following commands, how to save the scaled counts to a file? The se object contains following metadata.

class: SummarizedExperiment 
dim: 14796 14 
metadata(4): tximetaInfo countsFromAbundance infRepsScaled preprocessed
assays(103): counts abundance ... infRep99 infRep100
rownames(14796): gene-A1CF gene-A2M ... gene-ZZEF1 gene-ZZZ3
rowData names(7): log10mean keep ... locfdr qvalue
colnames(14): C1 C2 ... DXM6 DXM7
colData names(2): names condition

library(tximeta)
library(fishpond)
library(GenomicRanges)
library(SummarizedExperiment)
data <- read.table("sample.inf",head=T)
dir <- getwd()
data$files<-file.path(dir,data$names,"quant.sf")
all(file.exists(data$files))
data <- data[data$condition %in% c("C","DXM"),]
data$condition <- factor(data$condition, levels=c("C","DXM"))
tx2gene <- read.table("tran_to_gene", head=F, col.names=c("TXNAME","GENEID"))
se <- tximeta(data, skipMeta=TRUE, txOut=FALSE, tx2gene=tx2gene)
se<-scaleInfReps(se)
se <- labelKeep(se)
se <- se[mcols(se)$keep,]
set.seed(1)
se <- swish(se, x="condition")

Looking forward to your reply, thank you. Yours sincerely.

Zheng zhuqing

mikelove commented 1 year ago

Good questions.

100 bootstraps is fine, we found 20-30 is sufficient but more is fine. The objects will just be a bit larger.

A good way to save the normalized data is:

se <- computeInfRV(se)
posterior_mean <- assays(se)[["mean"]]
write.csv(posterior_mean, file="posterior_mean_counts_scaled.csv")

biozzq commented 1 year ago

Thank you. It is ok for me to output the scaled counts now after changing posterior_mean <- assay(se)[["mean"]] to posterior_mean <- assays(se)[["mean"]], in which I think you have a typo.

mikelove commented 1 year ago

Yes a typo.

thelovelab / fishpond

How to output the scaled counts for each sample #30