Testing rail-RNA on a project uploaded in Recount2 - different results

gianmaz commented 5 years ago

To whom, it may concern. I would like to run the same pipeline used to generate Recount2 in my dataset. To check if the rail-RNA pipe worked correctly I ran it on a small project uploaded in Recount2 (SRP047399)

Rail-RNA was run using this command: rail-rna go local -x dir_bowtie_hg38 dir_bowtie2_hg38 -m manifest_file --log dir_log_file -d tsv,bw,jx -f --scratch temp_folder_dir

based on Bowtie and Bowtie2 indexes for the reference genome hg38. Downloaded from:

I genome Illumina:

ftp://ussd-ftp-illumina.com/Homo_sapiens/UCSC/hg38/Homo_sapiens/sequences/BowtieIndex
ftp://ussd-ftp-illumina.com/Homo_sapiens/UCSC/hg38/Homo_sapiens/sequences/Bowtie2Index

I used the following data to run the R scripts after the rail-RNA pipeline.

However when I match the gene counts or exon counts that I obtained against those uploaded in Recount2 I found some differences. In some cases, these differences resulted in more than 100.000 read counts

Moreover, I get this error when I run the prep_merge.R script oo <- findOverlaps(jx_gr, introns_unique, type = 'equal') stopifnot(length(unique(queryHits(oo))) == length(oo))

Can this be due to the use of wrong annotations?

I read that the tool is deterministic, I expected to obtain the same results. I check the number of reads in input and they are the same. However, the number of mapped reads is different

Do you have any idea why this happens? Considering that I would like to compare my samples with the data in Recount. Please, can you confirm that the pipeline and the reference and the annotation files are correct?

Thank you. Best Regards, Gianluca

Please find below an example for one of the sample.

SRR1583592_my results		SRR1583592_recount2
4398		1345
108156		108153
15685		15685
1		0
47805		115970
195		195

lcolladotor commented 3 years ago

Hi,

Wow, I didn't see this in... nearly 2 years. Sorry!

@nellore might be able to respond this question more precisely, but my understanding is that Rail-RNA will produce the same results if it's run with the same set of samples. We didn't run samples individually, as Rail-RNA scales up if you run samples in small batches. This could explain small differences if you re-run Rail-RNA with a small subset of samples.

However, large differences as the ones you noticed, are likely due to how many reads were processed by Rail-RNA. For example, maybe when we ran Rail-RNA we had an issue downloading the data from SRA and only a portion of it got downloaded. That's part of the information we provide in the colData(rse_gene) information (or in the phenotype text table file).

For the prep_merge.R error, a reprex would be useful to help you more.

Finally, note that recount3 is coming out soon that was developed by @ChristopherWilks.

Best, Leo

lcolladotor commented 3 years ago

I'm closing this since recount3 is available on Bioconductor now and the pre-print is around the corner. Plus this issue has been inactive for a while. Though if you still have questions, please feel free to re-open it. Thanks!

leekgroup / recount-contributions

Testing rail-RNA on a project uploaded in Recount2 - different results #2