Librarysize - GitHub Issues

TdzBAS commented 1 year ago

Hi,

I have a question regarding the library size of the scRNA-seq simulation.

In total there are 2 simulations, one with a library size of 20 million and one with a library size of 5 million. Does library size refer to the number of reads for the whole simulation? That is, 20 million reads for all 100 samples in total, 20 million reads per cell, or 20 million reads per group?

Isn't this sequencing depth too high for single-cell RNA sequencing? I think the number of reads should be lower for single-cell data.

Cheers

mhjiang97 commented 1 year ago

Dear Tolga,

Thank you for your question. I have also received your email, but I would like to make my response public by replying to you on GitHub.

To answer your question: yes, the library size refers to the total number of reads. Before simulating the scRNA-seq data, we searched for papers that aimed to identify specific (differential) splicing events based on scRNA-seq analysis. We found that, to better capture the whole splicing scenario, researchers tended to generate scRNA-seq data with high sequencing depth, typically greater than 10 million reads per sample. As a result, we simulated the two datasets you mentioned with such high library sizes. Although the library sizes were high, our aim was to benchmark differential splicing tools designed for scRNA-seq without considering the effect of library size. Therefore, we provided these tools with fully saturated scRNA-seq data while keeping the dropout rate relatively low. However, if you wish to explore the impact of library size, you can modify the code to generate simulated data with lower library sizes, which we did not cover in our published study.

I hope this answers your question.

Best regards, Minghao

TdzBAS commented 1 year ago

Hi Minghao,

Thanks for your answer! I understand your reasoning. So just to be clear, we have 20 million reads per cell? Thanks for your concise answer!

Best, Tolga

mhjiang97 commented 1 year ago

Yes!
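To make the per-cell reading concrete, here is a rough sketch of the arithmetic, plus one generic way to get a shallower library from an existing count vector. It assumes the 100 samples mentioned above are the cells and that every cell gets the same library size. The function names `total_reads` and `downsample_cell` are hypothetical and not taken from the Benchmarking_DS code, and random subsampling of reads is a stand-in technique here, not necessarily how the simulation itself sets the library size.

```python
import random
from collections import Counter


def total_reads(reads_per_cell, n_cells):
    """Total reads in a simulation when every cell has the same library size."""
    return reads_per_cell * n_cells


def downsample_cell(counts, target_library_size, seed=0):
    """Randomly subsample one cell's reads, without replacement, to a smaller library size.

    counts: per-feature read counts for a single cell (non-negative integers).
    Returns a new list of counts that sums to target_library_size.
    Requires Python 3.9+ for the `counts` argument of random.sample.
    """
    current = sum(counts)
    if target_library_size >= current:
        # Nothing to remove: the cell is already at or below the target depth.
        return list(counts)
    rng = random.Random(seed)
    # Draw feature indices, each index being available as often as it has reads.
    kept = rng.sample(range(len(counts)), k=target_library_size, counts=counts)
    tally = Counter(kept)
    return [tally.get(i, 0) for i in range(len(counts))]


if __name__ == "__main__":
    print(total_reads(20_000_000, 100))  # 2000000000
    print(total_reads(5_000_000, 100))   # 500000000
    toy_cell = [10, 0, 30, 60]           # made-up counts for illustration only
    print(sum(downsample_cell(toy_cell, 25)))  # 25
```

Under these assumptions, 20 million reads per cell across 100 cells comes to 2 billion reads per simulation, and the 5 million setting comes to 500 million.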

TdzBAS commented 1 year ago

Hi @mhjiang97,

Do you have a source or paper stating that capturing the whole splicing scenario is best done with more than 10 million reads?

And why did you choose the threshold of 500 here?

Best, Tolga

mhjiang97 / Benchmarking_DS

Librarysize #2