splicebox / PsiCLASS

Simultaneous multi-sample transcript assembler for RNA-seq data

Possible risks of including multiple datasets with different library features. #29

Open Pentayouth opened 1 year ago

Pentayouth commented 1 year ago

Dear developer,

I want to generate a meta-assembly for multiple cell lines (n~50) from multiple datasets (n=6). These datasets have different library features: some are strand-specific, some are unstranded, some are poly(A)-selected, and some are total RNA. Sequencing depth ranges from 5M to 50M reads.

Is there any risk in running a PsiCLASS meta-assembly on such a mixture?

Best, Wang

mourisl commented 1 year ago

PsiCLASS does not support libraries that differ too much from one another. I think you can hack the wrapper to mix stranded and unstranded libraries. But I would be cautious about mixing poly(A) and total RNA libraries, because the reads spanning introns and the reads falling inside introns follow different distributions in these two library types.
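
One way to act on that caution is to split the samples by RNA selection protocol before assembly, so that poly(A) and total RNA libraries never end up in the same run. The following is only a minimal Python sketch under stated assumptions: the sample sheet, its field names (`bam`, `selection`, `stranded`), and the file names are all hypothetical, and nothing here calls PsiCLASS itself.

```python
from collections import defaultdict


def group_by_selection(samples):
    """Group BAM paths by RNA selection protocol (e.g. polyA vs total)."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["selection"]].append(sample["bam"])
    return dict(groups)


# Hypothetical sample sheet; real metadata would come from each dataset.
samples = [
    {"bam": "cellA.bam", "selection": "polyA", "stranded": True},
    {"bam": "cellB.bam", "selection": "polyA", "stranded": False},
    {"bam": "cellC.bam", "selection": "total", "stranded": True},
]

groups = group_by_selection(samples)

# Print one BAM list per protocol; each list would be assembled separately.
for selection, bams in sorted(groups.items()):
    print(selection, ",".join(bams))
```

Strandedness is kept in the sample sheet but not used for grouping here, since the comment above suggests that stranded and unstranded libraries may be mixed with changes to the wrapper.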

Pentayouth commented 1 year ago

Thanks a lot!