Closed lzx325 closed 2 years ago
For UMI-labeled data, we do not think it's possible because only the 5' end or 3' end is sequenced. You can get transcript compatibility counts by using the --tcc option though.
For smartseq3 and for smartseq2 data, running kb count with the --tcc option will actually get you transcript-level expression.
Dear Yenaled, Thank you for the clarification. For 10x, even though only the 3' end is sequenced, is it still possible to perform differential usage analysis of some 3' alternative splicing events using the transcript compatibility counts? Could you please point me to some resource or publications regarding this? Thanks!
Yes, it's possible and it's exactly what was done here:
Dear Yenaled, As the transcript compatibility counts are difficult to integrate with other downstream analysis toolkits, do you think it is still OK to use some simple post-processing (e.g., uniformly distributing the UMI count to each isoform in the equivalence class) to convert transcript compatibility counts to per-isoform UMI counts?
No, because of the nature of the reads, there's an identifiability problem which is why we don't support such practices.
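To make the identifiability concern concrete, here is a minimal Python sketch of the uniform post-processing proposed above. The function name `split_uniformly`, the transcript names, and the toy counts are all hypothetical and only for illustration; this is not part of kb or kallisto.

```python
from collections import defaultdict


def split_uniformly(tcc_counts, ec_to_transcripts):
    """Spread each equivalence class's UMI count evenly over its transcripts."""
    per_transcript = defaultdict(float)
    for ec, count in tcc_counts.items():
        transcripts = ec_to_transcripts[ec]
        share = count / len(transcripts)
        for t in transcripts:
            per_transcript[t] += share
    return dict(per_transcript)


# Toy cell (made-up numbers): two isoforms share the same 3' end, so every
# UMI falls into the single equivalence class {tx_A, tx_B}.
ec_to_transcripts = {0: ["tx_A", "tx_B"]}
tcc_counts = {0: 10}

uniform = split_uniformly(tcc_counts, ec_to_transcripts)
print(uniform)
```

In this toy case the split is always half and half, whether the cell truly expressed only tx_A, only tx_B, or any mixture: all of those produce the same transcript compatibility counts, so the per-isoform numbers reflect the splitting rule rather than the data.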
That said, you can still try running the EM algorithm to distribute counts among isoforms (akin to what is done when running "kallisto quant" on bulk RNA-seq samples). I just don't recommend it for the reason above, and we do not currently endorse such practices. (The EM algorithm can be run by using "kallisto quant-tcc" on the cells_x_tcc.mtx file that kb count writes when run with the --tcc option.)
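For readers unfamiliar with how an EM step distributes equivalence-class counts, the following is a simplified Python sketch of the general idea. It is not kallisto's implementation: it ignores effective transcript lengths and convergence checks, and the function name `em_tcc` and all toy inputs are hypothetical.

```python
def em_tcc(tcc_counts, ec_to_transcripts, n_iter=100):
    """Simplified EM: estimate per-transcript counts from equivalence-class counts.

    Assumption: effective lengths are ignored, so this only illustrates
    the count-splitting logic, not a full quantification model.
    """
    transcripts = sorted({t for ts in ec_to_transcripts.values() for t in ts})
    total = sum(tcc_counts.values())
    if total == 0:
        return {t: 0.0 for t in transcripts}

    # Start from equal relative abundances.
    abundance = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        assigned = {t: 0.0 for t in transcripts}
        # E-step: split each class's count in proportion to current abundances.
        for ec, count in tcc_counts.items():
            members = ec_to_transcripts[ec]
            denom = sum(abundance[t] for t in members)
            if denom == 0:
                continue
            for t in members:
                assigned[t] += count * abundance[t] / denom
        # M-step: renormalize the assigned counts into abundances.
        abundance = {t: assigned[t] / total for t in transcripts}
    return {t: abundance[t] * total for t in transcripts}


# Case 1: some UMIs are unique to tx_A, so the data carry isoform information.
informative = em_tcc({0: 10, 1: 2}, {0: ["tx_A", "tx_B"], 1: ["tx_A"]})

# Case 2: every UMI is shared, so the estimate never leaves its starting point.
ambiguous = em_tcc({0: 10}, {0: ["tx_A", "tx_B"]})

print(informative)
print(ambiguous)
```

In the first toy case the estimate moves toward assigning essentially all counts to tx_A. In the second, the result stays at the equal split it was initialized with, which is the identifiability problem described above showing up in the EM output.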
Dear Yenaled,
I am using kallisto 0.46.1, but I cannot find kallisto quant having a -tcc option to only run the EM algorithm. It all starts from fastq files. What am I missing here?
You need to upgrade to the latest version (0.48.0).
It is working. Thank you very much!
Hi developers, The current default usage of kb count is to generate a gene-level UMI count matrix. I wonder if it is possible to get the UMI count for each specific isoform using kb? Thank you if you can point me to some tutorial or examples!