Closed kikegoni closed 4 years ago
Hi Kike, Yes, under default mode, reads pseudoaligned to multiple genes are not counted. With -m option, a count will be split equally across pseudoaligned genes.
Hey @gaofan83 ,
Thanks for your answer. My concert is that, if we have a read that pseudoaligns to two different genes but with very different probabilities, would still be discarded? Let's say, after EM optimization, read1 pseudo aligns to geneA with 99% probability and to geneB with 1%. In the default mode, would that read be counted?
Thanks in advance,
Best,
Kike
Hi Kike,
There is no “align with probability” for kallisto bus/ bustools. With regular rna seq the probabilities are found using the em algorithm and individual reads are not discarded. For single cell there is no em step.
Hey @gaofan83 ,
Thanks a lot for your explanations. Then Kallisto bus is only pseudoaligning and generating the output bus file and if a read maps to an equivalence class containing transcripts from two different genes in the "bustools count" step it would discard this read, right?
Would it correct it somehow if this reads was assigned to an equivalence class mapping 9 transcripts from geneA and 1 transcript from geneB? In that case, with the -m option, it would assigned 0.9 counts to geneA and 0.1 counts to geneB, right?
Also, according to https://github.com/BUStools/bustools/releases it would be possible to perform EM with reads pseudoaligning to multiple genes with the option -em
The bustools count command adds the --em option that estimates gene abundances using an EM algorithm for reads that pseudoalign to multiple genes.
Note that the --multimapping option splits the read counts evenly across all genes, whereas the EM algorithm gives a more statistically valid answer. The two options are mutually exclusive.
Finally, just to clarify, if a UMI molecule pseudoaligns to an equivalence class containing multiple transcripts from the same gene, Kallisto | bustools would still count it just once, right?
Thanks a lot for all your help. I did not found any detailed explanation of what Kallisto and bustools are exactly doing.
Best,
Kike
Had a discussion with bustools developer. With -m option the read is split between the two genes. Number of transcripts is irrelevant. Hope that helps.
Hi! Thanks for developing this great tool. I would like to follow up on @kikegoni's last post above, particularly the part about the --em
option in bustools:
The bustools count command adds the --em option that estimates gene abundances using an EM algorithm for reads that pseudoalign to multiple genes.
Note that the --multimapping option splits the read counts evenly across all genes, whereas the EM algorithm gives a more statistically valid answer. The two options are mutually exclusive.
Just wanted to confirm that this single-cell option does use the E-M step to estimate abundances (rather than splitting multi-mapping reads equally)? And, by that logic, it would typically always be preferred over the --multimapping
option.
Thanks in advance!
Hey,
First of all thanks for implementing the Kallisto | bustools for single cell RNA-seq analysis. I find it super helpful.
I was wondering how Kallisto | bustools deals with multi-mapping reads. From the GitHub manual it is explained that in the command
there is an option to include reads mapping to multiple genes:
So from this we can assume that the default "bustools count" doesn't include the counts from multimapping reads, right? And that if I included the "-multimapping" option the multi-mapping reads would be split the counts across the genes it maps depending on the expression of each gene, right?
Thanks in advance,
Best,
Kike