mortunco opened 9 months ago
Hello! I just looked at the dataset. For some reason, there seems to be a strand bias (causing a mapping rate below 30%) under the default 10x settings (forward-stranded), so set --strand=unstranded.
As for your actual question: why does mature+ambiguous give fewer UMIs than standard mapping to the transcriptome (which discards out-of-transcriptome reads)? This is because of UMI deduplication. With the standard index, reads sharing the same UMI may all map to a single gene, but with the nac index, some reads sharing a UMI may map to multiple places (because the index contains more sequences), and those UMIs get filtered out. tl;dr: it's due to UMI counting.
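A toy model may help show why deduplication can lower the count. The sketch below assumes, as a simplification, that a UMI within a cell is counted once and only when all of its reads are compatible with exactly one gene. `count_umis` and the example reads are hypothetical, and this is not the bustools implementation.

```python
from collections import defaultdict


def count_umis(reads):
    """Toy UMI deduplication (hypothetical helper, not the bustools algorithm).

    reads: iterable of (barcode, umi, genes), where genes is the set of genes
    the read is compatible with. Returns {(barcode, gene): umi_count}.
    """
    # For each (barcode, UMI), keep only the genes compatible with ALL its reads.
    compatible = {}
    for barcode, umi, genes in reads:
        key = (barcode, umi)
        if key in compatible:
            compatible[key] &= set(genes)
        else:
            compatible[key] = set(genes)

    counts = defaultdict(int)
    for (barcode, _umi), genes in compatible.items():
        # Count the UMI once, and only if it resolves to a single gene;
        # UMIs still compatible with several genes are discarded.
        if len(genes) == 1:
            (gene,) = genes
            counts[(barcode, gene)] += 1
    return dict(counts)


# Standard index: UMI "AAA" only hits GeneA.
standard = [("BC1", "AAA", {"GeneA"}), ("BC1", "AAA", {"GeneA"}), ("BC1", "CCC", {"GeneA"})]
# Larger index: the same UMI's reads also hit extra sequence belonging to GeneB.
nac = [("BC1", "AAA", {"GeneA", "GeneB"}), ("BC1", "AAA", {"GeneA", "GeneB"}), ("BC1", "CCC", {"GeneA"})]
```

Here `count_umis(standard)` gives GeneA two UMIs, while `count_umis(nac)` gives it one, because UMI "AAA" no longer resolves to a single gene.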
And yes, you want to sum ambiguous+mature.
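A minimal sketch of that summation, assuming the mature and ambiguous counts are available as equally shaped cell-by-gene matrices (plain Python lists here; real outputs are usually sparse matrices). `sum_layers` is a hypothetical helper name.

```python
def sum_layers(*layers):
    """Element-wise sum of equally shaped cell-by-gene count matrices."""
    if not layers:
        raise ValueError("need at least one layer")
    shape = [len(row) for row in layers[0]]
    for layer in layers[1:]:
        if [len(row) for row in layer] != shape:
            raise ValueError("all layers must have the same shape")
    # zip(*layers) pairs up the same cell (row) across layers;
    # zip(*rows) then pairs up the same gene within that cell.
    return [[sum(values) for values in zip(*rows)] for rows in zip(*layers)]


mature = [[3, 0], [1, 2]]
ambiguous = [[1, 1], [0, 0]]
counts = sum_layers(mature, ambiguous)  # cell-by-gene matrix for standard analysis
```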
Dear Yanaled, I am going to give --strand=unstranded a try. Thank you very much for the quick reply; I really appreciate it.
Hello Tunc:
I am using a similar velocity workflow and I was wondering if you could please answer my question here: https://github.com/pachterlab/kb_python/issues/228#issuecomment-1852895595
@mortunco I just released a new version of kb-python (version 0.28.1), which should produce fewer UMI differences between the standard index and the nac index.
Previously, where an exon of one gene overlapped an intron of a different gene, the intron was prioritized, which caused issues because a UMI could map to many introns. Now the exon is prioritized, and in my tests the standard index and the nac index agree very closely.
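The priority change can be pictured with a toy resolver. It assumes each read's hits are (gene, kind) pairs, with kind being "mature" (exonic) or "nascent" (intronic); `resolve` is hypothetical and only mirrors the rule described above, not kb-python's actual code.

```python
def resolve(hits, priority="mature"):
    """Assign a read to one gene, or return None if it stays ambiguous.

    hits: set of (gene, kind) pairs, kind in {"mature", "nascent"}.
    priority: which kind wins when both kinds are hit (hypothetical knob).
    """
    preferred = {hit for hit in hits if hit[1] == priority}
    if preferred:
        # Drop the lower-priority hits whenever a preferred hit exists.
        hits = preferred
    genes = {gene for gene, _kind in hits}
    return next(iter(genes)) if len(genes) == 1 else None


# An exon of GeneA overlapping introns of GeneB and GeneC.
hits = {("GeneA", "mature"), ("GeneB", "nascent"), ("GeneC", "nascent")}
```

With `priority="nascent"` (intron first) the read lands on two introns and is discarded; with `priority="mature"` (exon first) it is assigned to GeneA.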
Hello,
Thank you for keeping kallisto alive. These days I increasingly realise how important, and how hard, it is to maintain and update tools over many years.
Background:
I am processing the same 10x 5' v2 chemistry sample FASTQs with kallisto, and I found a large difference in total UMI counts between the velocity and non-velocity runs. These are the commands I used:
Velocity mode
Velocity ref
Non-velocity mode
Non-velocity ref
Attempts
I calculated total UMI counts using the following lines. For this question, please ignore the other kallisto runs; I used them while trying to figure out what was going on, but they are not relevant here. Here you can see the total UMI difference between the velocity and nonvelocity_prebuilt runs. One important note: for the velocity run, I summed ambiguous + spliced + unspliced.
In another comparison, you can see that even when we add spliced + ambiguous, it still does not add up to the velocity total.
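For reference, a sketch of the totals compared above, assuming each layer is a cell-by-gene matrix given as a list of rows. The helper names and the toy numbers are hypothetical and do not come from this dataset.

```python
def total_umis(matrix):
    """Total UMI count of a cell-by-gene matrix given as a list of rows."""
    return sum(sum(row) for row in matrix)


def compare_totals(layers, standard):
    """Compare velocity-run layer totals with a standard (non-velocity) run."""
    t = {name: total_umis(matrix) for name, matrix in layers.items()}
    return {
        "spliced+ambiguous": t["spliced"] + t["ambiguous"],
        "spliced+unspliced+ambiguous": t["spliced"] + t["unspliced"] + t["ambiguous"],
        "standard": total_umis(standard),
    }


# Toy matrices (one cell, two genes), purely illustrative.
velocity_layers = {"spliced": [[5, 1]], "unspliced": [[2, 0]], "ambiguous": [[1, 0]]}
standard_run = [[7, 1]]
totals = compare_totals(velocity_layers, standard_run)
```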
Questions
1) This is clearly an effect of the index + workflow. In terms of good practice, could you please give me some guidance? I would like to run kallisto only once and use the spliced+unspliced counts for standard single-cell analysis and their ratio for velocity/trajectory analysis.
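A sketch of the ratio mentioned in the question, assuming spliced and unspliced layers from a single run as equally shaped cell-by-gene matrices. `unspliced_fraction` is a hypothetical helper; velocity tools normally take the two layers directly rather than a precomputed ratio.

```python
def unspliced_fraction(spliced, unspliced):
    """Per-entry unspliced / (spliced + unspliced); None where both are zero."""
    if len(spliced) != len(unspliced):
        raise ValueError("layers must have the same number of cells")
    fractions = []
    for s_row, u_row in zip(spliced, unspliced):
        if len(s_row) != len(u_row):
            raise ValueError("layers must have the same number of genes")
        # Leave entries with no counts in either layer undefined instead of 0.
        fractions.append([u / (s + u) if s + u else None for s, u in zip(s_row, u_row)])
    return fractions


spliced = [[3, 0], [1, 0]]
unspliced = [[1, 0], [1, 2]]
ratio = unspliced_fraction(spliced, unspliced)
```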
Thank you very much for your time and patience,
Best regards,
Tunc.