pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Exon Quantification #201

Closed shashkat closed 1 year ago

shashkat commented 1 year ago

Hi there, firstly, thanks for making such an awesome tool. I have had great results with transcript quantification of single-cell data. However, I now want to move on to exon-level quantification, as that is the next step in my analysis. Could you please guide me on how to do that? As far as I know, kallisto currently doesn't have that feature, but any insight into how to approach it would be of great help. Any help is appreciated!

Thank You!

Yenaled commented 1 year ago

Hmm, for transcripts, we go into the GTF, extract the exons and stitch them together. I guess for exons, the approach would be to simply index the exons individually. Everything else should proceed as normal.
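To make "index the exons individually" concrete, here is a minimal sketch (not kb_python's own code) that pulls each unique exon out of a GTF and pairs it with its sequence, so that every exon becomes its own FASTA target. The function names, the `gene|chrom:start-end:strand` naming scheme, and the in-memory `genome` dict are assumptions made for illustration only.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def parse_attributes(field):
    """Parse the 9th GTF column ('key "value"; key "value";') into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def exon_records(gtf_lines, genome):
    """Yield (name, sequence) once per unique exon found in the GTF lines.

    genome: dict mapping chromosome name -> sequence string (hypothetical
    in-memory representation; a real genome would be read from a FASTA file).
    """
    seen = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        key = (chrom, start, end, strand)
        if key in seen:  # the same exon is usually listed once per transcript
            continue
        seen.add(key)
        seq = genome[chrom][start - 1:end]  # GTF coordinates are 1-based, inclusive
        if strand == "-":
            seq = reverse_complement(seq)
        gene = parse_attributes(cols[8]).get("gene_id", "NA")
        yield f"{gene}|{chrom}:{start}-{end}:{strand}", seq
```

Writing each pair as `>name` followed by the sequence gives a FASTA that could then be indexed. One caveat worth checking: exons shorter than the index k-mer length (31 by default in kallisto, as far as I know) would contribute no k-mers and so could not receive any reads.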

shashkat commented 1 year ago

Okay, thanks a lot for your quick response! I will give it a try and see if it works.

Regards,

shashkat commented 1 year ago

Hi there, I have been able to implement it successfully by making some modifications to the GTF and building a new index file. However, I was wondering about a basic issue regarding how kallisto maps reads. Could you please explain it?

In the following case, how does the tool decide which exon the read maps to?

[Image: Screenshot 2023-04-11 at 11 07 46 PM]

Yenaled commented 1 year ago

That read will multimap -- both exons will be mapped.

What you do with the multimapping reads depends on how you run "bustools count". You can choose to either discard them (the default), assign 0.5 to each of the two exons, do probabilistic assignment (e.g. with an EM algorithm), etc.
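The three strategies can be illustrated with a toy sketch. This is not bustools' implementation; it only shows how the resulting counts differ. Each record is assumed to be the set of targets (exons) that one read or UMI is compatible with, and all function names are hypothetical. In bustools itself these choices are made through options to "bustools count" (I believe `--multimapping` and `--em`; check `bustools count --help` for your version).

```python
from collections import defaultdict


def count_discard(records):
    """Count only records that are compatible with exactly one target."""
    counts = defaultdict(float)
    for targets in records:
        if len(targets) == 1:
            counts[targets[0]] += 1.0
    return dict(counts)


def count_uniform(records):
    """Split each record equally among its targets (0.5 each for two exons)."""
    counts = defaultdict(float)
    for targets in records:
        if not targets:
            continue
        share = 1.0 / len(targets)
        for t in targets:
            counts[t] += share
    return dict(counts)


def count_em(records, n_iter=50):
    """Toy EM: split each record in proportion to current target abundances."""
    records = [r for r in records if r]
    names = sorted({t for r in records for t in r})
    abundance = {t: 1.0 / len(names) for t in names}
    counts = dict.fromkeys(names, 0.0)
    for _ in range(n_iter):
        # E-step: distribute each record according to relative abundance.
        counts = dict.fromkeys(names, 0.0)
        for r in records:
            total = sum(abundance[t] for t in r)
            for t in r:
                counts[t] += abundance[t] / total
        # M-step: re-estimate abundances from the expected counts.
        n = sum(counts.values())
        abundance = {t: c / n for t, c in counts.items()}
    return counts


records = [("exonA",), ("exonA",), ("exonA", "exonB"), ("exonB",)]
print(count_discard(records))
print(count_uniform(records))
print(count_em(records))
```

With these four records, discarding drops the shared record entirely, the uniform split gives it half to each exon, and the EM version gives more of it to exonA because exonA has more unique support.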

shashkat commented 1 year ago

Hi there! Thank you again for your response. It clears up my doubt.

I also wanted a small clarification on the exon quantification I did. As you said, I just had to modify the GTF. To keep the code simple, I removed every feature from the GTF that was not a gene, transcript, or exon (such as start codons and stop codons), and then changed the remaining file so that every exon belongs to its own unique transcript (I had to introduce artificial rows to do this).

My question is: since I only want to do exon quantification, removing the features other than gene, transcript, and exon won't have any effect on my quantification, right?
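For readers following along, a rewrite of this kind could look roughly like the sketch below. It is not the script actually used here (that is not shown in this thread). It assumes a standard tab-separated GTF in which every exon row carries a `transcript_id` attribute, keeps gene rows unchanged, and emits one artificial transcript row plus one exon row per unique exon. The function name and the `exon_<chrom>_<start>_<end>_<strand>` ID scheme are made up for illustration.

```python
import re


def exon_as_transcript_gtf(gtf_lines):
    """Return GTF lines in which each unique exon is its own transcript."""
    out, seen = [], set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        if cols[2] == "gene":
            out.append("\t".join(cols))  # gene rows are kept as they are
            continue
        if cols[2] != "exon":
            continue  # drops original transcripts, start/stop codons, CDS, etc.
        key = (cols[0], cols[3], cols[4], cols[6])
        if key in seen:  # exon already emitted via another transcript
            continue
        seen.add(key)
        strand = {"+": "plus", "-": "minus"}.get(cols[6], "unstranded")
        new_id = f"exon_{cols[0]}_{cols[3]}_{cols[4]}_{strand}"
        # Replace the original transcript_id with the exon-specific one.
        attrs = re.sub(
            r'transcript_id "[^"]*"',
            lambda m: f'transcript_id "{new_id}"',
            cols[8],
        )
        # Artificial transcript row spanning exactly this exon, then the exon.
        out.append("\t".join(cols[:2] + ["transcript"] + cols[3:8] + [attrs]))
        out.append("\t".join(cols[:2] + ["exon"] + cols[3:8] + [attrs]))
    return out
```

Note that this sketch leaves `gene_id` untouched, so exons of the same gene still share a gene; whether that is what you want depends on the level at which counts are aggregated downstream.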

Thanks a lot for your help and patience!!

Yenaled commented 1 year ago

That should be fine, but it's hard to say exactly what you're doing with the GTF. The best thing to do would be to run kb ref, inspect the FASTA file it produces (the output file of -f1), and check that it contains the target sequences (exons in this case) that you want to quantify.
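A quick way to do that inspection is a small script along these lines. This is only a sketch: the function names are made up, and the default `k=31` is an assumption about the index k-mer length, so adjust it if your index was built with a different value.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize(records, k=31):
    """Summarize target count, length range, and targets shorter than k."""
    records = list(records)
    lengths = [len(seq) for _, seq in records]
    return {
        "n_targets": len(records),
        "min_len": min(lengths, default=0),
        "max_len": max(lengths, default=0),
        "shorter_than_k": [name for name, seq in records if len(seq) < k],
    }
```

Usage would be something like `with open(path_given_to_f1) as fh: print(summarize(read_fasta(fh)))`, where `path_given_to_f1` stands for whatever path you passed to -f1. If the number of targets matches the number of unique exons you expect, and the names and lengths look like exons rather than full transcripts, the GTF changes did what you intended.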

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days