Feature request: Standalone pseudoalignment -> BAM output

pachterlab / kallisto

Near-optimal RNA-Seq quantification

https://pachterlab.github.io/kallisto

BSD 2-Clause "Simplified" License


Feature request: Standalone pseudoalignment -> BAM output #433

Closed bbimber closed 7 months ago

bbimber commented 7 months ago

Hello,

Thanks for your help on other threads. Version 0.48.0 supported an argument to quant to export a pseudoalignment BAM file, which is extremely useful. This has apparently been dropped. I'm writing to see if you'd consider re-supporting this in the future.

Our use case is to work with the raw alignments directly, so an entry point separate from quant would be more direct. We are planning to use this with scRNA-seq data, so an option to calculate and retain the CB and UMI tags would be extremely useful. We're also exploring whether we can run umi-tools upstream and store the parsed CB/UMI in the read names.
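To illustrate the read-name idea, here is a minimal Python sketch that recovers the barcode and UMI from a read name. It assumes the name was rewritten to the layout `<read_id>_<CB>_<UMI>`, which is an assumption about the upstream extraction step, not something kallisto produces. The function names are hypothetical, and the `CB:Z:`/`RX:Z:` strings are just one way to carry the values as SAM optional fields.

```python
from typing import List, NamedTuple


class ParsedName(NamedTuple):
    read_id: str
    cell_barcode: str
    umi: str


def parse_read_name(name: str) -> ParsedName:
    """Split '<read_id>_<CB>_<UMI>' (assumed layout) into its parts."""
    fields = name.split()
    if not fields:
        raise ValueError("empty read name")
    # Split from the right so underscores inside the original read ID survive.
    parts = fields[0].rsplit("_", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"read name lacks _CB_UMI suffix: {name!r}")
    return ParsedName(*parts)


def sam_tags(parsed: ParsedName) -> List[str]:
    """Format the barcode and UMI as SAM optional fields (CB and RX)."""
    return [f"CB:Z:{parsed.cell_barcode}", f"RX:Z:{parsed.umi}"]
```

For example, `parse_read_name("read_1_ACGT_TTGA")` keeps `read_1` as the read ID because only the last two underscore-separated fields are treated as barcode and UMI.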

Yenaled commented 7 months ago

The initial plan was to maintain pseudobam or implement an alternative/better variant of it.

However, I will not be resupporting it. I simply don't have the bandwidth to do so, and it's not trivial with the new data structure I've integrated into kallisto. Besides, my view is that if you want to visualize alignments, you should use a genome aligner, not a pseudoaligner. kallisto is a quantification tool (and I oftentimes use it in conjunction with a separate genome aligner to get the best of both worlds).

A workflow that's already functional (though not really documented besides my online Q&A) is to take specific barcodes/UMIs/transcripts/whatever of interest that have pseudoaligned, extract those reads, and run those reads through a separate genome aligner.
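The extraction step of that workflow can be sketched in Python as below. This is only an illustration under stated assumptions: the input is plain four-line FASTQ, and the set of read names of interest has already been obtained somehow (how to get it from kallisto output is not shown here). `read_fastq` and `filter_fastq` are hypothetical helper names.

```python
from typing import Iterator, Set, TextIO, Tuple

FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield (header, sequence, separator, quality) from 4-line FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n"), seq, sep, qual


def filter_fastq(handle: TextIO, wanted: Set[str]) -> Iterator[FastqRecord]:
    """Keep only records whose read name (first token after '@') is wanted."""
    for record in read_fastq(handle):
        name = record[0][1:].split()[0]
        if name in wanted:
            yield record
```

The retained records can then be written back out as FASTQ and passed to whichever genome aligner is preferred.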

bbimber commented 7 months ago

OK, thanks for the reply. Our use case involves applying different, flexible criteria for accepting alignments into quantification. BAM format is useful, but it is not the key thing here. What I'm trying to access is read/alignment-level data. I can see that kallisto seems to be evolving into a quantification tool rather than a standalone pseudoaligner.

I've toyed with what you describe above: the two-step pseudoalignment -> passing reads -> positional aligner approach. It tended to be clunky, but we might give it another look.

I appreciate your point about time. If there were a simple way to expose rawer pseudoalignment output, perhaps with less work for you than implementing BAM, we'd appreciate it. Even a tool that exported some sort of tab-delimited stream of passing pseudoalignments would be useful.
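To make the request concrete, the sketch below shows how such a stream might be consumed. The format is entirely hypothetical, since kallisto does not emit it: one line per read, with the read name in the first column and a comma-separated list of compatible transcript IDs in the second. The acceptance rule shown (a cap on the number of compatible transcripts) is just one example of a flexible criterion.

```python
import csv
from typing import Callable, Iterable, Iterator, List, TextIO, Tuple

# (read name, compatible transcript IDs); hypothetical record layout
Record = Tuple[str, List[str]]


def read_pseudoalignments(handle: TextIO) -> Iterator[Record]:
    """Parse the assumed two-column, tab-delimited stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        targets = [t for t in row[1].split(",") if t]
        yield row[0], targets


def max_targets(limit: int) -> Callable[[Record], bool]:
    """Accept reads compatible with between 1 and `limit` transcripts."""
    return lambda record: 0 < len(record[1]) <= limit


def accept(records: Iterable[Record],
           predicate: Callable[[Record], bool]) -> Iterator[Record]:
    """Yield only the records that satisfy the acceptance predicate."""
    return (r for r in records if predicate(r))
```

Because the criterion is an ordinary function, different acceptance rules can be swapped in without touching the parsing code.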