pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

UMI mode - Internal Demultiplexing? #111

Open vals opened 8 years ago

vals commented 8 years ago

Hi,

I am very excited that there is finally a tool that does proper UMI quantification! I will retire my heuristic method (https://github.com/vals/umis) and recommend this in its place.

There are a couple of use cases, though, that might be problematic from a practical standpoint.

Firstly, this requires the cells to be demultiplexed from each other. In many protocols, e.g. CEL-seq or MARS-seq, the cellular barcode is not demultiplexed by the Illumina pre-processing pipeline. Do you have any recommendations for demultiplexing?

Additionally, in the more recent nanoliter droplet-based protocols, there are hundreds of thousands of potential cellular barcodes. Usually an experiment only captures a few thousand cells, but demultiplexing into one file per barcode before seeing which barcodes actually contained cells does not work well with most file systems. In my heuristic script, I handled this by doing cellular barcode demultiplexing internally and returning a table. Is there any way kallisto could do the demultiplexing internally, similar to how UMIs are handled?
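To make the idea of "demultiplexing internally and returning a table" concrete, here is a minimal sketch. It is not the code from my script and not anything kallisto does; the function name, the read layout (cellular barcode followed by UMI at the start of the barcode read), and the "keep the most abundant barcodes" rule are all assumptions made for illustration. It also assumes each read has already been assigned to a target.

```python
from collections import Counter, defaultdict


def demultiplex_in_memory(reads, cb_len=6, umi_len=4, n_cells=2):
    """Hypothetical sketch: count distinct UMIs per (cell barcode, target).

    reads   -- iterable of (barcode_read, target) pairs, where barcode_read
               is assumed to start with the cellular barcode, then the UMI
    n_cells -- number of most abundant barcodes to keep (an assumed,
               simplistic rule for deciding which barcodes contained cells)
    """
    reads = list(reads)

    # First pass: count reads per cellular barcode, without writing any files.
    cb_counts = Counter(seq[:cb_len] for seq, _ in reads)
    kept = {cb for cb, _ in cb_counts.most_common(n_cells)}

    # Second pass: collect the distinct UMIs seen for each kept cell and target.
    umis = defaultdict(set)
    for seq, target in reads:
        cb = seq[:cb_len]
        if cb in kept:
            umis[(cb, target)].add(seq[cb_len:cb_len + umi_len])

    # The resulting table maps (cell barcode, target) to a UMI count.
    return {key: len(seen) for key, seen in umis.items()}


example_reads = [
    ("AAAAAACCCC", "g1"),
    ("AAAAAACCCC", "g1"),  # same cell, same UMI: counted once
    ("AAAAAAGGGG", "g1"),
    ("TTTTTTCCCC", "g2"),
    ("TTTTTTAAAA", "g2"),
    ("GGGGGGAAAA", "g1"),  # rare barcode: dropped when n_cells=2
]
table = demultiplex_in_memory(example_reads)
```

The point is only that the barcodes are tallied in memory and filtered before anything is written out, so the number of potential barcodes never turns into a number of files.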

lakigigar commented 8 years ago

Hi Valentine,

Thanks! We are thinking about demultiplexing but have not tackled that step yet. We have just posted a workflow for 10x data that starts after demultiplexing, and you might find it useful for some of the technologies you mentioned as well (we plan modifications for some of them in the near future). See

https://pachterlab.github.io/kallisto/singlecell.html

You are right that many of the steps (including demultiplexing) might be best handled internally in kallisto and we're looking at that. Lior


jbergenstrahle commented 7 years ago

Hi,

Just wanted to drop in and ask whether there has been any further development regarding internal demultiplexing with kallisto. I would be very interested in such an implementation!

hmassalha commented 5 years ago

Dear Prof. @lakigigar, do you have any internal solution for demultiplexing, mainly for the kallisto bus tool? Thanks a lot. HM