pachterlab / kallisto

Near-optimal RNA-Seq quantification
https://pachterlab.github.io/kallisto
BSD 2-Clause "Simplified" License

Will 10x 5' paired-end pseudoalignment be supported? #287

Open marencc opened 4 years ago

marencc commented 4 years ago

Hi!

Many thanks for all your development efforts!

I have six data sets that I would be interested in benchmarking with Kallisto-bustools; however, I just found out that kallisto bus only supports 3' chemistry. Will you be supporting 5' PE (SC5P-PE) anytime soon? Is there a workaround for this situation?

List of supported single-cell technologies

short name    description
10xv1         10x version 1 chemistry
10xv2         10x version 2 chemistry
10xv3         10x version 3 chemistry
CELSeq        CEL-Seq
CELSeq2       CEL-Seq version 2
DropSeq       DropSeq
inDrops       inDrops
SCRBSeq       SCRB-Seq
SureCell      SureCell for ddSEQ

kikegoni commented 3 years ago

Hey,

First of all thanks for developing Kallisto-Bustools for scRNA-seq, it is super useful!!

I just wanted to follow up on @marencc's question and ask whether Kallisto bustools is suited for 10x 5' chemistry.

Best

Kike

Yenaled commented 3 years ago

Yes, it is suitable for 5' chemistry and should be no different than 10X v2 chemistry (i.e. your first fastq file contains your 16bp barcode and 10bp UMI and your second fastq file contains your biological read that you wish to map to the transcriptome).

Happy to answer any questions if you run into any problems with the workflow.

kikegoni commented 3 years ago

Hey,

Thanks a lot for your help and your fast answer! My question is more about whether Kallisto bus is able to work with 10x 5' data in paired-end mode. For example, a lot of 10x 5' GEX data is paired-end, with 150 bp in R1 (16 bp CB + 10 bp UMI + cDNA) and 150 bp in R2 (all cDNA). This is the 'SC5P-PE' format in 10x (see: https://kb.10xgenomics.com/hc/en-us/articles/115003764132-How-does-cellranger-count-auto-detect-chemistry-)

If that helps, I have seen that STARsolo recently added support for this too: https://github.com/alexdobin/STAR/blob/master/docs/STARsolo.md#barcode-and-cdna-on-the-same-mate https://github.com/alexdobin/STAR/issues/768

Thanks for any help you might provide about this!!

Best,

Kike

Yenaled commented 3 years ago

Thanks for the follow-up clarification. I see you've tried a few things here: https://github.com/pachterlab/kallisto/issues/226#issuecomment-931297217 (issue #226)

You can indeed specify multiple sequences in the technology string; however, you must separate them with a comma rather than a colon, e.g. 0,0,16:0,16,26:0,26,0,1,0,0
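
To make the structure of that string concrete, here is a minimal Python sketch that splits it into its barcode, UMI, and sequence parts. It assumes the bc:umi:seq layout described in the kallisto manual, where each part is one or more file,start,stop triplets joined by commas. The parse_technology helper is hypothetical and only for illustration; it is not part of kallisto.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[int, int, int]  # (file index, start, stop)


def parse_technology(tech: str) -> Dict[str, List[Triplet]]:
    """Split a custom technology string into barcode, UMI and sequence triplets.

    Hypothetical helper for illustration; kallisto does its own parsing.
    """
    parts = tech.split(":")
    if len(parts) != 3:
        raise ValueError("expected three colon-separated parts: bc:umi:seq")
    layout = {}
    for name, part in zip(("barcode", "umi", "seq"), parts):
        numbers = [int(x) for x in part.split(",")]
        if len(numbers) % 3 != 0:
            raise ValueError(f"{name}: expected file,start,stop triplets")
        # Group the flat list of integers into consecutive triplets.
        layout[name] = [tuple(numbers[i:i + 3]) for i in range(0, len(numbers), 3)]
    return layout


layout = parse_technology("0,0,16:0,16,26:0,26,0,1,0,0")
for name, triplets in layout.items():
    print(name, triplets)
# barcode [(0, 0, 16)]
# umi [(0, 16, 26)]
# seq [(0, 26, 0), (1, 0, 0)]
```

The colons separate the three roles, while the comma inside the last part is what lets the sequence come from two places: the rest of R1 and all of R2.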

Let me know if this works for you and if you have further questions

kikegoni commented 3 years ago

Hey,

Thanks a lot for your fast reply!! I can confirm it works now, thanks a lot!!!

Best,

Kike

Gpasquini commented 2 years ago

Sorry to come back on this, but I am also trying to count SC5P-PE data with kallisto|bustools.

The read scheme is 150 x 150:
R1: 16xBC,10xUMI,124xcDNA
R2: 150xcDNA

Given that it seems to work for me as well, could you please explain the rationale behind 0,0,16:0,16,26:0,26,0,1,0,0 or direct me to a manual section where this nomenclature is explained?

Many thanks, Giovanni

kikegoni commented 2 years ago

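I think the string tells kallisto where your cell barcode, UMI and cDNA sequence are located. The first part (0,0,16) indicates that the cell barcode is in the first FASTQ file (R1, which is file 0), starts at the first base (position 0, since counting starts at 0) and runs through base 16. The second part (0,16,26) places the 10 bp UMI right after it in the same file. The third part (0,26,0,1,0,0) takes the cDNA from the rest of R1 and from all of R2, where a stop of 0 means "to the end of the read". You can see the information here: https://pachterlab.github.io/kallisto/manual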
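
As a rough illustration of that reading, the Python sketch below applies the triplets to one made-up read pair. It assumes positions are 0-based with an exclusive stop, and that a stop of 0 means "to the end of the read", as the kallisto manual describes for its 10xv2 example. The reads and the extract helper are hypothetical and not part of kallisto.

```python
def extract(reads, triplets):
    """Return the substrings selected by (file, start, stop) triplets.

    Positions are 0-based, stop is exclusive, and stop == 0 means "to the end".
    """
    pieces = []
    for file_idx, start, stop in triplets:
        read = reads[file_idx]
        pieces.append(read[start:] if stop == 0 else read[start:stop])
    return pieces


# Made-up SC5P-PE read pair:
# R1 = 16 bp barcode + 10 bp UMI + 124 bp cDNA, R2 = 150 bp cDNA.
r1 = "A" * 16 + "C" * 10 + "G" * 124
r2 = "T" * 150
reads = [r1, r2]

barcode = extract(reads, [(0, 0, 16)])
umi = extract(reads, [(0, 16, 26)])
cdna = extract(reads, [(0, 26, 0), (1, 0, 0)])

print(len(barcode[0]), len(umi[0]), [len(s) for s in cdna])
# 16 10 [124, 150]
```

The lengths match the 150 x 150 scheme above: 16 + 10 + 124 bases from R1, plus the full 150 bases of R2.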