pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

issue with using demultiplexed files in SMARTSEQ2 mode #172

Closed: franziskadenk closed this issue 1 year ago

franziskadenk commented 2 years ago

Describe the issue

Dear kb-bustools team,

First of all, thanks so much for providing this amazing tool to the community! It's really fast and super easy to use, even for amateurs like me.

I do have one question, though, that I was hoping you could help me with. I used to run your old -x SMARTSEQ option without issues, but now that you have upgraded to -x SMARTSEQ2, I seem to be having trouble getting my batch file recognised (which I need, as my fastqs are demultiplexed). Code and error below.

Maybe it is something with how I set up the batch file? It reads: A1 cR0883-S0001_A68702_ABPlate1A1_H3VY5DRX2_TAAGGCGA-ATAGAGAG_L002_R1.fastq.gzcell1_1.fastq.gz R0883-S0001_A68702_ABPlate1A1_H3VY5DRX2_TAAGGCGA-ATAGAGAG_L002_R2.fastq

Thanks!!

!kb count --verbose --overwrite --h5ad -i '/content/drive/MyDrive/A_data/fastq/index.idx' -g '/content/drive/MyDrive/A_data/fastq/t2g.txt' -x SMARTSEQ2 --parity paired -b batch.txt -o '/content/drive/MyDrive/AB2' *.fastq.gz

kb: error: unrecognized arguments: -b R0883-S0001_A68702_ABPlate1A1_H3VY5DRX2_TAAGGCGA-ATAGAGAG_L002_R1.fastq.gz R0883-S0001_A68702_ABPlate1A1_H3VY5DRX2_TAAGGCGA-ATAGAGAG_L002_R2.fastq.gz

Yenaled commented 2 years ago

Batch file example shown in:

https://github.com/pachterlab/kb_python/issues/167
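For readers who need to generate such a batch file from a folder of demultiplexed FASTQs, here is a minimal Python sketch. It rests on assumptions not confirmed in this thread: that each batch line is a cell ID followed by one or two FASTQ paths separated by tabs, and that paired files share a name ending in `_R1` / `_R2` before `.fastq(.gz)`, as in the file names quoted above. The function name `build_batch_lines` is hypothetical, and the cell ID is simply the shared file stem.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: <stem>_R1.fastq(.gz) / <stem>_R2.fastq(.gz)
READ_RE = re.compile(r"^(?P<stem>.+)_R(?P<read>[12])\.fastq(?:\.gz)?$")


def build_batch_lines(fastq_names):
    """Group FASTQ names by shared stem; return one tab-separated line per cell.

    Assumed line layout: <cell id> TAB <R1 file> [TAB <R2 file>]
    """
    reads_by_cell = defaultdict(dict)
    for name in fastq_names:
        match = READ_RE.match(Path(name).name)
        if match is None:
            raise ValueError(f"unexpected FASTQ name: {name}")
        reads_by_cell[match["stem"]][match["read"]] = name

    lines = []
    for cell_id in sorted(reads_by_cell):
        reads = reads_by_cell[cell_id]
        # Sort by read number so R1 always precedes R2.
        files = [reads[read] for read in sorted(reads)]
        lines.append("\t".join([cell_id, *files]))
    return lines


if __name__ == "__main__":
    example = ["cellA_R2.fastq.gz", "cellA_R1.fastq.gz", "cellB_R1.fastq.gz"]
    for line in build_batch_lines(example):
        print(line)
```

Writing the returned lines to `batch.txt` (one per line) would give a file to pass to `kb count`; check the layout against the example in the linked issue before relying on it.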

franziskadenk commented 2 years ago

Wonderful!! Thank you so much Yenaled! Also for the quick reply!

I was following some older instructions at https://pachterlab.github.io/kallisto/manual, where the batch file still seemed to be passed as an option after -b...

Putting the batch file at the end worked wonderfully, as in the below.

!kb count --verbose --overwrite --h5ad -i '/content/drive/MyDrive/A_data/fastq/index.idx' -g '/content/drive/MyDrive/A_data/fastq/t2g.txt' -x SMARTSEQ2 --parity paired -o '/content/drive/MyDrive/AB2' batch.txt

Next, I will also add the --tcc option, as you recommended in the other thread.

One last dumb question: in the output (see below) it says "no technology specified". I thought I had specified it with -x? Should I use a different term to specify it? Or does kb count run the same whether it is correctly specified or not?

[2022-07-08 06:40:43,267]   DEBUG [count] [bus] no technology specified; will try running read files supplied in batch file
[2022-07-08 06:40:43,267]   DEBUG [count] [bus] --paired ignored; single/paired-end is inferred from number of files supplied

Thanks again!!

Yenaled commented 2 years ago

Don't worry about the message -- it's generated by kallisto whenever kallisto is run in "batch" mode rather than your standard single cell sequencing modes (e.g. 10X, CELSEQ, etc.). Smartseq2 is not a technology in kallisto itself -- so supplying smartseq2 to kb basically tells kallisto to go into "batch" mode (which is how kallisto handles demultiplexed read files). In the next release of kallisto, I'll consider removing that message to avoid confusion.
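The DEBUG line quoted earlier says single/paired-end is inferred from the number of files supplied per batch entry. The Python sketch below illustrates that idea as a pre-flight check on a batch file; it is not kallisto's actual implementation. The function names are hypothetical, and the whitespace-separated layout (cell ID, then one or two FASTQ paths) is an assumption.

```python
def parse_batch_line(line):
    """Split one batch line into (cell_id, files, layout).

    Assumed layout: <cell id> <file1> [<file2>], whitespace-separated.
    """
    fields = line.split()
    if len(fields) not in (2, 3):
        raise ValueError(f"expected an ID plus 1 or 2 files, got: {line!r}")
    cell_id, *files = fields
    layout = "paired" if len(files) == 2 else "single"
    return cell_id, files, layout


def infer_layout(batch_text):
    """Return the common layout of all entries, or raise if they disagree."""
    layouts = set()
    for line in batch_text.splitlines():
        # Skip blank lines and '#' comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        layouts.add(parse_batch_line(line)[2])
    if len(layouts) != 1:
        raise ValueError(f"inconsistent or empty batch file: {sorted(layouts)}")
    return layouts.pop()


if __name__ == "__main__":
    demo = "A1\ta1_R1.fastq.gz\ta1_R2.fastq.gz\nA2\ta2_R1.fastq.gz\ta2_R2.fastq.gz\n"
    print(infer_layout(demo))
```

Running a check like this before `kb count` can catch a line with a missing or fused file name early, which is easier to spot here than in the tool's log.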
