pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Barcode length of zero for custom technology strings #258

Closed: mckellardw closed this issue 3 months ago

mckellardw commented 3 months ago

Describe the issue

When I pass a custom technology string to kb count, it incorrectly infers a barcode length of zero. The command below is for SlideSeq data, but I have also tested this with Visium data and got the same error. The Visium data runs without error if I pass "Visium" for -x.

Using kb_python version 0.28.2

What is the exact command that was run?

kb count \
-i /gpfs/commons/groups/innovation/dwm/ref_snake/out/mus_musculus/transcriptome/kb/transcriptome.idx \
-g /gpfs/commons/groups/innovation/dwm/ref_snake/out/mus_musculus/transcriptome/kb/t2g.txt \
-o ./ \
--strand forward \
--workflow standard \
-x '0,0,8,0,26,32:0,32,39:1,0,0' \
-w whitelist.txt \
--cellranger \
--overwrite \
-t 28 \
-mm \
--verbose \
trimmed_R1.fq.gz trimmed_R2.fq.gz 
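
As a sanity check on the technology string itself, here is a minimal sketch that parses it. It assumes the string follows the kallisto bus custom format barcode:umi:cdna, where each part is one or more file,start,stop triplets and a stop of 0 means "to the end of the read". The helper names parse_technology and segment_length are hypothetical and are not part of kb_python.

```python
from typing import Dict, List, Optional, Tuple

Triplet = Tuple[int, int, int]  # (file index, start, stop)


def parse_technology(tech: str) -> Dict[str, List[Triplet]]:
    """Split a custom technology string into barcode, UMI and cDNA triplets.

    Hypothetical helper: assumes the 'barcode:umi:cdna' layout with
    comma-separated file,start,stop triplets in each part.
    """
    parts = tech.split(":")
    if len(parts) != 3:
        raise ValueError("expected three ':'-separated parts")
    parsed: Dict[str, List[Triplet]] = {}
    for name, part in zip(("barcode", "umi", "cdna"), parts):
        numbers = [int(n) for n in part.split(",")]
        if len(numbers) % 3 != 0:
            raise ValueError(f"{name}: values must come in groups of three")
        parsed[name] = [
            (numbers[i], numbers[i + 1], numbers[i + 2])
            for i in range(0, len(numbers), 3)
        ]
    return parsed


def segment_length(segments: List[Triplet]) -> Optional[int]:
    """Total length of the segments, or None if any runs to end of read."""
    if any(stop == 0 for _, _, stop in segments):
        return None  # stop == 0 is assumed to mean "until the read ends"
    return sum(stop - start for _, start, stop in segments)


tech = parse_technology("0,0,8,0,26,32:0,32,39:1,0,0")
barcode_len = segment_length(tech["barcode"])  # 8 + 6 bases from file 0
umi_len = segment_length(tech["umi"])          # 7 bases from file 0
cdna_len = segment_length(tech["cdna"])        # open-ended read from file 1
print(barcode_len, umi_len, cdna_len)
```

Under that reading the two barcode segments add up to 14 bases, the same length the log below reports for the on-list, so the technology string alone does not explain a barcode length of zero.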

Command output (with --verbose flag)

[2024-06-13 10:17:43,161]   DEBUG [main] Printing verbose output
[2024-06-13 10:17:45,368]   DEBUG [main] kallisto binary located at /gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/bins/linux/kallisto/kallisto
[2024-06-13 10:17:45,368]   DEBUG [main] bustools binary located at /gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/bins/linux/bustools/bustools
[2024-06-13 10:17:45,369]   DEBUG [main] Creating `./tmp` directory
[2024-06-13 10:17:45,371]   DEBUG [main] Namespace(list=False, command='count', tmp=None, keep_tmp=False, verbose=True, i='/gpfs/commons/groups/innovation/dwm/ref_snake/out/mus_musculus/transcriptome/kb/transcriptome.idx', g='/gpfs/commons/groups/innovation/dwm/ref_snake/out/mus_musculus/transcriptome/kb/t2g.txt', x='0,0,8,0,26,32:0,32,39:1,0,0', o='./', num=False, w=' whitelist.txt', r=None, t=28, m='m', strand='forward', inleaved=False, genomebam=False, aa=False, gtf=None, chromosomes=None, workflow='standard', em=False, mm=False, tcc=False, filter=None, filter_threshold=None, c1=None, c2=None, overwrite=True, dry_run=False, batch_barcodes=False, loom=False, h5ad=False, loom_names='barcode,target_name', sum='none', cellranger=True, gene_names=False, N=None, report=False, no_inspect=False, kallisto='/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/bins/linux/kallisto/kallisto', bustools='/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/bins/linux/bustools/bustools', no_validate=False, no_fragment=False, parity=None, fragment_l=None, fragment_s=None, bootstraps=None, matrix_to_files=False, matrix_to_directories=False, fastqs=[' trimmed_R1.fq.gz', ' trimmed_R2.fq.gz'])
[2024-06-13 10:17:48,129]    INFO [count] Using index /gpfs/commons/groups/innovation/dwm/ref_snake/out/mus_musculus/transcriptome/kb/transcriptome.idx to generate BUS file to ./ from
[2024-06-13 10:17:48,129]    INFO [count]          trimmed_R1.fq.gz
[2024-06-13 10:17:48,129]    INFO [count]          trimmed_R2.fq.gz
[2024-06-13 10:17:48,129]   DEBUG [count] kallisto bus -i /gpfs/commons/groups/innovation/dwm/ref_snake/out/mus_musculus/transcriptome/kb/transcriptome.idx -o ./ -x 0,0,8,0,26,32:0,32,39:1,0,0 -t 28 --fr-stranded  trimmed_R1.fq.gz  trimmed_R2.fq.gz
[2024-06-13 10:17:48,231]   DEBUG [count] 
[2024-06-13 10:17:50,637]   DEBUG [count] [index] k-mer length: 31
[2024-06-13 10:17:53,042]   DEBUG [count] [index] number of targets: 149,547
[2024-06-13 10:17:53,042]   DEBUG [count] [index] number of k-mers: 123,235,203
[2024-06-13 10:17:53,042]   DEBUG [count] [index] number of D-list k-mers: 3,073,156
[2024-06-13 10:17:53,042]   DEBUG [count] [quant] will process sample 1:  trimmed_R1.fq.gz
[2024-06-13 10:17:53,042]   DEBUG [count]  trimmed_R2.fq.gz
[2024-06-13 10:17:53,143]   DEBUG [count] [quant] finding pseudoalignments for the reads ... done
[2024-06-13 10:17:53,143]   DEBUG [count] [quant] processed 949 reads, 727 reads pseudoaligned
[2024-06-13 10:17:53,143]   DEBUG [count] 
[2024-06-13 10:17:54,746]    INFO [count] Sorting BUS file ./output.bus to ./tmp/output.s.bus
[2024-06-13 10:17:54,747]   DEBUG [count] bustools sort -o ./tmp/output.s.bus -T ./tmp -t 28 -m m ./output.bus
[2024-06-13 10:17:55,850]   DEBUG [count] Warning: low number supplied for maximum memory usage without M or G suffix
[2024-06-13 10:17:55,850]   DEBUG [count] interpreting this as 0Gb
[2024-06-13 10:17:55,851]   DEBUG [count] Read in 0 BUS records
[2024-06-13 10:17:55,851]   DEBUG [count] reading time 0s
[2024-06-13 10:17:55,851]   DEBUG [count] sorting time 0s
[2024-06-13 10:17:55,851]   DEBUG [count] writing time 0s
[2024-06-13 10:17:55,852]    INFO [count] Inspecting BUS file ./tmp/output.s.bus
[2024-06-13 10:17:55,852]   DEBUG [count] bustools inspect -o ./inspect.json -w  whitelist.txt ./tmp/output.s.bus
[2024-06-13 10:17:56,955]    INFO [count] Correcting BUS records in ./tmp/output.s.bus to ./tmp/output.s.c.bus with on-list  whitelist.txt
[2024-06-13 10:17:56,956]   DEBUG [count] bustools correct -o ./tmp/output.s.c.bus -w  whitelist.txt ./tmp/output.s.bus
[2024-06-13 10:17:58,159]   DEBUG [count] Found 77349 barcodes in the on-list
[2024-06-13 10:17:58,159]   DEBUG [count] Error: barcode length and on-list length differ, barcodes = 0, on-list = 14
[2024-06-13 10:17:58,159]   DEBUG [count] check that your on-list matches the technology used
[2024-06-13 10:17:58,160]   ERROR [count] Found 77349 barcodes in the on-list
Error: barcode length and on-list length differ, barcodes = 0, on-list = 14
check that your on-list matches the technology used
[2024-06-13 10:17:58,160]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/main.py", line 703, in parse_count
    count(
  File "/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/count.py", line 1335, in count
    prev_result = bustools_correct(
  File "/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/count.py", line 411, in bustools_correct
    run_executable(command)
  File "/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/gpfs/commons/groups/innovation/dwm/mambaforge/envs/kb/lib/python3.10/site-packages/kb_python/bins/linux/bustools/bustools correct -o ./tmp/output.s.c.bus -w  whitelist.txt ./tmp/output.s.bus' returned non-zero exit status 1.
[2024-06-13 10:17:58,165]   DEBUG [main] Removing `./tmp` directory

Yenaled commented 3 months ago

Your -mm should be --mm.
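
The Namespace dump in the verbose output shows the effect of the typo: m='m' and mm=False. The single-dash -mm was read as the short option -m with the attached value "m", so the multimapping flag was never set and the memory limit became the string "m". The log then shows bustools sort -m m, the warning "interpreting this as 0Gb", and "Read in 0 BUS records", which is presumably why bustools correct later sees a barcode length of 0. Here is a minimal sketch of that parsing behaviour, assuming the CLI is built on Python's argparse. The parser below is hypothetical and reduced to the two options involved; it is not kb_python's actual parser.

```python
import argparse

# Hypothetical two-option parser mirroring the names seen in the Namespace
# dump: -m takes a value, --mm is a boolean flag.
parser = argparse.ArgumentParser(prog="kb-sketch")
parser.add_argument("-m", dest="m", default=None, help="memory limit")
parser.add_argument("--mm", action="store_true", help="multimapping flag")

# Single dash: argparse treats "-mm" as "-m" followed by the value "m".
wrong = parser.parse_args(["-mm"])

# Double dash: the boolean flag is set and -m keeps its default.
right = parser.parse_args(["--mm"])

print("with -mm :", "m =", repr(wrong.m), "| mm =", wrong.mm)
print("with --mm:", "m =", repr(right.m), "| mm =", right.mm)
```

Because "-mm" is a valid way to spell "-m m", argparse raises no error, and the mistake only surfaces later in the pipeline.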

mckellardw commented 3 months ago

I'm going to pretend I didn't spend half of a day trying to figure out this bug... Thanks!