pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

output.s.c.bus has no BUS records #189

Closed: royfrancis closed this issue 1 year ago.

royfrancis commented 1 year ago

One of my samples is failing. I'm giving it 1 node (20 cores and 128 GB RAM) and running:

```
kb count \
  -i ref/index.idx \
  -g ref/t2g.txt \
  -t 20 \
  -m 100G \
  -o 05dpf/ \
  -x SMARTSEQ3 \
  --gene-names \
  --h5ad \
  --overwrite \
  --kallisto kallisto \
  --bustools bustools \
  -w 5A_v1.5.txt \
  /K_9003_S3_I2_001.fastq.gz \
  /K_9003_S3_I2_001.fastq.gz \
  /K_9003_S3_R1_001.fastq.gz \
  /K_9003_S3_R2_001.fastq.gz
```
Full output

```
[2023-01-24 14:18:15,086] INFO [count_smartseq3] Using index ref/index.idx to generate BUS file to 05dpf/ from
[2023-01-24 14:18:15,086] INFO [count_smartseq3] K_9003_S3_I2_001.fastq.gz
[2023-01-24 14:18:15,087] INFO [count_smartseq3] K_9003_S3_I2_001.fastq.gz
[2023-01-24 14:18:15,087] INFO [count_smartseq3] K_9003_S3_R1_001.fastq.gz
[2023-01-24 14:18:15,087] INFO [count_smartseq3] K_9003_S3_R2_001.fastq.gz
[2023-01-24 14:38:48,742] INFO [count_smartseq3] Sorting BUS file 05dpf/output.bus to 05dpf/tmp/output.s.bus
[2023-01-24 14:40:33,619] INFO [count_smartseq3] Inspecting BUS file 05dpf/tmp/output.s.bus
[2023-01-24 14:40:34,941] INFO [count_smartseq3] Correcting BUS records in 05dpf/tmp/output.s.bus to 05dpf/tmp/output.s.c.bus with whitelist 5A_v1.5.txt
[2023-01-24 14:40:38,875] ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/crex/proj/nobackup/nbis/data/processed/kb/conda/kb/lib/python3.8/site-packages/kb_python/main.py", line 1305, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/crex/proj/nobackup/nbis/data/processed/kb/conda/kb/lib/python3.8/site-packages/kb_python/main.py", line 530, in parse_count
    count_smartseq3(
  File "/crex/proj/nobackup/nbis/data/processed/kb/conda/kb/lib/python3.8/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/crex/proj/nobackup/nbis/data/processed/kb/conda/kb/lib/python3.8/site-packages/kb_python/count.py", line 1370, in count_smartseq3
    prev_result = bustools_correct(
  File "/crex/proj/nobackup/nbis/data/processed/kb/conda/kb/lib/python3.8/site-packages/kb_python/validate.py", line 121, in inner
    validate(path)
  File "/crex/proj/nobackup/nbis/data/processed/kb/conda/kb/lib/python3.8/site-packages/kb_python/validate.py", line 88, in validate
    VALIDATORS[ext](path)
  File "/crex/proj/nobackup/nbis/data/processed/kb/conda/kb/lib/python3.8/site-packages/kb_python/validate.py", line 40, in validate_bus
    raise ValidateError(f'{path} has no BUS records')
kb_python.validate.ValidateError: 05dpf/tmp/output.s.c.bus has no BUS records
```
kb_python 0.27.3
kallisto, version 0.48.0
bustools, version 0.42.0

Yenaled commented 1 year ago

Hmm, this looks to me like it could be a whitelist issue. What do the first few lines of 5A_v1.5.txt look like? Are those barcodes present in your I1+I2 files?

Also, what's the file size of 05dpf/output.bus? If it's fairly large, then the mapping step succeeded, and it's one of the downstream steps (most likely the whitelist correction) that's failing.
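A minimal sketch of the check suggested here: estimate what fraction of reads carry an I1+I2 sequence that appears verbatim in the whitelist. It assumes, without confirmation from this thread, that the cell barcode is the full I1 read followed by the full I2 read, which fits the 20-nt entries in 5A_v1.5.txt. The function names are hypothetical and not part of kb_python.

```python
import gzip
import itertools
from typing import Iterator, Set


def read_whitelist(path: str) -> Set[str]:
    """Load barcodes, one per line, skipping blank lines."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def fastq_sequences(path: str) -> Iterator[str]:
    """Yield the sequence line (2nd line of every 4-line record) of a gzipped FASTQ."""
    with gzip.open(path, "rt") as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:
                yield line.strip()


def whitelist_hit_rate(whitelist_path: str, i1_path: str, i2_path: str,
                       n_reads: int = 100_000) -> float:
    """Fraction of the first n_reads whose I1+I2 sequence is an exact whitelist entry.

    Assumes the barcode is the full I1 read followed by the full I2 read.
    Exact matching is stricter than bustools correct, which also accepts
    barcodes within one mismatch of a whitelist entry.
    """
    whitelist = read_whitelist(whitelist_path)
    pairs = itertools.islice(zip(fastq_sequences(i1_path), fastq_sequences(i2_path)), n_reads)
    total = hits = 0
    for i1_seq, i2_seq in pairs:
        total += 1
        if i1_seq + i2_seq in whitelist:
            hits += 1
    return hits / total if total else 0.0
```

Call it as `whitelist_hit_rate("5A_v1.5.txt", i1_path, i2_path)` with the sample's index FASTQs. A rate near zero points at the whitelist or at the input files rather than at the index; index reads that come out reverse-complemented on some instruments would also produce a low rate.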

royfrancis commented 1 year ago

The barcode file looks normal, and according to the facility that sent it, it is the correct file for this sample.

```
> head 5A_v1.5.txt
GAGCGCCTATCCAACGAATA
TAAGACGGTGCCAACGAATA
CCTCAACTGGCCAACGAATA
TTGTCCTGTACCAACGAATA
AGCTCTGGTTCCAACGAATA
TTCGTTGTACCCAACGAATA
CTATAACCGTCCAACGAATA
TTGTCTTGACCCAACGAATA
TAGGAGTGTCCCAACGAATA
TCAGCAATTGCCAACGAATA
```

05dpf/output.bus is 5.1GB.
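kb_python raises the ValidateError in this thread when a BUS file contains zero records. As a rough cross-check of the file-size heuristic, the sketch below counts records in an uncompressed BUS file from its header and total size. The layout is our understanding of the format written by bustools 0.4x, not something stated in this thread: the magic bytes `BUS\0`, four little-endian uint32 header fields (version, barcode length, UMI length, text length), the header text, then 32-byte records. `bustools inspect` remains the authoritative check. Under that assumption, a 5.1 GB output.bus holds well over 100 million records (5.1 × 10^9 / 32 ≈ 1.6 × 10^8), so mapping clearly produced output, and the records were lost at the step that wrote output.s.c.bus.

```python
import os
import struct

BUS_MAGIC = b"BUS\x00"
# Assumed record layout: uint64 barcode, uint64 UMI, int32 ec, uint32 count,
# uint32 flags, uint32 pad (32 bytes per record, little-endian, no padding).
BUS_RECORD_SIZE = struct.calcsize("<QQiIII")


def count_bus_records(path: str) -> int:
    """Count records in an uncompressed BUS file from its header and size."""
    with open(path, "rb") as fh:
        if fh.read(4) != BUS_MAGIC:
            raise ValueError(f"{path} does not start with the BUS magic bytes")
        # Header fields after the magic: version, barcode length, UMI length, text length.
        _version, _bclen, _umilen, text_len = struct.unpack("<4I", fh.read(16))
        fh.seek(text_len, os.SEEK_CUR)  # skip the free-text part of the header
        header_end = fh.tell()
        fh.seek(0, os.SEEK_END)
        body_bytes = fh.tell() - header_end
    if body_bytes < 0 or body_bytes % BUS_RECORD_SIZE:
        raise ValueError(f"{path} has a truncated or malformed record section")
    return body_bytes // BUS_RECORD_SIZE
```

A result of 0 for output.s.c.bus next to a large count for output.bus would localize the failure to barcode correction, which is what the log shows.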

royfrancis commented 1 year ago

OK, it looks like a typo: I see now that I used the I2 file name twice instead of passing the I1 file first.
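A small pre-flight check could catch this kind of mistake before the roughly 20-minute mapping run. This is a hypothetical helper, not part of kb_python. It assumes one set of four inputs in the order I1, I2, R1, R2 (the order the command in this thread was aiming for) and Illumina-style `_I1_`/`_I2_`/`_R1_`/`_R2_` tags in the file names; both are assumptions.

```python
import os
from collections import Counter
from typing import List, Sequence

# Assumed order of inputs for `kb count -x SMARTSEQ3` (one set of files) and
# assumed Illumina-style read tags in the file names.
EXPECTED_TAGS = ("_I1_", "_I2_", "_R1_", "_R2_")


def check_smartseq3_fastqs(fastqs: Sequence[str]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems: List[str] = []
    if len(fastqs) != len(EXPECTED_TAGS):
        problems.append(f"expected {len(EXPECTED_TAGS)} FASTQ files, got {len(fastqs)}")

    # The same file passed twice is almost certainly a typo.
    for path, n in Counter(os.path.normpath(p) for p in fastqs).items():
        if n > 1:
            problems.append(f"{path} is given {n} times")

    # Each position should carry the matching read tag in its file name.
    for position, (tag, path) in enumerate(zip(EXPECTED_TAGS, fastqs), start=1):
        if tag not in os.path.basename(path):
            problems.append(f"input {position} should be the {tag.strip('_')} file, got {path}")
    return problems


if __name__ == "__main__":
    # The four inputs exactly as passed in this thread.
    as_run = [
        "/K_9003_S3_I2_001.fastq.gz",
        "/K_9003_S3_I2_001.fastq.gz",
        "/K_9003_S3_R1_001.fastq.gz",
        "/K_9003_S3_R2_001.fastq.gz",
    ]
    for problem in check_smartseq3_fastqs(as_run):
        print(problem)
```

On the inputs from this thread it reports both the duplicated I2 file and the missing I1 file in the first position; with a (hypothetical) `/K_9003_S3_I1_001.fastq.gz` in that slot, it returns an empty list.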