pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Runtime error with kb count #266

Open kaushik-roy-physics opened 3 weeks ago

kaushik-roy-physics commented 3 weeks ago

Hello,

I keep getting this runtime error when using kb count to generate unspliced and spliced count arrays. Note that I previously used kb ref to generate the necessary files from the genome.fa and genes.gtf in the Cell Ranger reference, downloaded with:

curl -O "https://cf.10xgenomics.com/supp/cell-exp/refdata-gex-GRCh38-2024-A.tar.gz"

All the files were generated correctly, but when I run kb count I keep getting this error. As far as I can tell, the failure happens while kallisto bus is generating the BUS file. Memory should not be an issue, since I am running this on a 128 GB workstation. I used the default thread count and memory settings.

I got the same error when I ran kb count with the pre-built indices downloaded from the kallisto repository. That's why I decided to run kb ref and then kb count myself, hoping that would solve the issue. I checked my FASTQ files: they are not corrupted and contain the correct inputs.
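The FASTQ check mentioned above can be scripted. Below is a minimal Python sketch, assuming standard 4-line FASTQ records; check_fastq is a hypothetical helper, not part of kb-python. It walks a gzipped (or plain) FASTQ, fails on malformed records, and tallies read lengths. For -x 10xv3, R1 is expected to carry the 16 bp cell barcode plus the 12 bp UMI, i.e. at least 28 bp, so the R1 length histogram is a quick sanity check.

```python
import gzip
from collections import Counter


def check_fastq(path, max_reads=None):
    """Hypothetical helper: validate 4-line FASTQ records and tally read lengths.

    Returns (number_of_reads, Counter mapping read length -> count).
    Raises ValueError on a malformed record; corrupt or truncated .gz data
    makes the gzip module raise its own exception (e.g. EOFError).
    """
    opener = gzip.open if path.endswith(".gz") else open
    lengths = Counter()
    n = 0
    with opener(path, "rt") as fh:
        while max_reads is None or n < max_reads:
            header = fh.readline()
            if not header:
                break  # clean end of file
            seq = fh.readline().rstrip("\n")
            plus = fh.readline()
            qual = fh.readline().rstrip("\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"{path}: malformed record {n + 1}")
            if len(seq) != len(qual):
                raise ValueError(f"{path}: sequence/quality length mismatch in record {n + 1}")
            lengths[len(seq)] += 1
            n += 1
    return n, lengths
```

For example, check_fastq("bamtofastq_S1_L002_R1_001.fastq.gz", max_reads=100000) returns the number of reads checked and a length histogram for the first 100,000 R1 reads.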

The exact command and the complete log are provided below. The /tmp directory did not contain any files.

I'm looking for any solutions or suggestions to fix this.

Regards, Kaushik


kb count --keep-tmp --h5ad --verbose -i ~/rdfmount/Kaushik/velocyto/kbref/index.idx -g ~/rdfmount/Kaushik/velocyto/kbref/t2g.txt -x 10xv3 -o ~/rdfmount/Kaushik/velocyto/kbcounts/1/ -c1 ~/rdfmount/Kaushik/velocyto/kbref/cdna.txt -c2 ~/rdfmount/Kaushik/velocyto/kbref/unspliced.txt --filter bustools --workflow lamanno ~/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz ~/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz

[2024-08-20 14:34:45,693] DEBUG [main] Printing verbose output
[2024-08-20 14:34:47,901] DEBUG [main] kallisto binary located at /home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto
[2024-08-20 14:34:47,902] DEBUG [main] bustools binary located at /home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools
[2024-08-20 14:34:47,907] DEBUG [main] Creating /home/kaushik/rdfmount/Kaushik/velocyto/kbcounts/1/tmp directory
[2024-08-20 14:34:47,912] DEBUG [main] Namespace(list=False, command='count', tmp=None, keep_tmp=True, verbose=True, i='/home/kaushik/rdfmount/Kaushik/velocyto/kbref/index.idx', g='/home/kaushik/rdfmount/Kaushik/velocyto/kbref/t2g.txt', x='10xv3', o='/home/kaushik/rdfmount/Kaushik/velocyto/kbcounts/1/', num=False, w=None, r=None, t=8, m='4G', strand=None, inleaved=False, genomebam=False, aa=False, gtf=None, chromosomes=None, workflow='lamanno', em=False, mm=False, tcc=False, filter='bustools', filter_threshold=None, c1='/home/kaushik/rdfmount/Kaushik/velocyto/kbref/cdna.txt', c2='/home/kaushik/rdfmount/Kaushik/velocyto/kbref/unspliced.txt', overwrite=False, dry_run=False, batch_barcodes=False, loom=False, h5ad=True, loom_names='barcode,target_name', sum='none', cellranger=False, gene_names=False, N=None, report=False, no_inspect=False, kallisto='/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto', bustools='/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools', no_validate=False, no_fragment=False, parity=None, fragment_l=None, fragment_s=None, bootstraps=None, matrix_to_files=False, matrix_to_directories=False, fastqs=['/home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz', '/home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz'])
[2024-08-20 14:34:50,707] INFO [count_lamanno] Using index /home/kaushik/rdfmount/Kaushik/velocyto/kbref/index.idx to generate BUS file to /home/kaushik/rdfmount/Kaushik/velocyto/kbcounts/1/ from
[2024-08-20 14:34:50,708] INFO [count_lamanno] /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz
[2024-08-20 14:34:50,708] INFO [count_lamanno] /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz
[2024-08-20 14:34:50,708] DEBUG [count_lamanno] kallisto bus -i /home/kaushik/rdfmount/Kaushik/velocyto/kbref/index.idx -o /home/kaushik/rdfmount/Kaushik/velocyto/kbcounts/1/ -x 10xv3 -t 8 /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz
[2024-08-20 14:34:50,810] DEBUG [count_lamanno]
[2024-08-20 14:34:50,810] DEBUG [count_lamanno] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[2024-08-20 14:36:23,652] DEBUG [count_lamanno] [index] k-mer length: 31
[2024-08-20 14:40:34,005] DEBUG [count_lamanno] I failed to find one of the right cookies. Found 0
[2024-08-20 14:40:34,007] DEBUG [count_lamanno] terminate called after throwing an instance of 'std::runtime_error'
[2024-08-20 14:40:34,007] DEBUG [count_lamanno] what(): failed alloc while reading
[2024-08-20 14:40:35,809] ERROR [count_lamanno] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[index] k-mer length: 31
I failed to find one of the right cookies. Found 0
terminate called after throwing an instance of 'std::runtime_error'
what(): failed alloc while reading
[2024-08-20 14:40:35,811] ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/main.py", line 665, in parse_count
    count_velocity(
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/count.py", line 2405, in count_velocity
    bus_result = kallisto_bus(
                 ^^^^^^^^^^^^^
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/count.py", line 203, in kallisto_bus
    run_executable(command)
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i /home/kaushik/rdfmount/Kaushik/velocyto/kbref/index.idx -o /home/kaushik/rdfmount/Kaushik/velocyto/kbcounts/1/ -x 10xv3 -t 8 /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz' died with <Signals.SIGABRT: 6>

kaushik-roy-physics commented 3 weeks ago

When I used the prebuilt index downloaded from https://github.com/pachterlab/kallisto-transcriptome-indices/releases, I got the output below instead. A solution for either issue would work for me. I am also attaching the files in the tmp directory (tmp.zip).

kb count --h5ad --keep-tmp --verbose -i ~/rdfmount/Kaushik/velocyto/fastq_outs/index.idx -g ~/rdfmount/Kaushik/velocyto/fastq_outs/t2g.txt -x 10xv3 -o ~/rdfmount/Kaushik/velocyto/kbcount/1/ -c1 ~/rdfmount/Kaushik/velocyto/fastq_outs/cdna.txt -c2 ~/rdfmount/Kaushik/velocyto/fastq_outs/nascent.txt --workflow lamanno ~/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz ~/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz

The error was:

raise ValidateError(f'{path} has no BUS records')
kb_python.validate.ValidateError: /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/spliced.bus has no BUS records


[2024-08-20 15:25:02,599] DEBUG [main] Printing verbose output
[2024-08-20 15:25:04,807] DEBUG [main] kallisto binary located at /home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto
[2024-08-20 15:25:04,808] DEBUG [main] bustools binary located at /home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools
[2024-08-20 15:25:04,811] DEBUG [main] Creating /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp directory
[2024-08-20 15:25:04,837] DEBUG [main] Namespace(list=False, command='count', tmp=None, keep_tmp=True, verbose=True, i='/home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/index.idx', g='/home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/t2g.txt', x='10xv3', o='/home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/', num=False, w=None, r=None, t=8, m='4G', strand=None, inleaved=False, genomebam=False, aa=False, gtf=None, chromosomes=None, workflow='lamanno', em=False, mm=False, tcc=False, filter=None, filter_threshold=None, c1='/home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/cdna.txt', c2='/home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/nascent.txt', overwrite=False, dry_run=False, batch_barcodes=False, loom=False, h5ad=True, loom_names='barcode,target_name', sum='none', cellranger=False, gene_names=False, N=None, report=False, no_inspect=False, kallisto='/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto', bustools='/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools', no_validate=False, no_fragment=False, parity=None, fragment_l=None, fragment_s=None, bootstraps=None, matrix_to_files=False, matrix_to_directories=False, fastqs=['/home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz', '/home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz'])
[2024-08-20 15:25:07,782] INFO [count_lamanno] Using index /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/index.idx to generate BUS file to /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/ from
[2024-08-20 15:25:07,783] INFO [count_lamanno] /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz
[2024-08-20 15:25:07,783] INFO [count_lamanno] /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz
[2024-08-20 15:25:07,783] DEBUG [count_lamanno] kallisto bus -i /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/index.idx -o /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/ -x 10xv3 -t 8 /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz
[2024-08-20 15:25:07,884] DEBUG [count_lamanno]
[2024-08-20 15:25:07,885] DEBUG [count_lamanno] [bus] Note: Strand option was not specified; setting it to --fr-stranded for specified technology
[2024-08-20 15:26:37,312] DEBUG [count_lamanno] [index] k-mer length: 31
[2024-08-20 15:28:01,432] DEBUG [count_lamanno] [index] number of targets: 267,211
[2024-08-20 15:28:01,632] DEBUG [count_lamanno] [index] number of k-mers: 1,563,276,810
[2024-08-20 15:28:01,633] DEBUG [count_lamanno] [index] number of D-list k-mers: 10,106,781
[2024-08-20 15:28:01,633] DEBUG [count_lamanno] [quant] will process sample 1: /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R1_001.fastq.gz
[2024-08-20 15:28:01,633] DEBUG [count_lamanno] /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/1/bamtofastq_S1_L002_R2_001.fastq.gz
[2024-08-20 15:28:07,141] DEBUG [count_lamanno] [quant] finding pseudoalignments for the reads ...
[2024-08-20 15:28:11,948] DEBUG [count_lamanno] [progress] 1M reads processed (0.2% mapped)
[2024-08-20 15:28:16,556] DEBUG [count_lamanno] [progress] 2M reads processed (0.2% mapped)
[2024-08-20 15:28:21,163] DEBUG [count_lamanno] [progress] 3M reads processed (0.2% mapped)
[2024-08-20 15:28:25,971] DEBUG [count_lamanno] [progress] 4M reads processed (0.2% mapped)
[2024-08-20 15:28:28,976] DEBUG [count_lamanno] [progress] 5M reads processed (0.2% mapped)
[2024-08-20 15:28:33,584] DEBUG [count_lamanno] [progress] 6M reads processed (0.2% mapped)
[2024-08-20 15:28:38,293] DEBUG [count_lamanno] [progress] 7M reads processed (0.2% mapped)
[2024-08-20 15:28:42,900] DEBUG [count_lamanno] [progress] 8M reads processed (0.2% mapped)
[2024-08-20 15:28:47,107] DEBUG [count_lamanno] [progress] 9M reads processed (0.2% mapped)
[2024-08-20 15:28:47,408] DEBUG [count_lamanno] [progress] 10M reads processed (0.2% mapped) done
[2024-08-20 15:28:47,408] DEBUG [count_lamanno] [quant] processed 10,661,723 reads, 23,080 reads pseudoaligned
[2024-08-20 15:28:47,909] DEBUG [count_lamanno]
[2024-08-20 15:29:08,438] INFO [count_lamanno] Sorting BUS file /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/output.bus to /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.bus
[2024-08-20 15:29:08,438] DEBUG [count_lamanno] bustools sort -o /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.bus -T /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp -t 8 -m 4G /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/output.bus
[2024-08-20 15:29:11,344] DEBUG [count_lamanno] all fits in buffer
[2024-08-20 15:29:11,645] DEBUG [count_lamanno] Read in 23080 BUS records
[2024-08-20 15:29:11,645] DEBUG [count_lamanno] reading time 0.0009s
[2024-08-20 15:29:11,645] DEBUG [count_lamanno] sorting time 0.00229s
[2024-08-20 15:29:11,645] DEBUG [count_lamanno] writing time 0.000906s
[2024-08-20 15:29:12,746] INFO [count_lamanno] On-list not provided
[2024-08-20 15:29:12,746] INFO [count_lamanno] Copying pre-packaged 10XV3 on-list to /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/
[2024-08-20 15:29:14,885] INFO [count_lamanno] Inspecting BUS file /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.bus
[2024-08-20 15:29:14,887] DEBUG [count_lamanno] bustools inspect -o /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/inspect.json -w /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/10x_version3_whitelist.txt /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.bus
[2024-08-20 15:29:20,303] INFO [count_lamanno] Correcting BUS records in /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.bus to /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.c.bus with on-list /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/10x_version3_whitelist.txt
[2024-08-20 15:29:20,303] DEBUG [count_lamanno] bustools correct -o /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.c.bus -w /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/10x_version3_whitelist.txt /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.bus
[2024-08-20 15:29:27,416] DEBUG [count_lamanno] Found 6794880 barcodes in the on-list
[2024-08-20 15:29:30,221] DEBUG [count_lamanno] Processed 20997 BUS records
[2024-08-20 15:29:30,222] DEBUG [count_lamanno] In on-list = 20719
[2024-08-20 15:29:30,222] DEBUG [count_lamanno] Corrected = 65
[2024-08-20 15:29:30,222] DEBUG [count_lamanno] Uncorrected = 213
[2024-08-20 15:29:31,824] INFO [count_lamanno] Sorting BUS file /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.c.bus to /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/output.unfiltered.bus
[2024-08-20 15:29:31,824] DEBUG [count_lamanno] bustools sort -o /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/output.unfiltered.bus -T /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp -t 8 -m 4G /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/output.s.c.bus
[2024-08-20 15:29:34,531] DEBUG [count_lamanno] all fits in buffer
[2024-08-20 15:29:35,832] DEBUG [count_lamanno] Read in 20784 BUS records
[2024-08-20 15:29:35,832] DEBUG [count_lamanno] reading time 0.000465s
[2024-08-20 15:29:35,832] DEBUG [count_lamanno] sorting time 0.001165s
[2024-08-20 15:29:35,832] DEBUG [count_lamanno] writing time 0.000965s
[2024-08-20 15:29:35,841] INFO [count_lamanno] Capturing records from BUS file /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/output.unfiltered.bus to /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/spliced.bus with capture list /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/nascent.txt
[2024-08-20 15:29:35,842] DEBUG [count_lamanno] bustools capture -o /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/spliced.bus -c /home/kaushik/rdfmount/Kaushik/velocyto/fastq_outs/nascent.txt -e /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/matrix.ec -t /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/transcripts.txt --complement --transcripts /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/output.unfiltered.bus
[2024-08-20 15:29:36,044] DEBUG [count_lamanno] Parsing transcripts .. done
[2024-08-20 15:29:36,044] DEBUG [count_lamanno] Parsing ECs .. done
[2024-08-20 15:29:36,144] DEBUG [count_lamanno] Parsing capture list .. done
[2024-08-20 15:29:37,245] DEBUG [count_lamanno] Read in 20781 BUS records, wrote 0 BUS records
[2024-08-20 15:29:38,354] ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/main.py", line 665, in parse_count
    count_velocity(
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/count.py", line 2480, in count_velocity
    capture_result = bustools_capture(
                     ^^^^^^^^^^^^^^^^^
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/validate.py", line 121, in inner
    validate(path)
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/validate.py", line 88, in validate
    VALIDATORS[ext](path)
  File "/home/kaushik/anaconda3/lib/python3.11/site-packages/kb_python/validate.py", line 40, in validate_bus
    raise ValidateError(f'{path} has no BUS records')
kb_python.validate.ValidateError: /home/kaushik/rdfmount/Kaushik/velocyto/kbcount/1/tmp/spliced.bus has no BUS records
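kb-python stopped because its validator found no records in tmp/spliced.bus, which bustools capture had just written ("wrote 0 BUS records"). To inspect a BUS file such as the ones in the attached tmp directory, here is a minimal Python sketch. It assumes the uncompressed BUS layout used by bustools on a little-endian machine such as x86-64: a 4-byte "BUS\0" magic, four uint32 header fields (version, barcode length, UMI length, length of a free-text block), the free text, then fixed 32-byte records (uint64 barcode, uint64 UMI, int32 equivalence class, uint32 count, uint32 flags, uint32 padding). count_bus_records is a hypothetical helper; bustools inspect, which kb count already runs in the log, is the supported way to get these statistics.

```python
import os
import struct

BUS_MAGIC = b"BUS\x00"
# Assumed record layout: uint64 barcode, uint64 UMI, int32 ec, uint32 count,
# uint32 flags, uint32 pad -> 32 bytes per record.
RECORD_SIZE = struct.calcsize("<QQiIII")


def count_bus_records(path):
    """Hypothetical helper: return (version, barcode_len, umi_len, n_records).

    Only plain, uncompressed BUS files are handled; a truncated header makes
    struct.unpack raise struct.error.
    """
    with open(path, "rb") as fh:
        if fh.read(4) != BUS_MAGIC:
            raise ValueError(f"{path}: not a BUS file (bad magic)")
        version, bclen, umilen, tlen = struct.unpack("<4I", fh.read(16))
        fh.seek(tlen, os.SEEK_CUR)  # skip the free-text header block
        header_size = fh.tell()
    body = os.path.getsize(path) - header_size
    if body < 0 or body % RECORD_SIZE:
        raise ValueError(f"{path}: body is not a whole number of records")
    return version, bclen, umilen, body // RECORD_SIZE
```

Run against tmp/spliced.bus from this thread, the last field should come out as 0, matching the capture step's output.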

Yenaled commented 3 weeks ago

Can you check which version of kb-python you are running? E.g., run kb --version

kaushik-roy-physics commented 3 weeks ago

Here is the version:

kb_python 0.28.2

Can you please tell me what might be the source of the problem? Is this a memory issue?
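For the second run, the log itself shows where the records disappear: kallisto bus completed (10,661,723 reads processed, 23,080 pseudoaligned, about 0.2%), and the failure came at the bustools capture step, which read 20,781 records and wrote 0. Below is a minimal Python sketch that pulls those numbers out of a saved --verbose log. The regular expressions assume the exact message wording shown in this thread and may need adjusting for other kallisto/bustools versions; summarize_kb_log is a hypothetical helper.

```python
import re

# Patterns copied from the message wording in the logs above (assumption:
# other kallisto/bustools versions may phrase these lines differently).
QUANT_RE = re.compile(r"\[quant\] processed ([\d,]+) reads, ([\d,]+) reads pseudoaligned")
CAPTURE_RE = re.compile(r"Read in ([\d,]+) BUS records, wrote ([\d,]+) BUS records")


def _to_int(text):
    """Convert '10,661,723' style numbers to int."""
    return int(text.replace(",", ""))


def summarize_kb_log(log_text):
    """Hypothetical helper: extract pseudoalignment and capture counts from a kb log."""
    summary = {}
    quant = QUANT_RE.search(log_text)
    if quant:
        processed, aligned = map(_to_int, quant.groups())
        summary["reads_processed"] = processed
        summary["reads_pseudoaligned"] = aligned
        summary["mapping_rate"] = aligned / processed if processed else 0.0
    capture = CAPTURE_RE.search(log_text)
    if capture:
        summary["capture_in"], summary["capture_out"] = map(_to_int, capture.groups())
    return summary
```

On the two log lines quoted above this yields a mapping rate of about 0.22% and capture_out == 0, i.e. the empty spliced.bus comes from the capture step rather than from kallisto running out of memory in that run.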

Yenaled commented 3 weeks ago

The lamanno workflow isn't maintained in the newest version of kb-python. Either downgrade kb-python or follow the latest kb-python documentation here: https://www.biorxiv.org/content/10.1101/2023.11.21.568164v2.full

kaushik-roy-physics commented 3 weeks ago

Thank you. It worked with --workflow=nac. You can mark this issue as closed.
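The reported fix was rerunning with --workflow=nac. Below is a minimal Python sketch that assembles that command. It assumes (this is not confirmed in the thread) that the same prebuilt index.idx, t2g.txt, cdna.txt and nascent.txt from the second attempt were reused with only the workflow changed. build_kb_count_nac is a hypothetical helper, and it only uses flags that appear in the commands above.

```python
import shlex


def build_kb_count_nac(index, t2g, cdna, nascent, out_dir, r1, r2, tech="10xv3"):
    """Hypothetical helper: assemble a kb count argument list for the nac workflow.

    Mirrors the second command in this thread with --workflow lamanno replaced
    by --workflow nac (assumption: the same index and capture lists are reused).
    """
    return [
        "kb", "count",
        "--h5ad", "--keep-tmp", "--verbose",
        "-i", index,
        "-g", t2g,
        "-x", tech,
        "-o", out_dir,
        "-c1", cdna,
        "-c2", nascent,
        "--workflow", "nac",
        r1, r2,
    ]


if __name__ == "__main__":
    cmd = build_kb_count_nac(
        "index.idx", "t2g.txt", "cdna.txt", "nascent.txt", "kbcount/1/",
        "bamtofastq_S1_L002_R1_001.fastq.gz",
        "bamtofastq_S1_L002_R2_001.fastq.gz",
    )
    # Print a copy-pasteable shell command; to execute it instead, pass cmd
    # to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building an argument list rather than a single string avoids shell-quoting problems with paths when the command is passed to subprocess.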