pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Crosspost from kallistobustools: kb count error with CSP reads #242

Closed kaijli closed 5 months ago

kaijli commented 7 months ago

Cross-posting from the kallistobustools repo because I realized this repo has more activity:

I've run into an issue running CSP data (TotalSeq C, adjacent to the Chromium 10X protocol for GEX). The data are FASTQ files with 26 bp reads (a couple of files also contain some 27 bp reads). The CSP libraries were sequenced in a 26x10x10x[25-27] bp configuration. I'd like to run kb count on them and use the resulting H5AD files together with the GEX H5AD outputs.
My installation of kb_python runs fine on GEX data but always produces the same error on CSP data. I've tried changing the files and the parameters, reinstalling, etc.

Snippet from standard error

[2024-02-22 16:25:38,978]   DEBUG [main] Printing verbose output
[2024-02-22 16:25:41,189]   DEBUG [main] kallisto binary located at /home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto
[2024-02-22 16:25:41,195]   DEBUG [main] bustools binary located at /home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools
[2024-02-22 16:25:41,196]   DEBUG [main] Creating `/home/kb_out2/2023_03_15-samples_CSP/1S1_S1/tmp` directory
[2024-02-22 16:25:41,201]   DEBUG [main] Namespace(list=False, command='count', tmp=None, keep_tmp=False, verbose=True, i='/home/ref/kb_custom_index', g='/home/ref/kb_custom_t2g', x='10XV2', o='/home/kb_out2/2023_03_15-samples_CSP/1S1_S1', num=False, w=None, r=None, t=8, m='48G', strand='forward', inleaved=False, genomebam=False, aa=False, gtf=None, chromosomes=None, workflow='standard', em=False, mm=True, tcc=False, filter='bustools', filter_threshold=None, c1=None, c2=None, overwrite=False, dry_run=False, batch_barcodes=False, loom=False, h5ad=True, loom_names='barcode,target_name', sum='none', cellranger=False, gene_names=False, N=None, report=False, no_inspect=False, kallisto='/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto', bustools='/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools', no_validate=False, no_fragment=False, parity=None, fragment_l=None, fragment_s=None, bootstraps=None, matrix_to_files=False, matrix_to_directories=False, fastqs=['1S1_S1_L001_QC_001.1.trimmed.fastq', '1S1_S1_L001_QC_001.2.trimmed.fastq'])
[2024-02-22 16:25:45,302]    INFO [count] Using index /home/ref/kb_custom_index to generate BUS file to /home/kb_out2/2023_03_15-samples_CSP/1S1_S1 from
[2024-02-22 16:25:45,305]    INFO [count]         1S1_S1_L001_QC_001.1.trimmed.fastq
[2024-02-22 16:25:45,307]    INFO [count]         1S1_S1_L001_QC_001.2.trimmed.fastq
[2024-02-22 16:25:45,307]   DEBUG [count] kallisto bus -i /home/ref/kb_custom_index -o /home/kb_out2/2023_03_15-samples_CSP/1S1_S1 -x 10XV2 -t 8 --fr-stranded 1S1_S1_L001_QC_001.1.trimmed.fastq 1S1_S1_L001_QC_001.2.trimmed.fastq
[2024-02-22 16:25:45,410]   DEBUG [count] 
[2024-02-22 16:25:46,611]   ERROR [count] 
[2024-02-22 16:25:46,611]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/main.py", line 703, in parse_count
    count(
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/count.py", line 1277, in count
    bus_result = kallisto_bus(
                 ^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/count.py", line 203, in kallisto_bus
    run_executable(command)
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto bus -i /home/ref/kb_custom_index -o /home/kb_out2/2023_03_15-samples_CSP/1S1_S1 -x 10XV2 -t 8 --fr-stranded 1S1_S1_L001_QC_001.1.trimmed.fastq 1S1_S1_L001_QC_001.2.trimmed.fastq' died with <Signals.SIGILL: 4>.
[2024-02-22 16:25:46,615]   DEBUG [main] Removing `/home/kb_out2/2023_03_15-samples_CSP/1S1_S1/tmp` directory
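
The `died with <Signals.SIGILL: 4>` at the end of the traceback means the kallisto process was killed by signal 4 (illegal instruction) rather than exiting with an error code of its own. Python's `subprocess` module reports that as a negative return code. A minimal sketch of decoding such a code; `describe_returncode` is a hypothetical helper for illustration, not part of kb_python:

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"unknown signal {-returncode}"
        return f"killed by signal {name}"
    return f"exited with status {returncode}"


print(describe_returncode(-4))  # signal 4 is SIGILL
print(describe_returncode(1))
```

An illegal-instruction signal often points to a binary built for CPU features that the machine running it lacks, though that is only one possible cause and is not confirmed by this log alone.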

Snippet from srun file

#SBATCH --mem=64G
#SBATCH --partition=tb
#SBATCH --cpus-per-task=32

kb count \
-o /home/kb_out2/2023_03_15-samples_CSP/1S1_S1 \
-i /home/ref/kb_custom_index \
-g /home/ref/kb_custom_t2g \
-x 10XV2 \
-m 48G \
-t 8 \
--workflow standard \
--mm \
--h5ad \
--strand forward \
--filter bustools \
--verbose \
1S1_S1_L001_QC_001.1.trimmed.fastq 1S1_S1_L001_QC_001.2.trimmed.fastq

version information:

kb_python 0.28.0
kallisto: 0.50.1 (/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/kallisto/kallisto)
bustools: 0.43.1 (/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools)

Yenaled commented 7 months ago

So the exact same command (and index) works with GEX? Can you verify that using your current installation of kb-python?

If so, then it must be something about the FASTQ file of the CSP data. Look at the first 10 reads or so of each file (and paste them here if you can).
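
Checking the first few records of each FASTQ file can be scripted. The sketch below reads the first records of a plain or gzipped FASTQ file and tallies read lengths. `head_fastq` and `length_summary` are hypothetical helper names, and the file name in the usage comment is taken from the log above.

```python
import gzip
from collections import Counter
from itertools import islice


def head_fastq(path, n_records=10):
    """Return the first n_records (header, sequence, quality) tuples."""
    opener = gzip.open if str(path).endswith(".gz") else open
    records = []
    with opener(path, "rt") as handle:
        while len(records) < n_records:
            block = list(islice(handle, 4))  # one FASTQ record = 4 lines
            if len(block) < 4:
                break  # end of file or truncated record
            header, seq, _plus, qual = (line.rstrip("\n") for line in block)
            records.append((header, seq, qual))
    return records


def length_summary(records):
    """Count how many of the given reads have each sequence length."""
    return Counter(len(seq) for _header, seq, _qual in records)


# Example (assumes the file exists in the working directory):
# for header, seq, qual in head_fastq("1S1_S1_L001_QC_001.1.trimmed.fastq"):
#     print(header, len(seq))
```

Read length is worth checking because a read shorter than the index's k-mer length contains no full k-mer to match. For `-x 10XV2` this concerns the second file (the biological read), not the barcode/UMI read in the first file.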

kaijli commented 7 months ago

You're right, I get the same error when running GEX. kb ref still seems to work, but kb count errors even with GEX data. It's strange because the error first occurred in the middle of multiple GEX and CSP runs, and only the last 3 CSP runs errored, so I assumed it was due to the data type.

These are the first 10 lines of my GEX reads:

@VH00347:121:AAAY7HYHV:1:1101:19216:1000 1:N:0:CGTGACATGC+TTTAGACCAT
TAGTGGTTCCTTGCCACTACTGAGAA
+
C-CCCCCCCC;-CCC-C;;-CCCCCC
@VH00347:121:AAAY7HYHV:1:1101:19595:1000 1:N:0:CGTGACATGC+TTTAGACCAT
TCTGAGAAGCGATCCCAGCCTCGAGA
+
CCCCCCCCCCCCCCCCCC-C-CCCCC
@VH00347:121:AAAY7HYHV:1:1101:20201:1000 1:N:0:CGTGACATGC+TTTAGACCAT
AACCATGGTGAAATCAAGTGAGTGGG

and the subsequent error:

Traceback (most recent call last):
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/main.py", line 703, in parse_count
    count(
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/count.py", line 1277, in count
    bus_result = kallisto_bus(
                 ^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/count.py", line 203, in kallisto_bus
    run_executable(command)
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))

Here are the first 10 lines of my CSP reads as well:

@VH00347:124:AAAY5CHHV:1:1101:18951:1000 1:N:0:GTCGTTGCCT+AAGAAGTTCT
GTACGTAGTAGAAAGGAATTCCTATT
+
CC-C-CCCCCCC;CCCCC;CCCCCCC
@VH00347:124:AAAY5CHHV:1:1101:20087:1000 1:N:0:GTCGTTGCCT+AAGAAGTTCT
TAGTGGTAGATGAGAGTTATAGTCAA
+
C-CC-CCCC-C-CC-CCCCCCCCCCC
@VH00347:124:AAAY5CHHV:1:1101:21412:1000 1:N:0:GTCGTTGCCT+AAGAAGTTCT
ACCGTAAGTCAAAGATACCAAAACTC

Thanks!

Yenaled commented 7 months ago

One suggestion: Install kallisto from source and run your kb count via:

kb count --kallisto=/path/to/kallisto

Yenaled commented 7 months ago

To install from source:

git clone --branch v0.50.1 https://github.com/pachterlab/kallisto
cd kallisto
mkdir build
cd build
cmake ..
make
cd ../../

And then your binary will be at: kallisto/build/src/kallisto

kaijli commented 7 months ago

I've installed kallisto from source. Do I need to do the same for bustools? I am able to run through kb ref without issue, but then kb count errors, saying that the indices are incompatible:

Error: incompatible indices. Found version 10, expected version 13

These two commands are run together, so I'm still very confused.

kb ref \
    --kallisto=/home/kli/kallisto/build/src/kallisto \
    -i ${indexFile} \
    -g ${t2gFile} \
    -d human \
    --overwrite \
    --workflow standard \
    --verbose \
    ${assemblyFile} \
    ${gtfFile}

kb count \
    --kallisto=/home/kli/kallisto/build/src/kallisto \
    -o ${outDir} \
    -i ${indexFile} \
    -g ${t2gFile} \
    -x 10XV2 \
    -m 48G \
    -t 8 \
    --workflow standard \
    --mm \
    --h5ad \
    --strand forward \
    --filter bustools \
    --verbose \
    ${fastqs}

I've verified that the kallisto built from source is the version you specified above. Should I be trying to build the reference just using kallisto index instead of kb ref?

kaijli commented 7 months ago

I somehow made the index error go away, but I still end up with the same kallisto error:

[2024-02-28 12:30:44,720]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/main.py", line 703, in parse_count
    count(
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/count.py", line 1277, in count
    bus_result = kallisto_bus(
                 ^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/count.py", line 203, in kallisto_bus
    run_executable(command)
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))

It's correctly calling the kallisto version. DEBUG [main] kallisto binary located at /home/kli/kallisto/build/src/kallisto

Yenaled commented 7 months ago

Can you run kb count with --verbose and paste the full error message? Otherwise, I can't see where kallisto is failing.

kaijli commented 7 months ago

Yes, this is an output from --verbose with a couple of paths shortened

[2024-02-28 12:30:46,920]   DEBUG [main] Printing verbose output
[2024-02-28 12:30:49,131]   DEBUG [main] kallisto binary located at /home/kli/kallisto/build/src/kallisto
[2024-02-28 12:30:49,131]   DEBUG [main] bustools binary located at /home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools
[2024-02-28 12:30:49,131]   DEBUG [main] Creating `kb_out2/GEX_test/D3_S1/tmp` directory
[2024-02-28 12:30:49,140]   DEBUG [main] Namespace(list=False, command='count', tmp=None, keep_tmp=False, verbose=True, i='ref/kb_custom_index', g='ref/kb_custom_t2g', x='10XV2', o='kb_out2/GEX_test/D3_S1', num=False, w=None, r=None, t=8, m='48G', strand='forward', inleaved=False, genomebam=False, aa=False, gtf=None, chromosomes=None, workflow='standard', em=False, mm=True, tcc=False, filter='bustools', filter_threshold=None, c1=None, c2=None, overwrite=False, dry_run=False, batch_barcodes=False, loom=False, h5ad=True, loom_names='barcode,target_name', sum='none', cellranger=False, gene_names=False, N=None, report=False, no_inspect=False, kallisto='/home/kli/kallisto/build/src/kallisto', bustools='/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/bins/linux/bustools/bustools', no_validate=False, no_fragment=False, parity=None, fragment_l=None, fragment_s=None, bootstraps=None, matrix_to_files=False, matrix_to_directories=False, fastqs=['D3_S1_L001_QC_001.1.trimmed.fastq', 'D3_S1_L001_QC_001.2.trimmed.fastq'])
[2024-02-28 12:30:52,614]    INFO [count] Using index ref/kb_custom_index to generate BUS file to kb_out2/GEX_test/D3_S1 from
[2024-02-28 12:30:52,615]    INFO [count]         D3_S1_L001_QC_001.1.trimmed.fastq
[2024-02-28 12:30:52,615]    INFO [count]         D3_S1_L001_QC_001.2.trimmed.fastq
[2024-02-28 12:30:52,615]   DEBUG [count] kallisto bus -i ref/kb_custom_index -o kb_out2/GEX_test/D3_S1 -x 10XV2 -t 8 --fr-stranded D3_S1_L001_QC_001.1.trimmed.fastq D3_S1_L001_QC_001.2.trimmed.fastq
[2024-02-28 12:30:52,719]   DEBUG [count] 
[2024-02-28 12:30:54,121]   ERROR [count] 
[2024-02-28 12:30:54,121]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/main.py", line 1618, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/main.py", line 703, in parse_count
    count(
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/count.py", line 1277, in count
    bus_result = kallisto_bus(
                 ^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/count.py", line 203, in kallisto_bus
    run_executable(command)
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kallisto/lib/python3.11/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))

Yenaled commented 7 months ago

Very strange. Are there no messages after the lines you posted?

Otherwise, try running that “kallisto bus” command shown in your output, using your kallisto binary path.

kaijli commented 7 months ago

The only thing after the error message I posted above is the output from /usr/bin/time -v

Command being timed: "kb count --kallisto=/home/kli/kallisto/build/src/kallisto -o kb_out2/GEX_test/D10_S10 -i ref/kb_custom_index -g ref/kb_custom_t2g -x 10XV2 -m 48G -t 8 --workflow standard --mm --h5ad --strand forward --filter bustools --verbose D10_S10_L001_QC_001.1.trimmed.fastq D10_S10_L001_QC_001.2.trimmed.fastq"
    User time (seconds): 2.84
    System time (seconds): 1.44
    Percent of CPU this job got: 45%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:09.50
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 170424
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 6
    Minor (reclaiming a frame) page faults: 96819
    Voluntary context switches: 20723
    Involuntary context switches: 18
    Swaps: 0
    File system inputs: 2512
    File system outputs: 40
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

I ran the kallisto bus command in an active terminal session.

/home/kli/kallisto/build/src/kallisto bus -i ref/kb_custom_index -o kb_out2/GEX_test/D3_S1 -x 10XV2 -t 8 --fr-stranded D3_S1_L001_QC_001.1.trimmed.fastq D3_S1_L001_QC_001.2.trimmed.fastq

[index] k-mer length: 31
[index] number of targets: 227,665
[index] number of k-mers: 139,900,295
[index] number of D-list k-mers: 5,477,475
[quant] will process sample 1: D3_S1_L001_QC_001.1.trimmed.fastq
                               D3_S1_L001_QC_001.2.trimmed.fastq
[progress] 249M reads processed (5.2% mapped)              done
[quant] processed 250,248,504 reads, 12,911,502 reads pseudoaligned

It ran to completion and output the files matrix.ec, output.bus, run_info.json, and transcripts.txt. Does this mean there's something wrong with my version of kb_python?

Yenaled commented 7 months ago

Try pip installing the latest version of kb_python (which is 0.28.2).

Only other thing I can think of is a buggy python installation.

To be honest though, I've never seen something work in a terminal session but not under kb-python (literally all kb-python does is execute that exact same command).

Yenaled commented 7 months ago

I'd also just say try it on a different computer/server and see if the result reproduces there (I suspect it won't). It only takes a couple of gigabytes of RAM so you can just process it on your own personal computer.

kaijli commented 7 months ago

Can I get instructions on how to install from source? I don't have access to pip on my HPC and I don't entirely trust conda. I tried installing into an environment I've been using, but I think something in it broke while I was also trying to run the kb-python tutorials in Jupyter Notebook (not all the versions line up for scanpy and the packages it calls). I'll try a fresh conda environment, but I also want to try having kallisto, bustools, and kb-python installed from source. I have a large dataset, so it'd be great if I could get it working on the system I'm already troubleshooting with.

Yenaled commented 7 months ago

See above for installing kallisto from source. For kb_python, you have to stick with pip install because that's how it's distributed.

There are also ways to use pip to install stuff locally (it has been a while since I've done it, but it's doable -- you just have to set certain paths/variables and do pip install kb_python --user).

Otherwise, you can run the kallisto bus command to generate the files and run kb count afterwards (kb count will auto-detect the BUS file and proceed from there without rerunning kallisto bus).
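
Before handing a manually generated output directory to `kb count`, it may help to confirm that `kallisto bus` wrote the files mentioned earlier in the thread (`matrix.ec`, `output.bus`, `run_info.json`, `transcripts.txt`). A minimal sketch; `missing_bus_outputs` is a hypothetical helper, and the file list comes from the comment above rather than from kb_python's own checks:

```python
from pathlib import Path

# Files reported in this thread after a successful `kallisto bus` run.
EXPECTED_FILES = ("matrix.ec", "output.bus", "run_info.json", "transcripts.txt")


def missing_bus_outputs(out_dir):
    """Return the expected files that are absent from out_dir."""
    out_dir = Path(out_dir)
    return [name for name in EXPECTED_FILES if not (out_dir / name).is_file()]


# Example: missing_bus_outputs("kb_out2/GEX_test/D3_S1") returns an empty
# list when all four files are present.
```

This only checks that the files exist. It says nothing about whether any reads were pseudoaligned.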

kaijli commented 7 months ago

I ended up making a new conda environment with only kb_python and python3.11. I managed to get GEX to run without error, but CSP is still erroring with the same kb count command. This is the standard error for the CSP run:

[2024-03-05 11:03:56,485]   DEBUG Printing verbose output
[2024-03-05 11:03:56,485]   DEBUG kallisto binary located at /home/kli/.conda/envs/kb/bin/kallisto
[2024-03-05 11:03:56,486]   DEBUG bustools binary located at /home/kli/.conda/envs/kb/bin/bustools
[2024-03-05 11:03:56,486]   DEBUG Creating /home/kli/kb_out2/CSP_test/D10_S10/tmp directory
[2024-03-05 11:03:56,491]   DEBUG Namespace(list=False, command='count', tmp=None, keep_tmp=False, verbose=True, kallisto='/home/kli/.conda/envs/kb/bin/kallisto', bustools='bustools', i='/home/kli/ref/kb_custom_index', g='/home/kli/ref/kb_custom_t2g', x='10XV2', o='/home/kli/kb_out2/CSP_test/D10_S10', w=None, t=8, m='48G', workflow='standard', mm=True, tcc=False, filter='bustools', c1=None, c2=None, overwrite=True, dry_run=False, lamanno=False, nucleus=False, loom=False, h5ad=True, cellranger=False, report=False, no_inspect=False, no_validate=False, fastqs=['D10_S10_L001_QC_001.1.trimmed.fastq', 'D10_S10_L001_QC_001.2.trimmed.fastq'])
[2024-03-05 11:03:56,505]    INFO Using index /home/kli/ref/kb_custom_index to generate BUS file to /home/kli/kb_out2/CSP_test/D10_S10 from
[2024-03-05 11:03:56,505]    INFO         D10_S10_L001_QC_001.1.trimmed.fastq
[2024-03-05 11:03:56,505]    INFO         D10_S10_L001_QC_001.2.trimmed.fastq
[2024-03-05 11:03:56,505]   DEBUG kallisto bus -i /home/kli/ref/kb_custom_index -o /home/kli/kb_out2/CSP_test/D10_S10 -x 10XV2 -t 8 D10_S10_L001_QC_001.1.trimmed.fastq D10_S10_L001_QC_001.2.trimmed.fastq
[2024-03-05 11:06:44,859]   DEBUG 
[2024-03-05 11:06:44,862]   DEBUG [index] k-mer length: 31
[2024-03-05 11:06:44,863]   DEBUG [index] number of targets: 227,368
[2024-03-05 11:06:44,863]   DEBUG [index] number of k-mers: 140,125,185
[2024-03-05 11:06:44,863]   DEBUG [index] number of equivalence classes: 964,094
[2024-03-05 11:06:44,863]   DEBUG [quant] will process sample 1: D10_S10_L001_QC_001.1.trimmed.fastq
[2024-03-05 11:06:44,863]   DEBUG D10_S10_L001_QC_001.2.trimmed.fastq
[2024-03-05 11:06:44,863]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2024-03-05 11:06:44,863]   DEBUG [quant] processed 72,208,578 reads, 0 reads pseudoaligned
[2024-03-05 11:06:44,863]   DEBUG [~warn] no reads pseudoaligned.
[2024-03-05 11:06:44,863]   DEBUG 
[2024-03-05 11:06:44,863]   ERROR 
[index] k-mer length: 31
[index] number of targets: 227,368
[index] number of k-mers: 140,125,185
[index] number of equivalence classes: 964,094
[quant] will process sample 1: D10_S10_L001_QC_001.1.trimmed.fastq
D10_S10_L001_QC_001.2.trimmed.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 72,208,578 reads, 0 reads pseudoaligned
[~warn] no reads pseudoaligned.

[2024-03-05 11:06:44,864]   ERROR An exception occurred
Traceback (most recent call last):
  File "/home/kli/.conda/envs/kb/lib/python3.11/site-packages/kb_python/main.py", line 866, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/kli/.conda/envs/kb/lib/python3.11/site-packages/kb_python/main.py", line 314, in parse_count
    count(
  File "/home/kli/.conda/envs/kb/lib/python3.11/site-packages/kb_python/count.py", line 1123, in count
    bus_result = kallisto_bus(
                 ^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kb/lib/python3.11/site-packages/kb_python/validate.py", line 112, in inner
    results = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kb/lib/python3.11/site-packages/kb_python/count.py", line 149, in kallisto_bus
    run_executable(command)
  File "/home/kli/.conda/envs/kb/lib/python3.11/site-packages/kb_python/dry/__init__.py", line 24, in inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/kli/.conda/envs/kb/lib/python3.11/site-packages/kb_python/utils.py", line 237, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/kli/.conda/envs/kb/bin/kallisto bus -i /home/kli/ref/kb_custom_index -o /home/kli/kb_out2/CSP_test/D10_S10 -x 10XV2 -t 8 D10_S10_L001_QC_001.1.trimmed.fastq D10_S10_L001_QC_001.2.trimmed.fastq' returned non-zero exit status 1.
[2024-03-05 11:06:44,867]   DEBUG Removing /home/kli/kb_out2/CSP_test/D10_S10/tmp directory
    Command being timed: "/home/kli/.conda/envs/kb/bin/kb count --kallisto=/home/kli/.conda/envs/kb/bin/kallisto -o /home/kli/kb_out2/CSP_test/D10_S10 -i /home/kli/ref/kb_custom_index -g /home/kli/ref/kb_custom_t2g -x 10XV2 -m 48G -t 8 --workflow standard --overwrite --mm --h5ad --filter bustools --verbose D10_S10_L001_QC_001.1.trimmed.fastq D10_S10_L001_QC_001.2.trimmed.fastq"
    User time (seconds): 142.84
    System time (seconds): 38.31
    Percent of CPU this job got: 105%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 2:51.33
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 7106372
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 265938
    Voluntary context switches: 654
    Involuntary context switches: 411
    Swaps: 0
    File system inputs: 0
    File system outputs: 106080
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

I was able to run kallisto bus -i /home/kli/ref/kb_custom_index -o /home/kli/kb_out2/CSP_test/D10_S10 -x 10XV2 -t 8 D10_S10_L001_QC_001.1.trimmed.fastq D10_S10_L001_QC_001.2.trimmed.fastq in the command line, and it stops at the same point it does in the standard error above.
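
The key line in the CSP log is `processed 72,208,578 reads, 0 reads pseudoaligned`: kallisto ran, but no read matched the index, and the command returned exit status 1. A small sketch for pulling that summary out of captured kallisto output; the regular expression is based only on the log lines shown in this thread, and `pseudoalignment_rate` is a hypothetical helper:

```python
import re

# Matches e.g. "[quant] processed 72,208,578 reads, 0 reads pseudoaligned"
QUANT_LINE = re.compile(
    r"processed ([\d,]+) reads, ([\d,]+) reads pseudoaligned"
)


def pseudoalignment_rate(log_text):
    """Return (processed, pseudoaligned, fraction), or None if not found."""
    match = QUANT_LINE.search(log_text)
    if match is None:
        return None
    processed = int(match.group(1).replace(",", ""))
    aligned = int(match.group(2).replace(",", ""))
    fraction = aligned / processed if processed else 0.0
    return processed, aligned, fraction


csp = "[quant] processed 72,208,578 reads, 0 reads pseudoaligned"
gex = "[quant] processed 280,432,089 reads, 124,368,682 reads pseudoaligned"
print(pseudoalignment_rate(csp))
print(pseudoalignment_rate(gex))
```

A rate of zero for CSP alongside a substantial rate for GEX against the same index suggests the problem lies in how the CSP reads relate to the reference rather than in the installation.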

It also exits my loop after one set of files, so I'm not entirely sure what's happening there. There's no difference between the code run for GEX and CSP. This is the standard output for one of the looped GEX runs (both runs skip kb ref, which runs fine):

[2024-03-05 10:16:19,652]   DEBUG Printing verbose output
[2024-03-05 10:16:19,652]   DEBUG kallisto binary located at /home/kli/.conda/envs/kb/bin/kallisto
[2024-03-05 10:16:19,652]   DEBUG bustools binary located at /home/kli/.conda/envs/kb/bin/bustools
[2024-03-05 10:16:19,653]   DEBUG Creating /home/kli/kb_out2/GEX_test/1S1_S1/tmp directory
[2024-03-05 10:16:19,656]   DEBUG Namespace(list=False, command='count', tmp=None, keep_tmp=False, verbose=True, kallisto='/home/kli/.conda/envs/kb/bin/kallisto', bustools='bustools', i='/home/kli/ref/kb_custom_index', g='/home/kli/ref/kb_custom_t2g', x='10XV2', o='/home/kli/kb_out2/GEX_test/1S1_S1', w=None, t=8, m='48G', workflow='standard', mm=True, tcc=False, filter='bustools', c1=None, c2=None, overwrite=True, dry_run=False, lamanno=False, nucleus=False, loom=False, h5ad=True, cellranger=False, report=False, no_inspect=False, no_validate=False, fastqs=['1S1_S1_L001_QC_001.1.trimmed.fastq', '1S1_S1_L001_QC_001.2.trimmed.fastq'])
[2024-03-05 10:16:19,668]    INFO Using index /home/kli/ref/kb_custom_index to generate BUS file to /home/kli/kb_out2/GEX_test/1S1_S1 from
[2024-03-05 10:16:19,668]    INFO         1S1_S1_L001_QC_001.1.trimmed.fastq
[2024-03-05 10:16:19,668]    INFO         1S1_S1_L001_QC_001.2.trimmed.fastq
[2024-03-05 10:16:19,668]   DEBUG kallisto bus -i /home/kli/ref/kb_custom_index -o /home/kli/kb_out2/GEX_test/1S1_S1 -x 10XV2 -t 8 1S1_S1_L001_QC_001.1.trimmed.fastq 1S1_S1_L001_QC_001.2.trimmed.fastq
[2024-03-05 10:24:13,164]   DEBUG 
[2024-03-05 10:24:13,165]   DEBUG [index] k-mer length: 31
[2024-03-05 10:24:13,165]   DEBUG [index] number of targets: 227,368
[2024-03-05 10:24:13,165]   DEBUG [index] number of k-mers: 140,125,185
[2024-03-05 10:24:13,165]   DEBUG [index] number of equivalence classes: 964,094
[2024-03-05 10:24:13,165]   DEBUG [quant] will process sample 1: 1S1_S1_L001_QC_001.1.trimmed.fastq
[2024-03-05 10:24:13,165]   DEBUG 1S1_S1_L001_QC_001.2.trimmed.fastq
[2024-03-05 10:24:13,165]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2024-03-05 10:24:13,165]   DEBUG [quant] processed 280,432,089 reads, 124,368,682 reads pseudoaligned
[2024-03-05 10:24:13,165]   DEBUG 
[2024-03-05 10:24:44,190]   DEBUG /home/kli/kb_out2/GEX_test/1S1_S1/output.bus passed validation
[2024-03-05 10:24:44,194]    INFO Sorting BUS file /home/kli/kb_out2/GEX_test/1S1_S1/output.bus to /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.bus
[2024-03-05 10:24:44,194]   DEBUG bustools sort -o /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.bus -T /home/kli/kb_out2/GEX_test/1S1_S1/tmp -t 8 -m 48G /home/kli/kb_out2/GEX_test/1S1_S1/output.bus
[2024-03-05 10:26:23,478]   DEBUG Read in 124368682 BUS records
[2024-03-05 10:26:28,024]   DEBUG /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.bus passed validation
[2024-03-05 10:26:28,027]    INFO Whitelist not provided
[2024-03-05 10:26:28,027]    INFO Copying pre-packaged 10XV2 whitelist to /home/kli/kb_out2/GEX_test/1S1_S1
[2024-03-05 10:26:28,410]    INFO Inspecting BUS file /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.bus
[2024-03-05 10:26:28,410]   DEBUG bustools inspect -o /home/kli/kb_out2/GEX_test/1S1_S1/inspect.json -w /home/kli/kb_out2/GEX_test/1S1_S1/10xv2_whitelist.txt -e /home/kli/kb_out2/GEX_test/1S1_S1/matrix.ec /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.bus
[2024-03-05 10:26:42,935]    INFO Correcting BUS records in /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.bus to /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.c.bus with whitelist /home/kli/kb_out2/GEX_test/1S1_S1/10xv2_whitelist.txt
[2024-03-05 10:26:42,935]   DEBUG bustools correct -o /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.c.bus -w /home/kli/kb_out2/GEX_test/1S1_S1/10xv2_whitelist.txt /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.bus
[2024-03-05 10:26:57,962]   DEBUG Found 737280 barcodes in the whitelist
[2024-03-05 10:26:57,962]   DEBUG Processed 43135112 BUS records
[2024-03-05 10:26:57,962]   DEBUG In whitelist = 39393962
[2024-03-05 10:26:57,962]   DEBUG Corrected    = 1047370
[2024-03-05 10:26:57,962]   DEBUG Uncorrected  = 2693780
[2024-03-05 10:27:02,054]   DEBUG /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.c.bus passed validation
[2024-03-05 10:27:02,055]    INFO Sorting BUS file /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.c.bus to /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus
[2024-03-05 10:27:02,055]   DEBUG bustools sort -o /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus -T /home/kli/kb_out2/GEX_test/1S1_S1/tmp -t 8 -m 48G /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.s.c.bus
[2024-03-05 10:28:16,702]   DEBUG Read in 40441332 BUS records
[2024-03-05 10:28:20,742]   DEBUG /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus passed validation
[2024-03-05 10:28:20,745]    INFO Generating count matrix /home/kli/kb_out2/GEX_test/1S1_S1/counts_unfiltered/cells_x_genes from BUS file /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus
[2024-03-05 10:28:20,745]   DEBUG bustools count -o /home/kli/kb_out2/GEX_test/1S1_S1/counts_unfiltered/cells_x_genes -g /home/kli/ref/kb_custom_t2g -e /home/kli/kb_out2/GEX_test/1S1_S1/matrix.ec -t /home/kli/kb_out2/GEX_test/1S1_S1/transcripts.txt --genecounts --multimapping /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus
[2024-03-05 10:29:03,065]   DEBUG /home/kli/kb_out2/GEX_test/1S1_S1/counts_unfiltered/cells_x_genes.mtx passed validation
[2024-03-05 10:29:03,066]    INFO Reading matrix /home/kli/kb_out2/GEX_test/1S1_S1/counts_unfiltered/cells_x_genes.mtx
[2024-03-05 10:29:26,772]    INFO Writing matrix to h5ad /home/kli/kb_out2/GEX_test/1S1_S1/counts_unfiltered/adata.h5ad
[2024-03-05 10:29:28,824]    INFO Filtering with bustools
[2024-03-05 10:29:28,824]    INFO Generating whitelist /home/kli/kb_out2/GEX_test/1S1_S1/filter_barcodes.txt from BUS file /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus
[2024-03-05 10:29:28,824]   DEBUG bustools whitelist -o /home/kli/kb_out2/GEX_test/1S1_S1/filter_barcodes.txt /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus
[2024-03-05 10:29:29,677]   DEBUG Read in 40019744 BUS records, wrote 13714 barcodes to whitelist with threshold 1436
[2024-03-05 10:29:29,678]    INFO Correcting BUS records in /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus to /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.unfiltered.c.bus with whitelist /home/kli/kb_out2/GEX_test/1S1_S1/filter_barcodes.txt
[2024-03-05 10:29:29,678]   DEBUG bustools correct -o /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.unfiltered.c.bus -w /home/kli/kb_out2/GEX_test/1S1_S1/filter_barcodes.txt /home/kli/kb_out2/GEX_test/1S1_S1/output.unfiltered.bus
[2024-03-05 10:29:45,151]   DEBUG Found 13713 barcodes in the whitelist
[2024-03-05 10:29:45,152]   DEBUG Processed 40019744 BUS records
[2024-03-05 10:29:45,152]   DEBUG In whitelist = 29840659
[2024-03-05 10:29:45,152]   DEBUG Corrected    = 0
[2024-03-05 10:29:45,152]   DEBUG Uncorrected  = 10179085
[2024-03-05 10:29:47,855]   DEBUG /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.unfiltered.c.bus passed validation
[2024-03-05 10:29:47,855]    INFO Sorting BUS file /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.unfiltered.c.bus to /home/kli/kb_out2/GEX_test/1S1_S1/output.filtered.bus
[2024-03-05 10:29:47,856]   DEBUG bustools sort -o /home/kli/kb_out2/GEX_test/1S1_S1/output.filtered.bus -T /home/kli/kb_out2/GEX_test/1S1_S1/tmp -t 8 -m 48G /home/kli/kb_out2/GEX_test/1S1_S1/tmp/output.unfiltered.c.bus
[2024-03-05 10:30:55,975]   DEBUG Read in 29840659 BUS records
[2024-03-05 10:30:58,905]   DEBUG /home/kli/kb_out2/GEX_test/1S1_S1/output.filtered.bus passed validation
[2024-03-05 10:30:58,908]    INFO Generating count matrix /home/kli/kb_out2/GEX_test/1S1_S1/counts_filtered/cells_x_genes from BUS file /home/kli/kb_out2/GEX_test/1S1_S1/output.filtered.bus
[2024-03-05 10:30:58,908]   DEBUG bustools count -o /home/kli/kb_out2/GEX_test/1S1_S1/counts_filtered/cells_x_genes -g /home/kli/ref/kb_custom_t2g -e /home/kli/kb_out2/GEX_test/1S1_S1/matrix.ec -t /home/kli/kb_out2/GEX_test/1S1_S1/transcripts.txt --genecounts /home/kli/kb_out2/GEX_test/1S1_S1/output.filtered.bus
[2024-03-05 10:31:22,612]   DEBUG /home/kli/kb_out2/GEX_test/1S1_S1/counts_filtered/cells_x_genes.mtx passed validation
[2024-03-05 10:31:22,612]    INFO Reading matrix /home/kli/kb_out2/GEX_test/1S1_S1/counts_filtered/cells_x_genes.mtx
[2024-03-05 10:31:33,218]    INFO Writing matrix to h5ad /home/kli/kb_out2/GEX_test/1S1_S1/counts_filtered/adata.h5ad
[2024-03-05 10:31:34,336]   DEBUG Removing /home/kli/kb_out2/GEX_test/1S1_S1/tmp directory
    Command being timed: "/home/kli/.conda/envs/kb/bin/kb count --kallisto=/home/kli/.conda/envs/kb/bin/kallisto -o /home/kli/kb_out2/GEX_test/1S1_S1 -i /home/kli/ref/kb_custom_index -g /home/kli/ref/kb_custom_t2g -x 10XV2 -m 48G -t 8 --workflow standard --overwrite --mm --h5ad --filter bustools --verbose 1S1_S1_L001_QC_001.1.trimmed.fastq 1S1_S1_L001_QC_001.2.trimmed.fastq"
    User time (seconds): 2970.85
    System time (seconds): 346.22
    Percent of CPU this job got: 361%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 15:18.10
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 50334632
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 3
    Minor (reclaiming a frame) page faults: 38212739
    Voluntary context switches: 12850
    Involuntary context switches: 6437
    Swaps: 0
    File system inputs: 3984
    File system outputs: 20434480
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Yenaled commented 7 months ago

That's not an error. It's simply that none of your reads are getting mapped to your index.

kaijli commented 7 months ago

I see. This is 26/10/10/26 CSP data. Is there a specific protocol I should follow to analyze it with kb-python?

Yenaled commented 7 months ago

I have never worked with CSP data and don't know what the reads look like, what they should align to, or where the technical sequences (e.g. barcodes, if present) are supposed to be. But chances are you're not mapping the correct portions of the reads: you're using 10xv2, which assumes 16 bp barcodes + 10 bp UMIs in R1 and treats all of R2 as the "mappable" sequence. I don't know what CSP reads are supposed to look like.
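
For reference, here is a minimal Python sketch of the read layout described above: the first 16 bp of R1 as the barcode, the next 10 bp as the UMI, and all of R2 as the sequence to map. The `SplitRead` type, the `split_10xv2` helper, and the example reads are made up for illustration; this is not how kallisto implements it internally.

```python
from typing import NamedTuple


class SplitRead(NamedTuple):
    barcode: str
    umi: str
    cdna: str


def split_10xv2(r1: str, r2: str) -> SplitRead:
    """Split one read pair using the 16 bp barcode + 10 bp UMI layout."""
    if len(r1) < 26:
        raise ValueError(f"R1 has {len(r1)} bp; need at least 26")
    # R1[0:16] is the cell barcode, R1[16:26] is the UMI,
    # and the whole of R2 is treated as the mappable sequence.
    return SplitRead(barcode=r1[:16], umi=r1[16:26], cdna=r2)


# Made-up read pair, for illustration only.
r1 = "ACGTACGTACGTACGT" + "TTTTGGGGCC"
r2 = "GATTACAGATTACAGATTACAGATTA"
read = split_10xv2(r1, r2)
print(read.barcode, read.umi, read.cdna)
```

If the part of R2 that should match the index is only a sub-range of the read, then passing all of R2 as the mappable sequence could plausibly explain reads failing to map, but that depends on the actual CSP read structure, which is not documented in this thread.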

kaijli commented 7 months ago

The CSP reads are similar, with the first 26 (R1) being the 16 bp barcodes + 10 bp UMIs, and the second 26 (R2) being the "mappable" sequence. I have had difficulty finding software and documentation that supports CSP data, so any support is greatly appreciated!
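
One quick way to confirm the read lengths before choosing a technology setting is to tally them straight from the FASTQ. The sketch below assumes plain four-line FASTQ records; the `read_length_counts` helper and the tiny in-memory example are hypothetical and only stand in for a real file.

```python
from collections import Counter
from io import StringIO
from typing import Iterable


def read_length_counts(lines: Iterable[str]) -> Counter:
    """Count sequence lengths in FASTQ text (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Tiny made-up FASTQ; for a real file, pass an open file handle instead.
records = [("r1", "A" * 26), ("r2", "C" * 27), ("r3", "G" * 26)]
example = StringIO(
    "".join(f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n" for name, seq in records)
)
lengths = read_length_counts(example)
print(sorted(lengths.items()))
```

Running this on both R1 and R2 shows whether any reads are shorter than the 26 bp that the barcode + UMI layout needs.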

Yenaled commented 7 months ago

Then 10xv2 should work, but again, I don't know anything about the data because I don't know the protocol and have never seen those FASTQ files. kallisto will definitely work on it if you tell it exactly how the reads are structured, but without documentation it's very hard to know what's actually in your reads.
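
As a sketch of what "specifying it exactly" can look like: to my understanding, `kallisto bus` (and `kb count` via `-x`) also accepts a custom technology string of the form `barcode:umi:sequence`, where each part is `file,start,stop` and a stop of `0` means "to the end of the read". Please check this against the documentation for your installed version. The helper names below are hypothetical; the example reproduces the 10xv2-style layout discussed above.

```python
from typing import Tuple

Segment = Tuple[int, int, int]  # (file index, start, stop); stop 0 = end of read


def segment(file_index: int, start: int, stop: int) -> str:
    """Format one 'file,start,stop' part of a technology string."""
    return f"{file_index},{start},{stop}"


def technology_string(barcode: Segment, umi: Segment, sequence: Segment) -> str:
    """Join barcode, UMI, and sequence parts with ':'."""
    return ":".join(segment(*part) for part in (barcode, umi, sequence))


# 16 bp barcode and 10 bp UMI in file 0 (R1), all of file 1 (R2) as sequence.
layout = technology_string(barcode=(0, 0, 16), umi=(0, 16, 26), sequence=(1, 0, 0))
print(layout)
```

If the mappable part of R2 turns out to be a sub-range of the read, only the `sequence` tuple would need to change. The correct coordinates for CSP reads are not given in this thread, so they would have to come from the protocol documentation.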

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days