pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

kb count error #207

Closed ScienceComputing closed 1 year ago

ScienceComputing commented 1 year ago

Describe the issue

Greetings! After running the kb count command below in a Jupyter notebook with Python 3.11.1 and kb 0.24.1, the RNA count matrix is not generated. Does anyone know how to address this issue? Thanks so much in advance!

What is the exact command that was run?

!kb count -i mouse_index.idx -g t2g.txt -x 10xv2 --h5ad -t 2 -m 32 \
SRR8599150_S1_L001_R1_001.fastq.gz SRR8599150_S1_L001_R2_001.fastq.gz -o ../result/h5ad

Command output (with --verbose flag)

[2023-07-08 13:06:45,365]    INFO Generating BUS file from
[2023-07-08 13:06:45,365]    INFO         SRR8599150_S1_L001_R1_001.fastq.gz
[2023-07-08 13:06:45,365]    INFO         SRR8599150_S1_L001_R2_001.fastq.gz
[2023-07-08 13:08:25,992]    INFO Sorting BUS file ../result/h5ad/output.bus to tmp/output.s.bus
[2023-07-08 13:08:36,372]    INFO Whitelist not provided
[2023-07-08 13:08:36,372]    INFO Copying pre-packaged 10XV2 whitelist to ../result/h5ad
[2023-07-08 13:08:36,401]    INFO Inspecting BUS file tmp/output.s.bus
[2023-07-08 13:08:37,912]    INFO Correcting BUS records in tmp/output.s.bus to tmp/output.s.c.bus with whitelist ../result/h5ad/10xv2_whitelist.txt
[2023-07-08 13:08:55,477]    INFO Sorting BUS file tmp/output.s.c.bus to ../result/h5ad/output.unfiltered.bus
[2023-07-08 13:09:05,527]    INFO Generating count matrix ../result/h5ad/counts_unfiltered/cells_x_genes from BUS file ../result/h5ad/output.unfiltered.bus
[2023-07-08 13:09:05,543]   ERROR An exception occurred
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/kb_python/main.py", line 476, in main
    COMMAND_TO_FUNCTION[args.command](args)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/kb_python/main.py", line 136, in parse_count
    count(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/kb_python/count.py", line 448, in count
    count_result = bustools_count(
                   ^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/kb_python/count.py", line 178, in bustools_count
    run_executable(command)
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/kb_python/utils.py", line 114, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/kb_python/bins/darwin/bustools/bustools count -o ../result/h5ad/counts_unfiltered/cells_x_genes -g t2g.txt -e ../result/h5ad/matrix.ec -t ../result/h5ad/transcripts.txt --genecounts ../result/h5ad/output.unfiltered.bus' returned non-zero exit status 1.

Yenaled commented 1 year ago

You are using an old version of kb-python (0.27.3 is the current version). I recommend upgrading.

Aside from that, a few things would help me figure out what's going on:

First, try looking at the JSON files in your ../result/h5ad/ directory and let me know what you find (e.g. how many reads are getting mapped, how many barcodes are retained, etc.).

Second, make sure your t2g.txt file is correct (i.e. it is the same one that kb ref generated when you made the mouse_index.idx file).

Third, you can manually inspect the BUS file, which stores the mapped-read information, by running bustools text on it, e.g.

/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/kb_python/bins/darwin/bustools/bustools text -p ../result/h5ad/output.unfiltered.bus

And you can see if the records in that file look ok.
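For the first suggestion, here is a minimal sketch (not part of kb-python) that loads every JSON file in the output directory and prints its top-level scalar fields, which makes the mapping and barcode statistics easy to paste into a reply. The function names `load_json_reports` and `print_reports` are hypothetical, and the sketch deliberately assumes nothing about which files or keys are present.

```python
import json
from pathlib import Path


def load_json_reports(out_dir):
    """Return {file name: parsed content} for every *.json file in out_dir."""
    reports = {}
    for path in sorted(Path(out_dir).glob("*.json")):
        with path.open() as handle:
            reports[path.name] = json.load(handle)
    return reports


def print_reports(reports):
    """Print the top-level scalar fields of each report, one per line."""
    for name, content in reports.items():
        print(name)
        if not isinstance(content, dict):
            # Not a JSON object: show the parsed value as-is.
            print(f"  {content!r}")
            continue
        for key, value in content.items():
            # Skip nested lists/objects to keep the summary short.
            if isinstance(value, (str, int, float, bool)) or value is None:
                print(f"  {key}: {value}")
```

With the output directory from the command in this issue, the call would be `print_reports(load_json_reports("../result/h5ad"))`.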

ScienceComputing commented 1 year ago

Greatly appreciate your expertise! I accidentally wiped out the old JSON files, but after re-downloading the t2g file and rerunning the kb count command, everything works and the expected cells_x_genes.mtx is produced. Many thanks for your help.
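Since replacing the t2g file resolved the problem here, a quick consistency check before running kb count may help others who hit a similar failure. The sketch below is not part of kb-python: it assumes that t2g.txt has the transcript ID in its first column and that transcripts.txt (the file passed via -t in the failing bustools count command above) lists one transcript ID per line. The function names are hypothetical, and whether a mismatch was the actual cause in this issue is not confirmed by the thread.

```python
def read_first_column(path):
    """Return the first whitespace-separated field of each non-empty line."""
    ids = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                ids.append(fields[0])
    return ids


def transcripts_missing_from_t2g(transcripts_path, t2g_path):
    """List transcripts from transcripts_path that t2g_path does not map.

    Assumption: the first column of the t2g file holds transcript IDs.
    """
    mapped = set(read_first_column(t2g_path))
    return [t for t in read_first_column(transcripts_path) if t not in mapped]
```

For the paths in this issue, `transcripts_missing_from_t2g("../result/h5ad/transcripts.txt", "t2g.txt")` returning a non-empty list would suggest that the t2g file and the index were not produced by the same kb ref run.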