pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License
154 stars 23 forks source link

Error while sorting #62

Closed blogeman closed 4 years ago

blogeman commented 4 years ago

I am trying to run the R notebook Analysis of single-cell RNA-seq data: building and annotating an atlas tutorial locally on a PC running Windows 10. I'm having issues with the following code:

system("kb count -i index.idx -g t2g.txt -x 10xv3 -o output --filter bustools -t 12 pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz",intern=TRUE)

The count function appears to run normally for a few minutes and generates several output files, but then gives the following error when it comes time to sort:

[1] "[2020-03-24 14:03:32,044] INFO Generating BUS file from"
[2] "[2020-03-24 14:03:32,044] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz"
[3] "[2020-03-24 14:03:32,044] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz"
[4] "[2020-03-24 14:03:32,044] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz"
[5] "[2020-03-24 14:03:32,044] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz"
[6] "[2020-03-24 14:06:23,556] INFO Sorting BUS file output\output.bus to tmp\output.s.bus"
[7] "[2020-03-24 14:06:27,088] ERROR An exception occurred"
[8] "Traceback (most recent call last):"
[9] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\main.py\", line 476, in main"
[10] " COMMAND_TO_FUNCTIONargs.command"
[11] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\main.py\", line 148, in parse_count"
[12] " h5ad=args.h5ad,"
[13] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\count.py\", line 418, in count"
[14] " memory=memory"
[15] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\count.py\", line 94, in bustools_sort"
[16] " run_executable(command)"
[17] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\utils.py\", line 114, in run_executable"
[18] " raise sp.CalledProcessError(p.returncode, ' '.join(command))"
[19] "subprocess.CalledProcessError: Command 'c:\users\brand\anaconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 12 -m 4G output\output.bus' returned non-zero exit status 3221225477."

An output folder is created and contains a 1.2 Gb output.bus file, but when I rerun:

C:\Users\brand\Documents\R\Kallisto.Bustools_test>c:\users\brand\anaconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 12 -m 4G output\output.bus

I get the following output:

Read in 0 BUS records

Lioscro commented 4 years ago

Could you try rerunning kb with the --verbose flag? That will print out debug logs which may be helpful for us to find the issue. Thanks.

blogeman commented 4 years ago

Now I realize I should have ran --verbose from the beginning, noob mistake. Below is the output:

system("kb count --verbose -i index.idx -g t2g.txt -x 10xv3 -o output --filter bustools -t 12 pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz",intern=TRUE) [1] "[2020-04-08 17:12:55,084] DEBUG Printing verbose output"
[2] "[2020-04-08 17:12:55,085] DEBUG Creating tmp directory"
[3] "[2020-04-08 17:12:55,086] DEBUG Namespace(c1=None, c2=None, command='count', fastqs=['pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz', 'pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz', 'pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz', 'pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz'], filter='bustools', g='t2g.txt', h5ad=False, i='index.idx', keep_tmp=False, lamanno=False, list=False, loom=False, m='4G', o='output', overwrite=False, t=12, verbose=True, w=None, x='10xv3')" [4] "[2020-04-08 17:12:55,086] INFO Generating BUS file from"
[5] "[2020-04-08 17:12:55,086] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz"
[6] "[2020-04-08 17:12:55,086] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz"
[7] "[2020-04-08 17:12:55,086] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz"
[8] "[2020-04-08 17:12:55,086] INFO pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz"
[9] "[2020-04-08 17:12:55,087] DEBUG c:\users\brand\anaconda3\lib\site-packages\kb_python\bins\windows\kallisto\kallisto.exe bus -i index.idx -o output -x 10xv3 -t 12 pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz"
[10] "[2020-04-08 17:15:11,671] DEBUG "
[11] "[2020-04-08 17:15:11,672] DEBUG [index] k-mer length: 31"
[12] "[2020-04-08 17:15:11,672] DEBUG [index] number of targets: 227,368"
[13] "[2020-04-08 17:15:11,672] DEBUG [index] number of k-mers: 140,125,185"
[14] "[2020-04-08 17:15:11,672] DEBUG [index] number of equivalence classes: 964,094"
[15] "[2020-04-08 17:15:11,672] DEBUG [quant] will process sample 1: pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz"
[16] "[2020-04-08 17:15:11,672] DEBUG pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz"
[17] "[2020-04-08 17:15:11,672] DEBUG [quant] will process sample 2: pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz"
[18] "[2020-04-08 17:15:11,672] DEBUG pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz"
[19] "[2020-04-08 17:15:11,672] DEBUG [quant] finding pseudoalignments for the reads ... done"
[20] "[2020-04-08 17:15:11,673] DEBUG [quant] processed 50,556,548 reads, 31,489,647 reads pseudoaligned"
[21] "[2020-04-08 17:15:11,682] INFO Sorting BUS file output\output.bus to tmp\output.s.bus"
[22] "[2020-04-08 17:15:11,683] DEBUG c:\users\brand\anaconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 12 -m 4G output\output.bus"
[23] "[2020-04-08 17:15:16,303] DEBUG Read in 0 BUS records"
[24] "[2020-04-08 17:15:16,304] ERROR An exception occurred"
[25] "Traceback (most recent call last):"
[26] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\main.py\", line 476, in main"
[27] " COMMAND_TO_FUNCTIONargs.command"
[28] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\main.py\", line 148, in parse_count"
[29] " h5ad=args.h5ad,"
[30] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\count.py\", line 418, in count"
[31] " memory=memory"
[32] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\count.py\", line 94, in bustools_sort"
[33] " run_executable(command)"
[34] " File \"c:\users\brand\anaconda3\lib\site-packages\kb_python\utils.py\", line 114, in run_executable"
[35] " raise sp.CalledProcessError(p.returncode, ' '.join(command))"
[36] "subprocess.CalledProcessError: Command 'c:\users\brand\anaconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o tmp\output.s.bus -T tmp -t 12 -m 4G output\output.bus' returned non-zero exit status 3221225477."
[37] "[2020-04-08 17:15:16,309] DEBUG Removing tmp directory"

Lioscro commented 4 years ago

I notice Read in 0 BUS records on line 23. I suspect output\output.bus may not contain any BUS records. Could you check how large this file is?

blogeman commented 4 years ago

The output directory contains the following 4 files:

output.bus = 961 Mb matrix.ec = 58 Mb run_info.json = 625 B transcripts.txt = 4 Mb

Lioscro commented 4 years ago

Hi, @blogeman, I'm still having difficulty trying to figure out what is going wrong. Could you perhaps try the development build of kb? You can install it by running the following command.

pip install git+https://github.com/pachterlab/kb_python@devel
bweatherbee commented 4 years ago

Hello, I am having the same exact issue. I am on windows ten, and am trying to run: kb count --verbose -i index.idx -g t2g.txt -x 10xv2 -t 4 -m 16G --filter bustools --h5ad 912_R1.fastq.gz 912_R2.fastq.gz 913_R1.fastq.gz 913_R2.fastq.gz on the command line. I get a large bus file output file of ~17GB as well as the other 3 expected files at a reasonable size. Sometimes I get a nonzero exit error of 255, but it does seem to consistently give a line of "Read 0 BUS files." I then get a pop-up message from windows saying "bustools.exe has stopped working." I tried the development build and it did not fix the problem. I also tried from a different windows computer and it did not fix the problem. I have tried on different sent of input fastq files and it did not fix the problem.

Lioscro commented 4 years ago

Hi, @blogeman and @bweatherbee, I've looked into this and could only replicate it when the process was allocated more memory than physically available on the machine. Even though physically a device may have 8G of RAM, providing -m 8G will cause a crash because kb is not the only process running at any given moment. I've contacted the Bustools developers about the issue. In the meantime, please try rerunning with reduced memory (perhaps -m 2G or lower for @blogeman and -m 8G for @bweatherbee). Please let me know if the problem persists.

bweatherbee commented 4 years ago

Hi @Lioscro, Reducing the allocated memory fixed the initial issue - I was able to get to the unfiltered count matrix. I did, though receive the same error at the "capture" step of the filtering protocols.

[2020-04-30 21:50:47,737] DEBUG c:\users\bailey\anaconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe capture -o tmp\output.filtered.bus -c .\filter_barcodes.txt -e .\matrix.ec -t .\transcripts.txt --barcode .\output.unfiltered.bus [2020-04-30 21:51:10,394] DEBUG Read in 33278593 BUS records, wrote 28223231 BUS records [2020-04-30 21:51:10,410] INFO Sorting BUS file tmp\output.filtered.bus to .\output.filtered.bus [2020-04-30 21:51:10,410] DEBUG c:\users\bailey\anaconda3\lib\site-packages\kb_python\bins\windows\bustools\bustools.exe sort -o .\output.filtered.bus -T tmp -t 4 -m 2G tmp\output.filtered.bus [2020-04-30 22:11:14,215] DEBUG Read in 0 BUS records [2020-04-30 22:11:14,215] ERROR An exception occurred

However, for my purposes, I will just manually filter out the empty droplets for now. Thank you for the help!

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days