pachterlab / kb_python

A wrapper for the kallisto | bustools workflow for single-cell RNA-seq pre-processing
https://www.kallistobus.tools/
BSD 2-Clause "Simplified" License

Process multiple SRRs; BrokenPipe error #99

Closed mariafiruleva closed 3 years ago

mariafiruleva commented 3 years ago

Dear kallisto team,

Describe the issue

I want to process a sample that consists of several SRRs (for example, this one) by streaming the files through pipes. I used bash brace expansion to combine several links into a single argument. However, I got a BrokenPipeError. Any advice? Is my command right for my purpose?

What is the exact command that was run?

# version (installed via conda): kb_python 0.25.1
# prepare reference
kb ref -d mouse -i index.idx -g t2g.txt -f1 transcriptome.fasta
# run count command
kb count -i index.idx -g t2g.txt -x 10XV2 -o SRS7040866 --filter bustools -t 4 -m 40G --verbose ftp://{ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_1.fastq.gz,ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_1.fastq.gz} ftp://{ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_2.fastq.gz,ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_2.fastq.gz}

Command output

/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
[2021-02-24 17:37:27,551]   DEBUG Printing verbose output
[2021-02-24 17:37:27,552]   DEBUG kallisto binary located at /nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/site-packages/kb_python/bins/linux/kallisto/kallisto
[2021-02-24 17:37:27,552]   DEBUG bustools binary located at /nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/site-packages/kb_python/bins/linux/bustools/bustools
[2021-02-24 17:37:27,553]   DEBUG Creating SRS7040866/tmp directory
[2021-02-24 17:37:27,554]   DEBUG Namespace(c1=None, c2=None, cellranger=False, command='count', dry_run=False, fastqs=['ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_1.fastq.gz', 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_1.fastq.gz', 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_2.fastq.gz', 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_2.fastq.gz'], filter='bustools', g='t2g.txt', h5ad=False, i='index.idx', keep_tmp=False, lamanno=False, list=False, loom=False, m='40G', mm=False, no_inspect=False, no_validate=False, nucleus=False, o='SRS7040866', overwrite=False, report=False, t=4, tcc=False, tmp=None, verbose=True, w=None, workflow='standard', x='10XV2')
[2021-02-24 17:37:27,554]    INFO Piping ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_1.fastq.gz to SRS7040866/tmp/SRR12264568_1.fastq.gz
[2021-02-24 17:37:27,557]    INFO Piping ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_1.fastq.gz to SRS7040866/tmp/SRR12264569_1.fastq.gz
[2021-02-24 17:37:27,557]    INFO Piping ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_2.fastq.gz to SRS7040866/tmp/SRR12264568_2.fastq.gz
[2021-02-24 17:37:27,558]    INFO Piping ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_2.fastq.gz to SRS7040866/tmp/SRR12264569_2.fastq.gz
[2021-02-24 17:37:27,560]    INFO Using index index.idx to generate BUS file to SRS7040866 from
[2021-02-24 17:37:27,560]    INFO         SRS7040866/tmp/SRR12264568_1.fastq.gz
[2021-02-24 17:37:27,560]    INFO         SRS7040866/tmp/SRR12264569_1.fastq.gz
[2021-02-24 17:37:27,560]    INFO         SRS7040866/tmp/SRR12264568_2.fastq.gz
[2021-02-24 17:37:27,560]    INFO         SRS7040866/tmp/SRR12264569_2.fastq.gz
[2021-02-24 17:37:27,560]   DEBUG kallisto bus -i index.idx -o SRS7040866 -x 10XV2 -t 4 SRS7040866/tmp/SRR12264568_1.fastq.gz SRS7040866/tmp/SRR12264569_1.fastq.gz SRS7040866/tmp/SRR12264568_2.fastq.gz SRS7040866/tmp/SRR12264569_2.fastq.gz
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/urllib/request.py", line 280, in urlretrieve
    tfp.write(block)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/urllib/request.py", line 283, in urlretrieve
    reporthook(blocknum, bs, size)
BrokenPipeError: [Errno 32] Broken pipe
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/urllib/request.py", line 280, in urlretrieve
    tfp.write(block)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/nfs/home/mfiruleva/anaconda3/envs/scn_cor/lib/python3.8/urllib/request.py", line 283, in urlretrieve
    reporthook(blocknum, bs, size)
BrokenPipeError: [Errno 32] Broken pipe
[2021-02-24 17:47:24,583]   DEBUG 
[2021-02-24 17:47:24,583]   DEBUG [index] k-mer length: 31
[2021-02-24 17:47:24,583]   DEBUG [index] number of targets: 142,446
[2021-02-24 17:47:24,583]   DEBUG [index] number of k-mers: 120,632,459
[2021-02-24 17:47:24,583]   DEBUG [index] number of equivalence classes: 512,299
[2021-02-24 17:47:24,583]   DEBUG [quant] will process sample 1: SRS7040866/tmp/SRR12264568_1.fastq.gz
[2021-02-24 17:47:24,584]   DEBUG SRS7040866/tmp/SRR12264569_1.fastq.gz
[2021-02-24 17:47:24,584]   DEBUG [quant] will process sample 2: SRS7040866/tmp/SRR12264568_2.fastq.gz
[2021-02-24 17:47:24,584]   DEBUG SRS7040866/tmp/SRR12264569_2.fastq.gz
[2021-02-24 17:47:24,584]   DEBUG [quant] finding pseudoalignments for the reads ... done
[2021-02-24 17:47:24,584]   DEBUG [quant] processed 29,498,682 reads, 11,599,332 reads pseudoaligned
[2021-02-24 17:47:26,907]   DEBUG SRS7040866/output.bus passed validation
[2021-02-24 17:47:26,907]    INFO Sorting BUS file SRS7040866/output.bus to SRS7040866/tmp/output.s.bus
[2021-02-24 17:47:26,907]   DEBUG bustools sort -o SRS7040866/tmp/output.s.bus -T SRS7040866/tmp -t 4 -m 40G SRS7040866/output.bus
[2021-02-24 17:47:58,300]   DEBUG Read in 11599332 BUS records
[2021-02-24 17:47:59,440]   DEBUG SRS7040866/tmp/output.s.bus passed validation
[2021-02-24 17:47:59,440]    INFO Whitelist not provided
[2021-02-24 17:47:59,440]    INFO Copying pre-packaged 10XV2 whitelist to SRS7040866
[2021-02-24 17:47:59,550]    INFO Inspecting BUS file SRS7040866/tmp/output.s.bus
[2021-02-24 17:47:59,550]   DEBUG bustools inspect -o SRS7040866/inspect.json -w SRS7040866/10xv2_whitelist.txt -e SRS7040866/matrix.ec SRS7040866/tmp/output.s.bus
[2021-02-24 17:48:03,253]    INFO Correcting BUS records in SRS7040866/tmp/output.s.bus to SRS7040866/tmp/output.s.c.bus with whitelist SRS7040866/10xv2_whitelist.txt
[2021-02-24 17:48:03,254]   DEBUG bustools correct -o SRS7040866/tmp/output.s.c.bus -w SRS7040866/10xv2_whitelist.txt SRS7040866/tmp/output.s.bus
[2021-02-24 17:48:05,145]   DEBUG Found 737280 barcodes in the whitelist
[2021-02-24 17:48:05,146]   DEBUG Processed 10969696 BUS records
[2021-02-24 17:48:05,146]   DEBUG In whitelist = 5736
[2021-02-24 17:48:05,146]   DEBUG Corrected    = 49327
[2021-02-24 17:48:05,146]   DEBUG Uncorrected  = 10914633
[2021-02-24 17:48:05,169]   DEBUG SRS7040866/tmp/output.s.c.bus passed validation
[2021-02-24 17:48:05,169]    INFO Sorting BUS file SRS7040866/tmp/output.s.c.bus to SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:05,169]   DEBUG bustools sort -o SRS7040866/output.unfiltered.bus -T SRS7040866/tmp -t 4 -m 40G SRS7040866/tmp/output.s.c.bus
[2021-02-24 17:48:32,969]   DEBUG Read in 55063 BUS records
[2021-02-24 17:48:32,992]   DEBUG SRS7040866/output.unfiltered.bus passed validation
[2021-02-24 17:48:32,992]    INFO Generating count matrix SRS7040866/counts_unfiltered/cells_x_genes from BUS file SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:32,992]   DEBUG bustools count -o SRS7040866/counts_unfiltered/cells_x_genes -g t2g.txt -e SRS7040866/matrix.ec -t SRS7040866/transcripts.txt --genecounts SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:34,997]   DEBUG SRS7040866/counts_unfiltered/cells_x_genes.mtx passed validation
[2021-02-24 17:48:34,997]    INFO Filtering with bustools
[2021-02-24 17:48:34,997]    INFO Generating whitelist SRS7040866/filter_barcodes.txt from BUS file SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:34,997]   DEBUG bustools whitelist -o SRS7040866/filter_barcodes.txt SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:35,012]   DEBUG Read in 55053 BUS records, wrote 100 barcodes to whitelist with threshold 63
[2021-02-24 17:48:35,013]    INFO Correcting BUS records in SRS7040866/output.unfiltered.bus to SRS7040866/tmp/output.unfiltered.c.bus with whitelist SRS7040866/filter_barcodes.txt
[2021-02-24 17:48:35,013]   DEBUG bustools correct -o SRS7040866/tmp/output.unfiltered.c.bus -w SRS7040866/filter_barcodes.txt SRS7040866/output.unfiltered.bus
[2021-02-24 17:48:35,869]   DEBUG Found 100 barcodes in the whitelist
[2021-02-24 17:48:35,869]   DEBUG Processed 55053 BUS records
[2021-02-24 17:48:35,870]   DEBUG In whitelist = 14486
[2021-02-24 17:48:35,870]   DEBUG Corrected    = 0
[2021-02-24 17:48:35,870]   DEBUG Uncorrected  = 40567
[2021-02-24 17:48:35,885]   DEBUG SRS7040866/tmp/output.unfiltered.c.bus passed validation
[2021-02-24 17:48:35,885]    INFO Sorting BUS file SRS7040866/tmp/output.unfiltered.c.bus to SRS7040866/output.filtered.bus
[2021-02-24 17:48:35,885]   DEBUG bustools sort -o SRS7040866/output.filtered.bus -T SRS7040866/tmp -t 4 -m 40G SRS7040866/tmp/output.unfiltered.c.bus
[2021-02-24 17:49:04,372]   DEBUG Read in 14486 BUS records
[2021-02-24 17:49:04,384]   DEBUG SRS7040866/output.filtered.bus passed validation
[2021-02-24 17:49:04,384]    INFO Generating count matrix SRS7040866/counts_filtered/cells_x_genes from BUS file SRS7040866/output.filtered.bus
[2021-02-24 17:49:04,385]   DEBUG bustools count -o SRS7040866/counts_filtered/cells_x_genes -g t2g.txt -e SRS7040866/matrix.ec -t SRS7040866/transcripts.txt --genecounts SRS7040866/output.filtered.bus
[2021-02-24 17:49:06,588]   DEBUG SRS7040866/counts_filtered/cells_x_genes.mtx passed validation
[2021-02-24 17:49:06,605]   DEBUG Removing SRS7040866/tmp directory

I also downloaded these files manually and ran kb count with the same parameters. As expected, an output with a higher number of barcodes was generated.

[2021-02-24 18:12:13,375]   DEBUG Found 737280 barcodes in the whitelist
[2021-02-24 18:12:13,376]   DEBUG Processed 19498506 BUS records
[2021-02-24 18:12:13,376]   DEBUG In whitelist = 18935130
[2021-02-24 18:12:13,376]   DEBUG Corrected    = 157841
[2021-02-24 18:12:13,376]   DEBUG Uncorrected  = 405535
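
To put the two runs side by side, the share of processed BUS records whose barcode was already in the whitelist can be computed from the "Processed" and "In whitelist" lines of each log. This is only a rough sketch using the numbers reported above; `whitelist_pct` is a hypothetical helper name, not part of kb or bustools.

```bash
#!/usr/bin/env bash
# whitelist_pct (hypothetical helper): percentage of processed BUS records
# whose barcode matched the whitelist without correction.
whitelist_pct() {
  LC_ALL=C awk -v hit="$1" -v total="$2" 'BEGIN { printf "%.2f", 100 * hit / total }'
}

# Numbers copied from the two "bustools correct" logs in this post.
piped=$(whitelist_pct 5736 10969696)          # run that streamed the files from FTP
local_run=$(whitelist_pct 18935130 19498506)  # run on the manually downloaded files

echo "piped: ${piped}%  local: ${local_run}%"
```

With these inputs the piped run comes out at roughly 0.05% and the local run at roughly 97%, which is the gap I am asking about.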

Lioscro commented 3 years ago

Hi @mariafiruleva, you don't have to use the bash brace syntax, as kallisto is able to concatenate files internally. From the terminal output, kb does seem to be handling this correctly, but would you mind trying the following command anyway?

kb count -i index.idx -g t2g.txt -x 10XV2 -o SRS7040866 --filter bustools -t 4 -m 40G --verbose \
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_1.fastq.gz \
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/068/SRR12264568/SRR12264568_2.fastq.gz \
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_1.fastq.gz \
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122/069/SRR12264569/SRR12264569_2.fastq.gz

If the issue persists, it is most likely a network connection issue.
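
One difference between the two commands is the argument order. In the debug log above, kallisto lists the two _1 files as sample 1 and the two _2 files as sample 2, because brace expansion emits every alternative of the first word before moving on to the second. The command above instead places each run's _1 and _2 files next to each other. A minimal sketch of building that interleaved order in bash, assuming the same EBI paths as in this thread (the variable names are illustrative, and nothing is downloaded):

```bash
#!/usr/bin/env bash
base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR122"
runs=( "068/SRR12264568/SRR12264568" "069/SRR12264569/SRR12264569" )

# Brace expansion as in the original command: both _1 files, then both _2 files.
grouped=( "$base"/{068/SRR12264568/SRR12264568,069/SRR12264569/SRR12264569}_1.fastq.gz
          "$base"/{068/SRR12264568/SRR12264568,069/SRR12264569/SRR12264569}_2.fastq.gz )

# Interleaved order: for each run, its _1 file followed directly by its _2 file.
fastqs=()
for run in "${runs[@]}"; do
  fastqs+=( "${base}/${run}_1.fastq.gz" "${base}/${run}_2.fastq.gz" )
done

echo "brace expansion order:"
printf '  %s\n' "${grouped[@]}"
echo "interleaved order:"
printf '  %s\n' "${fastqs[@]}"
```

The interleaved array can then be passed as the trailing arguments of the same kb count call, i.e. `"${fastqs[@]}"` in place of the four URLs, with all other options unchanged.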

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days