vgteam / toil-vg

Distributed and cloud computing framework for vg
Apache License 2.0

Gzip and Cat Broken Pipes #518

Closed MohammedJanahi closed 6 years ago

MohammedJanahi commented 6 years ago

When I try to run toil-vg map or toil-vg call, gzip and cat fail with broken pipe errors.

I ran something like this: toil-vg map ./toiltmp SMPL1 all.xg all.gcsa ./toilout --fastq SMPL1.fastq --config config.yaml --workDir toilwork/

And I got:

INFO:toil.lib.bioio:Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
INFO:toil.jobStores.abstractJobStore:The workflow ID is: '86935883-2012-43b0-88fa-2b054f6994e7'
INFO:toil_vg.vg_map:Imported input files into Toil in 243.630013943 seconds
INFO:toil.common:Using the single machine batch system
WARNING:toil.batchSystems.singleMachine:Limiting maxCores to CPU count of system (32).
WARNING:toil.batchSystems.singleMachine:Limiting maxMemory to physically available memory (540680196096).
INFO:toil.common:Created the workflow directory at /gpfs/projects/SDR_ggenome_ymokrab/qatar_public/vcf/101samples/toi
lwork/toil-86935883-2012-43b0-88fa-2b054f6994e7-3650c25e-8412-4b7a-90e5-8bf4175ade04
WARNING:toil.batchSystems.singleMachine:Limiting maxDisk to physically available disk (249090369650688).
INFO:toil.common:No user script to hot-deploy.
INFO:toil.common:Written the environment for the jobs to the environment file
INFO:toil.common:Caching all jobs in job store
INFO:toil.common:0 jobs downloaded.
INFO:toil:Running Toil version 3.13.0-3cb535425306955028ed7604114f772990d00440.
INFO:toil.realtimeLogger:Real-time logging disabled
INFO:toil.toilState:(Re)building internal scheduler state
INFO:toil.leader:Found 1 jobs to start and 0 jobs with successors to run
INFO:toil.leader:Checked batch system has no running jobs and no updated jobs
INFO:toil.leader:Starting the main loop
INFO:toil.leader:Issued job 'run_write_info_to_outstore' p/7/jobuQQWOE with job batch system ID: 0 and cores: 1, disk
: 2.0 G, and memory: 2.0 G
INFO:toil.leader:Job ended successfully: 'run_write_info_to_outstore' p/7/jobuQQWOE
INFO:toil.leader:Issued job 'run_split_fastq' D/w/jobUJHN5B with job batch system ID: 1 and cores: 32, disk: 200.0 G,
 and memory: 4.0 G
INFO:toil.leader:Job ended successfully: 'run_split_fastq' D/w/jobUJHN5B
WARNING:toil.leader:The job seems to have left a log file, indicating failure: 'run_split_fastq' D/w/jobUJHN5B
WARNING:toil.leader:D/w/jobUJHN5B    ---TOIL WORKER OUTPUT LOG---
WARNING:toil.leader:D/w/jobUJHN5B    INFO:toil:Running Toil version 3.13.0-3cb535425306955028ed7604114f772990d00440.
WARNING:toil.leader:D/w/jobUJHN5B    WARNING:toil.resource:'JTRES_7ff92d56012b5b941ea71d789a651f69' may exist, but is
 not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:D/w/jobUJHN5B    WARNING:toil.resource:'JTRES_7ff92d56012b5b941ea71d789a651f69' may exist, but is
 not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:D/w/jobUJHN5B    **bash: pigz: command not found**
WARNING:toil.leader:D/w/jobUJHN5B    split: with FILE=fq_chunk.aa, exit 127 from command: pigz -p 31 > $FILE.fq.gz
WARNING:toil.leader:D/w/jobUJHN5B    **cat: write error: Broken pipe**
WARNING:toil.leader:D/w/jobUJHN5B    Traceback (most recent call last):
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/worker
.py", line 316, in main
WARNING:toil.leader:D/w/jobUJHN5B        job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py
", line 1323, in _runner
WARNING:toil.leader:D/w/jobUJHN5B        returnValues = self._run(jobGraph, fileStore)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py
", line 1268, in _run
WARNING:toil.leader:D/w/jobUJHN5B        return self.run(fileStore)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py
", line 1452, in run
WARNING:toil.leader:D/w/jobUJHN5B        rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_
map.py", line 199, in run_split_fastq
WARNING:toil.leader:D/w/jobUJHN5B        context.runner.call(job, cmd, work_dir = work_dir, tool_name='pigz')
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_
common.py", line 185, in call
WARNING:toil.leader:D/w/jobUJHN5B        return self.call_directly(args, work_dir, outfile, errfile, check_output)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_
common.py", line 542, in call_directly
WARNING:toil.leader:D/w/jobUJHN5B        " ".join(args[i]), sts))
WARNING:toil.leader:D/w/jobUJHN5B    Exception: Command cat SRR2098204_R1.fastq returned with non-zero exit status 1
WARNING:toil.leader:D/w/jobUJHN5B    ERROR:toil.worker:Exiting the worker because of a failed job on host hpcgenelite
14.research.sidra.local
WARNING:toil.leader:D/w/jobUJHN5B    WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count o
f job 'run_split_fastq' D/w/jobUJHN5B with ID D/w/jobUJHN5B to 1
INFO:toil.leader:Issued job 'run_split_fastq' D/w/jobUJHN5B with job batch system ID: 2 and cores: 32, disk: 200.0 G,
 and memory: 4.0 G
INFO:toil.leader:Job ended successfully: 'run_split_fastq' D/w/jobUJHN5B
WARNING:toil.leader:The job seems to have left a log file, indicating failure: 'run_split_fastq' D/w/jobUJHN5B
WARNING:toil.leader:D/w/jobUJHN5B    ---TOIL WORKER OUTPUT LOG---
WARNING:toil.leader:D/w/jobUJHN5B    INFO:toil:Running Toil version 3.13.0-3cb535425306955028ed7604114f772990d00440.
WARNING:toil.leader:D/w/jobUJHN5B    WARNING:toil.resource:'JTRES_7ff92d56012b5b941ea71d789a651f69' may exist, but is
 not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:D/w/jobUJHN5B    WARNING:toil.resource:'JTRES_7ff92d56012b5b941ea71d789a651f69' may exist, but is
 not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:D/w/jobUJHN5B    bash: pigz: command not found
WARNING:toil.leader:D/w/jobUJHN5B    split: with FILE=fq_chunk.aa, exit 127 from command: pigz -p 31 > $FILE.fq.gz
WARNING:toil.leader:D/w/jobUJHN5B    cat: write error: Broken pipe
WARNING:toil.leader:D/w/jobUJHN5B    Traceback (most recent call last):
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/worker
.py", line 316, in main
WARNING:toil.leader:D/w/jobUJHN5B        job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py
", line 1323, in _runner
WARNING:toil.leader:D/w/jobUJHN5B        returnValues = self._run(jobGraph, fileStore)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py
", line 1268, in _run
WARNING:toil.leader:D/w/jobUJHN5B        return self.run(fileStore)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py
", line 1452, in run
WARNING:toil.leader:D/w/jobUJHN5B        rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_
map.py", line 199, in run_split_fastq
WARNING:toil.leader:D/w/jobUJHN5B        context.runner.call(job, cmd, work_dir = work_dir, tool_name='pigz')
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_
common.py", line 185, in call
WARNING:toil.leader:D/w/jobUJHN5B        return self.call_directly(args, work_dir, outfile, errfile, check_output)
WARNING:toil.leader:D/w/jobUJHN5B      File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_
common.py", line 542, in call_directly
WARNING:toil.leader:D/w/jobUJHN5B        " ".join(args[i]), sts))
WARNING:toil.leader:D/w/jobUJHN5B    Exception: Command cat SRR2098204_R1.fastq returned with non-zero exit status 1
WARNING:toil.leader:D/w/jobUJHN5B    ERROR:toil.worker:Exiting the worker because of a failed job on host hpcgenelite
14.research.sidra.local
WARNING:toil.leader:D/w/jobUJHN5B    WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count o
f job 'run_split_fastq' D/w/jobUJHN5B with ID D/w/jobUJHN5B to 0
WARNING:toil.leader:Job 'run_split_fastq' D/w/jobUJHN5B with ID D/w/jobUJHN5B is completely failed
INFO:toil.leader:No jobs left to run so exiting.
INFO:toil.leader:Finished the main loop
INFO:toil.serviceManager:Waiting for service manager thread to finish ...
INFO:toil.serviceManager:... finished shutting down the service manager. Took 0.876837015152 seconds
INFO:toil.statsAndLogging:Waiting for stats and logging collator thread to finish ...
INFO:toil.statsAndLogging:... finished collating stats and logs. Took 0.0829539299011 seconds
INFO:toil.leader:Finished toil run with 2 failed jobs
INFO:toil.leader:Failed jobs at end of the run: 'run_split_fastq' D/w/jobUJHN5B 'run_split_reads' p/7/jobuQQWOE
Traceback (most recent call last):
  File "/gpfs/software/genomics/toilvenv/bin/toil-vg", line 11, in <module>
    sys.exit(main())
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_toil.py", line 363, in main
    map_main(context, args)
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_map.py", line 661, in map_main
    toil.start(init_job)
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/common.py", line 743, in start
    return self._runMainLoop(rootJobGraph)
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/common.py", line 1018, in _runMainLoop
    jobCache=self._jobCache).run()
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/leader.py", line 226, in run
    raise FailedJobsException(self.config.jobStore, self.toilState.totalFailedJobs, self.jobStore)
toil.leader.FailedJobsException: The job store 'file:/gpfs/projects/SDR_ggenome_ymokrab/qatar_public/vcf/101samples/t
oiltmp' contains 2 failed jobs: 'run_split_fastq' D/w/jobUJHN5B, 'run_split_reads' p/7/jobuQQWOE
=========> Failed job 'run_split_fastq' D/w/jobUJHN5B
---TOIL WORKER OUTPUT LOG---
INFO:toil:Running Toil version 3.13.0-3cb535425306955028ed7604114f772990d00440.
WARNING:toil.resource:'JTRES_7ff92d56012b5b941ea71d789a651f69' may exist, but is not yet referenced by the worker (Ke
yError from os.environ[]).
WARNING:toil.resource:'JTRES_7ff92d56012b5b941ea71d789a651f69' may exist, but is not yet referenced by the worker (Ke
yError from os.environ[]).
bash: pigz: command not found
split: with FILE=fq_chunk.aa, exit 127 from command: pigz -p 31 > $FILE.fq.gz
**cat: write error: Broken pipe**
Traceback (most recent call last):
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/worker.py", line 316, in main
    job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py", line 1323, in _runner
    returnValues = self._run(jobGraph, fileStore)
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py", line 1268, in _run
    return self.run(fileStore)
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil/job.py", line 1452, in run
    rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_map.py", line 199, in run_split_fastq
    context.runner.call(job, cmd, work_dir = work_dir, tool_name='pigz')
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_common.py", line 185, in call
    return self.call_directly(args, work_dir, outfile, errfile, check_output)
  File "/gpfs/software/genomics/toilvenv/lib/python2.7/site-packages/toil_vg/vg_common.py", line 542, in call_directl
y
    " ".join(args[i]), sts))
Exception: Command cat SRR2098204_R1.fastq returned with non-zero exit status 1
ERROR:toil.worker:Exiting the worker because of a failed job on host hpcgenelite14.research.sidra.local
WARNING:toil.jobGraph:Due to failure we are reducing the remaining retry count of job 'run_split_fastq' D/w/jobUJHN5B
 with ID D/w/jobUJHN5B to 0
<=========

When I initially ran the same command with a compressed fastq file, gzip gave me the same broken pipe error.

adamnovak commented 6 years ago

It looks like the real problem is this line:

WARNING:toil.leader:D/w/jobUJHN5B    **bash: pigz: command not found**
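To illustrate why a missing downstream command shows up as a broken pipe in the upstream one, here is a minimal sketch. It does not use toil-vg's code: the two child processes are hypothetical stand-ins (one writes forever like cat on a large file, the other exits with status 127 without reading, like a shell that cannot find pigz), and it assumes Python 3.

```python
import subprocess
import sys


def run_pipeline_with_failing_consumer():
    """Pipe an endless producer into a consumer that exits at once.

    Returns (producer_exit_status, consumer_exit_status).
    """
    # Stand-in for `cat reads.fastq`: keeps writing until the pipe breaks.
    producer = subprocess.Popen(
        [sys.executable, "-c",
         "import sys\nwhile True:\n    sys.stdout.write('x' * 65536)"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    # Stand-in for the missing pigz: exits 127 without reading its stdin.
    consumer = subprocess.Popen(
        [sys.executable, "-c", "import sys; sys.exit(127)"],
        stdin=producer.stdout,
    )
    # Drop the parent's copy of the read end so that, once the consumer is
    # gone, the pipe has no reader and the producer's next write fails.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc, consumer_rc


if __name__ == "__main__":
    producer_rc, consumer_rc = run_pipeline_with_failing_consumer()
    print("consumer exit status:", consumer_rc)
    print("producer exit status:", producer_rc)
```

The consumer's 127 is the actual failure; the producer's non-zero status is only a consequence of it, which matches the order of the messages in the log.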

You don't have pigz installed, and toil-vg needs it. You can either install it or, if you have access to Docker, add --container Docker to the command so that it runs all of its work in Docker containers (which lets it pull down its dependencies automatically).
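If you go the install route, a quick check before launching the workflow can catch this earlier. The following is a minimal sketch, not part of toil-vg: the helper name missing_tools is hypothetical, the tool list contains only pigz because that is the one the log reports as missing, and it assumes Python 3 (shutil.which is not available in Python 2.7).

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    # "pigz" comes from the log above; add other tools as needed.
    missing = missing_tools(["pigz"])
    if missing:
        raise SystemExit("Missing required tools: " + ", ".join(missing))
    print("All required tools found on PATH")
```

Run it on the same machine and in the same environment as the Toil workers, since PATH may differ between a login shell and a batch job.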

adamnovak commented 6 years ago

I'm going to close this; please open a new issue if you run into more trouble, or reopen this one if installing pigz or using Docker doesn't solve this problem.