papaemmelab / toil_cnacs

toil wrapper for CNACS
MIT License

generate_pool throws error #5

Open vishramt7 opened 2 years ago

vishramt7 commented 2 years ago

Description

I was trying to generate a pool of normals. I got an error saying "WARNING:toil.leader:The job seems to have left a log file, indicating failure: 'preprocess'"

What I Did

The input command was:

toil_cnacs \
    generate_pool \
    Output/jobstore_generate_pool \
    --stats \
    --writeLogs Output/toil_logs \
    --logFile Output/toil_logs.txt \
    --outdir Output \
    --probe_bed /home/pipelines/mutation_detector_nextflow/bedfile/06112021_Leukemia_Panel_sorted.bed \
    --fasta /home/reference_genomes/hg37_chr/hg37_chr/hg37.fa \
    --pool_samp /home/pipelines/NextSeq_mutation_detector_leukemia/Final_Output_controls/NA12878/NA12878.final.bam F \
    --pool_samp /home/pipelines/NextSeq_mutation_detector_leukemia/Final_Output_controls/F1/F1.final.bam F \
    --db_dir db

The above command crashed with:
DEBUG:toil.resource:Module dir is /home/programs/toil_cnacs_env/lib/python2.7/site-packages
DEBUG:toil.resource:Module dir is /home/programs/toil_cnacs_env/lib/python2.7/site-packages
.
.
.
WARNING:toil.leader:The job seems to have left a log file, indicating failure: 'preprocess' y/0/jobc_iWDr
WARNING:toil.leader:y/0/jobc_iWDr    INFO:toil.worker:---TOIL WORKER OUTPUT LOG---
WARNING:toil.leader:y/0/jobc_iWDr    INFO:toil:Running Toil version 3.18.0-84239d802248a5f4a220e762b3b8ce5cc92af0be.
WARNING:toil.leader:y/0/jobc_iWDr    WARNING:toil.resource:'JTRES_0da99608f7f10f94a186c1da0bc5af79' may exist, but is not yet referenced by the worker (KeyError from os.environ[]).
WARNING:toil.leader:y/0/jobc_iWDr    /home/programs/toil_cnacs_env/lib/python2.7/site-packages/toil_cnacs/data/cnacs/subscript_target/preprocess.sh: line 20: BEDTOOLS_PATH: unbound variable

New feature

It is looking for bedtools at the BEDTOOLS_PATH but toil_cnacs has not downloaded bedtools or set the path for it.
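For readers hitting the same message: "unbound variable" is what bash prints when a script running with `set -u` (nounset) expands a variable that was never set, which suggests preprocess.sh expects BEDTOOLS_PATH to be defined in the environment. The sketch below reproduces that failure mode in isolation and adds a pre-flight check. The function names are hypothetical and not part of toil_cnacs. Whether BEDTOOLS_PATH should hold the bedtools binary or its directory is not stated in this thread.

```shell
# Reproduce the failure mode: under nounset, expanding an unset variable is
# fatal. The subshell keeps the error from terminating the calling shell.
expand_under_nounset() {
    ( set -u; : "$BEDTOOLS_PATH" ) 2>/dev/null
}

# Hypothetical pre-flight check: report whether BEDTOOLS_PATH is defined
# before launching the pipeline. "${VAR:-}" is safe even under nounset.
check_bedtools_env() {
    if [ -z "${BEDTOOLS_PATH:-}" ]; then
        echo "BEDTOOLS_PATH is not set"
        return 1
    fi
    echo "BEDTOOLS_PATH=$BEDTOOLS_PATH"
}
```

Running `check_bedtools_env` in the same shell that launches toil_cnacs shows whether the variable would be visible to the worker scripts, provided it is also exported.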

Please suggest a solution for the above issue, or point out if I am missing any of the steps.

Thank you, Vishram

juanesarango commented 2 years ago

Hi @vishramt7 sorry for the late response,

We recommend running it in a container (either Docker or Singularity), which bundles all dependencies, e.g. Java and bedtools.

For this you only need to add two extra parameters: --docker, which names the container image, and --volumes, which mounts local directories inside the container.

For the volumes, you need to map local directories into the container so the code inside can access your data. For example, your inputs are in /home, so add --volumes /home /home. You can pass --volumes multiple times.

We also suggest always using absolute paths, so you don't run into major issues when running containers.

Your command would work better like this:

toil_cnacs \
    generate_pool \
    `pwd`/Output/jobstore_generate_pool \
    --stats \
    --writeLogs `pwd`/Output/toil_logs \
    --logFile `pwd`/Output/toil_logs.txt \
    --outdir `pwd`/Output \
    --probe_bed /home/pipelines/mutation_detector_nextflow/bedfile/06112021_Leukemia_Panel_sorted.bed \
    --fasta /home/reference_genomes/hg37_chr/hg37_chr/hg37.fa \
    --pool_samp /home/pipelines/NextSeq_mutation_detector_leukemia/Final_Output_controls/NA12878/NA12878.final.bam F \
    --pool_samp /home/pipelines/NextSeq_mutation_detector_leukemia/Final_Output_controls/F1/F1.final.bam F \
    --db_dir `pwd`/db \
    --docker papaemmelab/docker-cnacs \
    --volumes /home /home

Please let me know if that works!
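The absolute-path advice above can also be applied with a small helper instead of typing `pwd` in front of each argument. This is only a sketch: `to_abs` is a hypothetical name, and it simply prefixes the current directory without resolving symlinks or `..` segments.

```shell
# Print the argument unchanged if it is already absolute; otherwise prefix it
# with the current working directory.
to_abs() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s\n' "$(pwd)/$1" ;;
    esac
}
```

For example, `--outdir "$(to_abs Output)"` expands to the same value as the `pwd`/Output form used in the command above.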

juanesarango commented 2 years ago

And by the way, I forgot to mention that when using a container you don't need to pass --db_dir as the container already has the cnacs database inside.

Here you can see more details of what is installed and stored in the container: papaemmelab/docker-cnacs/blob/master/Dockerfile

vishramt7 commented 2 years ago

Hi @juanesarango, thanks for the reply! I will try it and get back to you.

vishramt7 commented 2 years ago

Hi @juanesarango, does the above method of running toil_cnacs require Python 2? Executing the above command with python3 as the default gives me the following error:

INFO:toil:Running Toil version 3.18.0-84239d802248a5f4a220e762b3b8ce5cc92af0be.
Traceback (most recent call last):
  File "/home/miniconda3/bin/toil_cnacs", line 8, in <module>
    sys.exit(main())
  File "/home/miniconda3/lib/python3.8/site-packages/toil_cnacs/cli.py", line 53, in main
    commands.main(step="generate_pool")
  File "/home/miniconda3/lib/python3.8/site-packages/toil_cnacs/commands.py", line 337, in main
    run_toil(toil_options=args, step=step)
  File "/home/miniconda3/lib/python3.8/site-packages/toil_cnacs/commands.py", line 314, in run_toil
    ContainerJob.Runner.startToil(head, toil_options)
  File "/home/miniconda3/lib/python3.8/site-packages/toil/job.py", line 786, in startToil
    return toil.start(job)
  File "/home/miniconda3/lib/python3.8/site-packages/toil/common.py", line 784, in start
    return self._runMainLoop(rootJobGraph)
  File "/home/miniconda3/lib/python3.8/site-packages/toil/common.py", line 1054, in _runMainLoop
    return Leader(config=self.config,
  File "/home/miniconda3/lib/python3.8/site-packages/toil/leader.py", line 124, in __init__
    self.toilState = ToilState(jobStore, rootJob, jobCache=jobCache)
  File "/home/miniconda3/lib/python3.8/site-packages/toil/toilState.py", line 72, in __init__
    self._buildToilState(rootJob, jobStore, jobCache)
  File "/home/miniconda3/lib/python3.8/site-packages/toil/toilState.py", line 103, in _buildToilState
    self.updatedJobs.add((jobGraph, 0))
TypeError: unhashable type: 'JobGraph'

juanesarango commented 2 years ago

@vishramt7 Yes, you need Python 2 for this version.
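A guard along these lines could catch the interpreter mismatch before Toil starts. It is a sketch only: `python_major` is a hypothetical helper that parses the text printed by `python --version`, and it assumes the usual "Python X.Y.Z" format.

```shell
# Extract the major version from a string such as "Python 2.7.18".
python_major() {
    version=${1#Python }            # drop the leading "Python "
    printf '%s\n' "${version%%.*}"  # keep everything before the first dot
}
```

Python 2 prints its version to stderr, so capture both streams when calling it, e.g. `python_major "$(python --version 2>&1)"`, and only proceed when the result is 2.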

vishramt7 commented 2 years ago

Hi @juanesarango, I tried executing the command you suggested in a Python 2.7 environment. I am getting an error in the count_dup step, and generate_pool eventually fails. The error is:

SystemCallError: The following error was raised during the container system call: <class 'docker.errors.ContainerError'>: Command '['/home/programs/toilcnacs_env/lib/python2.7/site-packages/toil_cnacs/data/cnacs/subscript_target/count_dup.sh', '/home/programs/toil_cnacs_controls/Output', '/home/pipelines/NextSeq_mutation_detector_leukemia/Final_Output_controls/NA12878/NA12878.final.bam', 'NA12878.final', '/home/programs/toilcnacs_env/lib/python2.7/site-packages/toil_cnacs/data/cnacs', '/home/programs/toilcnacs_env/lib/python2.7/site-packages/toil_cnacs/data/cnacs/lib/utility.sh']' in image 'papaemmelab/docker-cnacs' returned non-zero exit status 255:

Illegal division by zero at /home/programs/toilcnacs_env/lib/python2.7/site-packages/toil_cnacs/data/cnacs/subscript_target/count_dup.pl line 42.

Thank you, Vishram

juanesarango commented 2 years ago

Hi @vishramt7, we have never seen this issue. Check that all your input files are correctly in hg19. If you keep having issues, please reach out to Ryunosuke Saiki, as he is the main developer.
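One concrete way to act on the "check your files" advice is to compare chromosome names across inputs, since a mismatch such as "chr1" versus "1" is a common source of empty results. Nothing in this thread confirms that this is the cause of the division by zero, so treat the sketch below as one possible check. The function name is hypothetical. It assumes a FASTA index (.fai) exists next to the reference, as produced by `samtools faidx`.

```shell
# Print chromosome names that appear in the BED file but not in the FASTA
# index (.fai). Empty output means every BED contig exists in the reference.
bed_contigs_missing_from_fai() {
    bed=$1
    fai=$2
    tmp_bed=$(mktemp)
    tmp_fai=$(mktemp)
    # Column 1 of a BED record is the contig; skip header and comment lines.
    grep -v -e '^#' -e '^track' -e '^browser' "$bed" | cut -f1 | sort -u > "$tmp_bed"
    # Column 1 of a .fai record is the sequence name.
    cut -f1 "$fai" | sort -u > "$tmp_fai"
    # comm -23 keeps lines unique to the first (BED) list.
    comm -23 "$tmp_bed" "$tmp_fai"
    rm -f "$tmp_bed" "$tmp_fai"
}
```

It would be called with the probe BED file and the reference's .fai file. The same comparison can be repeated against the contig names in the BAM headers.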