Closed CEPHAS-01 closed 6 months ago
Hi,
-e local is currently for running internally at NCBI. Please try the same command with -e docker or -e singularity, whichever applies to you.
where do I download the singularity image file?
When you use -e singularity, the workflow automatically pulls the Docker image and converts it to a Singularity image.
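The log posted later in this thread warns that no Singularity cache directory is defined and names the NXF_SINGULARITY_CACHEDIR environment variable. As a minimal sketch, assuming you want the converted image kept in a directory of your own choosing, the run could be launched from Python as below. The script and YAML paths are the ones used in this thread; the cache directory name and the helper function are hypothetical.

```python
import os
import subprocess
import sys


def build_egapx_run(input_yaml, work_dir, out_dir, cache_dir):
    """Return (command, environment) for an EGAPx run with -e singularity.

    cache_dir is a hypothetical location for the converted image.
    """
    env = dict(os.environ)
    # Nextflow reads this variable to decide where pulled images are stored.
    env["NXF_SINGULARITY_CACHEDIR"] = os.path.abspath(cache_dir)
    cmd = [
        sys.executable,
        "Tools/software/egapx/ui/egapx.py",
        input_yaml,
        "-e", "singularity",
        "-w", work_dir,
        "-o", out_dir,
    ]
    return cmd, env


if __name__ == "__main__":
    cmd, env = build_egapx_run(
        "Tools/software/egapx/examples/input_D_farinae_small.yaml",
        "egapAnnotation/testrun",
        "testRun_out",
        "singularity_cache",  # assumed directory name
    )
    print(" ".join(cmd))
    # Uncomment to actually launch the run:
    # subprocess.run(cmd, env=env, check=True)
```

Building the command and environment separately from running them makes it easy to inspect what would be executed before submitting it to a scheduler.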
Hi @pstrope, thanks so much for your feedback. I have updated the command as you requested and it is running so far. I will post updates later.
TLag
Slurm terminated the job apparently due to an issue with an HMM process:
SLURM: slurmstepd: error: JOB 4728352 STEPD TERMINATED ON n049 AT 2024-04-10T08:57:39 DUE TO JOB NOT ENDING WITH SIGNALS
Log from stdout:
Wed Apr 10 07:57:29 PDT 2024
N E X T F L O W ~ version 23.10.1
Launching Tools/software/egapx/ui/../nf/ui.nf [trusting_cori] DSL2 - revision: c134f40af5
in egapx block
Pulling Singularity image docker://ncbi/egapx:latest [cache egapAnnotation/testrun/singularity/ncbi-egapx-latest.img]
WARN: Singularity cache directory has not been defined -- Remote image will be stored in the path: egapAnnotation/testrun/singularity -- Use the environment variable NXF_SINGULARITY_CACHEDIR to specify a different location
executor > local (4)
[a4/ab36b3] process > egapx:setup_genome:get_geno... [ 0%] 0 of 1
[ff/6202e7] process > egapx:setup_proteins:conver... [ 0%] 0 of 1
[- ] process > egapx:miniprot:run_miniprot -
[- ] process > egapx:paf2asn:run_paf2asn -
[- ] process > egapx:best_aligned_prot:run... -
[- ] process > egapx:align_filter_sa:run_a... -
[- ] process > egapx:run_align_sort -
[- ] process > egapx:star_index:build_index -
[- ] process > egapx:star_simplified:exec -
[- ] process > egapx:bam_strandedness:exec -
[- ] process > egapx:bam_strandedness:merge -
[- ] process > egapx:bam_bin_and_sort:calc... -
[- ] process > egapx:bam_bin_and_sort:bam_bin -
[- ] process > egapx:bam_bin_and_sort:merg... -
[- ] process > egapx:bam_bin_and_sort:merge -
[- ] process > egapx:bam2asn:convert -
[- ] process > egapx:rnaseq_collapse:gener... -
[- ] process > egapx:rnaseq_collapse:run_r... -
[- ] process > egapx:rnaseq_collapse:run_g... -
[84/f10b05] process > egapx:get_hmm_params:run_ge... [ 0%] 0 of 1
[- ] process > egapx:chainer:run_align_sort -
[- ] process > egapx:chainer:generate_jobs -
[- ] process > egapx:chainer:run_chainer -
[- ] process > egapx:chainer:run_gpxmake... -
[- ] process > egapx:gnomon_wnode:gpx_qsubmit -
[- ] process > egapx:gnomon_wnode:annot -
[- ] process > egapx:gnomon_wnode:gpx_qdump -
[92/659f32] process > egapx:annot_builder:annot_b... [ 0%] 0 of 1
[- ] process > egapx:annot_builder:annot_b... -
[- ] process > egapx:annot_builder:annot_b... -
[- ] process > egapx:annotwriter:run_annot... -
[- ] process > export -
ERROR ~ Error executing process > 'egapx:get_hmm_params:run_get_hmm'
Caused by:
Process egapx:get_hmm_params:run_get_hmm terminated with an error exit status (255)

Command executed:

#!/usr/bin/env python3
import json
from urllib.request import urlopen

def get_closest_hmm(taxid):
    taxon_str = str(taxid)
    if not taxon_str:
        return ""
    dataset_taxonomy_url = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/"

    taxids_file = urlopen("https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/gnomon/hmm_parameters/taxid.list")
    taxids_list = []
    lineages = []
    for line in taxids_file:
        parts = line.decode("utf-8").strip().split(' ')
        if len(parts) > 0:
            t = parts[0]
            taxids_list.append(t)
            if len(parts) > 1:
                l = map(lambda x: int(x) if x[-1] != ';' else int(x[:-1]), parts[1].split())
                lineages.append((int(t), list(l)+[int(t)]))
    if len(lineages) < len(taxids_list):
        taxonomy_json_file = urlopen(dataset_taxonomy_url+','.join(taxids_list))
        taxonomy = json.load(taxonomy_json_file)["taxonomy_nodes"]
        lineages = [ (t["taxonomy"]["tax_id"], t["taxonomy"]["lineage"] + [t["taxonomy"]["tax_id"]]) for t in taxonomy ]
    taxon_json_file = urlopen(dataset_taxonomy_url+taxon_str)
    taxon = json.load(taxon_json_file)["taxonomy_nodes"][0]
    lineage = taxon["taxonomy"]["lineage"]
    lineage.append(taxon["taxonomy"]["tax_id"])
    # print(lineage)
    # print(taxon["taxonomy"]["organism_name"])
    best_lineage = None
    best_taxid = None
    best_score = 0
    for (t, l) in lineages:
        pos1 = 0
        last_match = 0
        for pos in range(len(lineage)):
            tax_id = lineage[pos]
            while tax_id != l[pos1]:
                if pos1 + 1 < len(l):
                    pos1 += 1
                else:
                    break
            if tax_id == l[pos1]:
                last_match = pos1
            else:
                break
        if last_match > best_score:
            best_score = last_match
            best_taxid = t
            best_lineage = l
    if best_score == 0:
        return ""
    # print(best_lineage)
    # print(best_taxid, best_score)
    return f'https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/gnomon/hmm_parameters/{best_taxid}.params'

print(get_closest_hmm(6954))

Command exit status: 255
Command output: (empty)
Command error:
INFO: Converting SIF file to temporary sandbox...
FATAL: stat /bin/bash: no such file or directory
INFO: Cleaning up image...
Work dir: egapAnnotation/testrun/84/f10b0550595d959ae71d9138ed2599
Tip: view the complete command output by changing to the process work dir and entering the command
cat .command.out
-- Check 'egapAnnotation/testrun/testRun_out/nextflow.log' file for details
WARN: Killing running tasks (3)
executor > local (4)
[- ] process > egapx:setup_genome:get_geno... -
[- ] process > egapx:setup_proteins:conver... -
[- ] process > egapx:miniprot:run_miniprot -
[- ] process > egapx:paf2asn:run_paf2asn -
[- ] process > egapx:best_aligned_prot:run... -
[- ] process > egapx:align_filter_sa:run_a... -
[- ] process > egapx:run_align_sort -
[- ] process > egapx:star_index:build_index -
[- ] process > egapx:star_simplified:exec -
[- ] process > egapx:bam_strandedness:exec -
[- ] process > egapx:bam_strandedness:merge -
[- ] process > egapx:bam_bin_and_sort:calc... -
[- ] process > egapx:bam_bin_and_sort:bam_bin -
[- ] process > egapx:bam_bin_and_sort:merg... -
[- ] process > egapx:bam_bin_and_sort:merge -
[- ] process > egapx:bam2asn:convert -
[- ] process > egapx:rnaseq_collapse:gener... -
[- ] process > egapx:rnaseq_collapse:run_r... -
[- ] process > egapx:rnaseq_collapse:run_g... -
[84/f10b05] process > egapx:get_hmm_params:run_ge... [100%] 1 of 1, failed: 1 ✘
[- ] process > egapx:chainer:run_align_sort -
[- ] process > egapx:chainer:generate_jobs -
[- ] process > egapx:chainer:run_chainer -
[- ] process > egapx:chainer:run_gpxmake... -
[- ] process > egapx:gnomon_wnode:gpx_qsubmit -
[- ] process > egapx:gnomon_wnode:annot -
[- ] process > egapx:gnomon_wnode:gpx_qdump -
[- ] process > egapx:annot_builder:annot_b... -
[- ] process > egapx:annot_builder:annot_b... -
[- ] process > egapx:annot_builder:annot_b... -
[- ] process > egapx:annotwriter:run_annot... -
[- ] process > export -
ERROR ~ Error executing process > 'egapx:get_hmm_params:run_get_hmm'
!!WARNING!! This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.
None
To resume execution, run:
nextflow -C egapAnnotation/testrun/egapx_config/singularity.config,Tools/software/egapx/ui/assets/config/default.config,Tools/software/egapx/ui/assets/config/docker_image.config,Tools/software/egapx/ui/assets/config/process_resources.config -log egapAnnotation/testrun/testRun_out/nextflow.log run Tools/software/egapx/ui/../nf/ui.nf --output egapAnnotation/testrun/testRun_out -with-report egapAnnotation/testrun/testRun_out/run.report.html -with-timeline egapAnnotation/testrun/testRun_out/run.timeline.html -with-trace egapAnnotation/testrun/testRun_out/run.trace.txt -params-file egapAnnotation/testrun/testRun_out/run_params.yaml -resume
Don't forget to delete file(s) /tmp/tmp7_2je1cx
Wed Apr 10 08:56:39 PDT 2024
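For readers following the log: the script in the failing step picks the HMM parameter set whose taxonomic lineage matches the target taxon most deeply. The sketch below isolates that matching loop so it can be tried without network access. The function name and the lineages are made up for illustration; they are not real NCBI taxonomy data and this is not the EGAPx code itself.

```python
def closest_lineage(lineage, candidates):
    """Return (taxid, score) of the candidate lineage sharing the deepest match.

    lineage:    list of tax IDs from root to the target taxon
    candidates: list of (taxid, lineage) pairs, each lineage root to taxid
    The score is the index in the candidate lineage of the last matched ID,
    so a candidate that only shares the root scores 0 and is never chosen.
    """
    best_taxid = None
    best_score = 0
    for taxid, cand in candidates:
        pos1 = 0
        last_match = 0
        for tax_id in lineage:
            # Advance through the candidate until the current ID is found
            # or the candidate lineage is exhausted.
            while tax_id != cand[pos1]:
                if pos1 + 1 < len(cand):
                    pos1 += 1
                else:
                    break
            if tax_id == cand[pos1]:
                last_match = pos1
            else:
                break
        if last_match > best_score:
            best_score = last_match
            best_taxid = taxid
    return best_taxid, best_score


# Hypothetical lineages: candidate 35 shares four levels with the target,
# candidate 99 only two.
target = [1, 10, 20, 30, 40]
available = [(99, [1, 10, 50, 99]), (35, [1, 10, 20, 30, 35])]
print(closest_lineage(target, available))  # prints (35, 3)
```

In the logged script the two lineage lists come from the taxid.list file and the NCBI Datasets taxonomy API, which is why the step needs outbound web access from inside the container.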
Please post the full command that you ran. Thanks.
python3 Tools/software/egapx/ui/egapx.py Tools/software/egapx/input_ramb_2.yml -e singularity -w egapAnnotation -o ramb2_out
Thanks. Did you try with the included example YAML file (input_D_farinae_small.yaml), and did it run to completion?
Yes, the run with the included input_D_farinae_small.yaml file is actually what gave me the error. I also tested on my own dataset; the command I posted above is the one for that dataset. For the test data I used the same command with input_D_farinae_small.yaml as the input.
The command for the test run:
python3 Tools/software/egapx/ui/egapx.py Tools/software/egapx/examples/input_D_farinae_small.yaml -e singularity -w egapAnnotation/testrun -o testRun_out
The only files in the testRun_out directory are run_params.yaml, run.trace.txt, run.report.html, run.timeline.html, and nextflow.log.
The test run with the input_D_farinae_small.yaml is what produced the error I posted earlier.
OK, we will look into it and get back to you. Thank you for testing and reporting.
Pooja
Okay thanks.
FWIW, I don't know if the complete Nextflow log would be useful, but at 721 lines it is too long to post here. Is this something you would like to look at?
Yes, you could attach the file here. That would be helpful.
Alright, here it is. nextflow.log
The stand-out error from that is:
FATAL: stat /bin/bash: no such file or directory
Is that an actual quirk of the machine you are running on? Or is the file there, but on an NFS-type mount that is sometimes flaky?
Yes, it is an NFS mount on an HPC system. From my understanding of that line, the .sif image is what appears to be "missing". One thing I could try is running this on my MacBook to see if it runs through. What is the estimated size of the .sif image file?
I can confirm that the image was successfully pulled, about 500MB.
Hi, it looks like opening the container might be an issue. Let's see if that is truly the case. Can you run the following and see if it works (i.e., the version is printed)?
singularity exec path_to_downloaded_image/singularity/ncbi-egapx-latest.img getfasta -version
The container opened with the output : getfasta: 0.0.1670
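Since the fatal message in the log concerns /bin/bash, a second probe alongside the getfasta one may help narrow things down. The sketch below only wraps singularity exec calls of the form shown above; the helper names are hypothetical, and listing /bin/bash is an assumption about what is worth checking, not a step the EGAPx team asked for.

```python
import subprocess


def container_check_commands(image_path):
    """Build the singularity exec commands used to probe the image."""
    return [
        # The check suggested above: print the getfasta version.
        ["singularity", "exec", image_path, "getfasta", "-version"],
        # Assumed extra check: is /bin/bash visible inside the container?
        ["singularity", "exec", image_path, "ls", "-l", "/bin/bash"],
    ]


def run_checks(image_path):
    """Run each probe and return (command, exit code, combined output).

    Requires the singularity executable on PATH; otherwise subprocess.run
    raises FileNotFoundError.
    """
    results = []
    for cmd in container_check_commands(image_path):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        output = (proc.stdout + proc.stderr).strip()
        results.append((cmd, proc.returncode, output))
    return results
```

Calling run_checks with the path to ncbi-egapx-latest.img on a compute node, rather than a login node, would exercise the same filesystem mounts the Slurm job sees.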
Hi Temitayo -- I just wanted to give you an update: we're doing some additional testing on a different HPC we have access to, to see if we can reproduce your issue. One other notable issue showing up in the log is:
Process egapx:get_hmm_params:run_get_hmm terminated with an error exit status (255)
We're seeing something along those lines in some other testing that looks like an issue with web API access from within Singularity, and investigating how we can solve it. We may also need to set up a Zoom to help pick apart what's going on here, but there's enough similarity to one issue that we can reproduce that we'll try to resolve that first and see if it helps you.
Stay tuned!
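One way to test the web-access hypothesis independently of the workflow is to request the two URLs that the get_hmm script contacts. This is a rough sketch, not part of EGAPx: the URLs are taken from the script in the log (with taxid 6954 appended, as in the logged call), while the function name, timeout, and injectable opener are assumptions made for illustration and testing.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Endpoints used by the logged script.
URLS = [
    "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/taxonomy/taxon/6954",
    "https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/gnomon/hmm_parameters/taxid.list",
]


def check_url(url, opener=urlopen, timeout=30):
    """Return (ok, detail) for one URL without raising on network errors."""
    try:
        with opener(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except (URLError, OSError) as exc:
        # HTTP errors, DNS failures, refused connections, and timeouts land here.
        return False, f"{type(exc).__name__}: {exc}"


def report(urls=URLS, opener=urlopen):
    """Return one status line per URL."""
    lines = []
    for url in urls:
        ok, detail = check_url(url, opener=opener)
        lines.append(f"{'OK' if ok else 'FAIL'} {url} {detail}")
    return lines
```

Printing the lines returned by report() once on the host and once inside the container (for example through singularity exec with python3, assuming the interpreter the workflow uses is available there) would show whether the failure is specific to the container environment.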
Hi @murphyte thanks so much for your feedback. I have also run this on another HPC and got another type of error, but I want to look into this properly first, and perhaps run on a third HPC that I have access to before reporting it here. I am okay with a Zoom call to help pick the issues apart, just let me know when you are up for it. Looking forward to seeing this resolved.
Hi, I'm a developer on the EGAPx team. Which HPC clusters do you use specifically, and do they have publicly available documentation? I'd like to look at it before the Zoom call. We recently adapted EGAPx to Biowulf, and there were some incompatibilities to solve.
Victor.
Hi Victor,
The cluster runs on:
Linux version 3.10.0-1160.83.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC))
Slurm for job management
OpenMPI for message passing
Please share your email address and I will send you the URL to the available institution documentation.
Thanks in advance.
Temitayo
That was an actual bug, which is fixed in the upcoming release, 0.1.1 alpha.
Hi,
Happy to finally see this tool released, thanks.
I started the test run with the local configuration but encountered the error below at the protein conversion step:
python3 Tools/software/egapx/ui/egapx.py Tools/software/egapx/examples/input_D_farinae_small.yaml -e local -w egapAnnotation/testrun -o testRunOutput
How do I fix this, please?
Also, if I want to use the singularity mode, where do I download the singularity image file?
Thanks in advance.
TLag