ncbi / egapx

Eukaryotic Genome Annotation Pipeline-External caller scripts and documentation

No Internet access for work nodes. Was: Error: (106.16) Application's execution failed (CException::eUnknown) failed to create temporary path #44

Closed CEPHAS-01 closed 2 weeks ago

CEPHAS-01 commented 1 month ago

Hi, I encountered this problem while running the EGAPx pipeline with Singularity. Let me provide as much information as possible. First, my singularity.config file is as follows:

singularity {
    enabled = true
    autoMounts = true
    singularity.cacheDir = 'egap/tmpDir'
}

Run command:

python3 ui/egapx.py bighorn_reads.yml -e singularity -w egap/tmpDir/workDir -o annotation_out

Output and error:

executor > local (20)
[36/f52493] process > egapx:setup_genome:get_genome_info [100%] 1 of 1 ✔
[41/0e2552] process > egapx:setup_proteins:convert_proteins [100%] 1 of 1 ✔
[78/07dbb1] process > egapx:target_proteins_plane:miniprot:split_proteins [100%] 1 of 1 ✔
[f0/367797] process > egapx:target_proteins_plane:miniprot:run_miniprot (1) [100%] 4 of 4 ✔
[70/b7c4ba] process > egapx:target_proteins_plane:paf2asn:run_paf2asn (4) [100%] 4 of 4 ✔
[f3/fec9e3] process > egapx:target_proteins_plane:best_aligned_prot:run_best_aligned_prot [100%] 1 of 1 ✔
[c3/91046d] process > egapx:target_proteins_plane:align_filter_sa:run_align_filter_sa [100%] 1 of 1 ✔
[92/d2e396] process > egapx:target_proteins_plane:run_align_sort [100%] 4 of 4, failed: 4..
[91/ab3824] process > egapx:rnaseq_short_plane:star_index:build_index [100%] 1 of 1 ✔
[- ] process > egapx:rnaseq_short_plane:star:run_star (1) -
[- ] process > egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:calc_assembly_sizes -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:bam_bin -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:merge_prepare -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:merge -
[- ] process > egapx:rnaseq_short_plane:bam2asn:convert -
... ... ...
[- ] process > egapx:gnomon_plane:gnomon_wnode:gpx_qsubmit -
[- ] process > egapx:gnomon_plane:gnomon_wnode:annot -
[- ] process > egapx:gnomon_plane:gnomon_wnode:gpx_qdump -
[31/2aaf2a] process > egapx:annot_builder:annot_builder_main [100%] 1 of 1 ✔
[- ] process > egapx:annot_builder:annot_builder_input -
[- ] process > egapx:annot_builder:annot_builder_run -
[- ] process > egapx:annotwriter:run_annotwriter -
[- ] process > export -
[f9/d12672] NOTE: Process egapx:target_proteins_plane:run_align_sort terminated with an error exit status (3) -- Execution is retried (3)
ERROR ~ Error executing process > 'egapx:target_proteins_plane:run_align_sort'

Caused by: Process egapx:target_proteins_plane:run_align_sort terminated with an error exit status (3)

Command executed:

mkdir -p output
mkdir -p LDS_Index
lds2_indexer -source LDS_Index
echo "align.asn" > alignments.mft
align_sort -k subject,subject_start,-subject_end,subject_strand,query,query_start,-query_end,query_strand,-num_ident,gap_count -input-manifest alignments.mft -o output/sorted_aligns.asn -lds2 LDS_Index/lds2.db

Command exit status: 3

Command output: (empty)

Command error:
INFO: Environment variable SINGULARITY_TMPDIR is set, but APPTAINER_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Error: (CException::eUnknown) failed to create temporary path
Error: (106.16) Application's execution failed (CException::eUnknown) failed to create temporary path

Work dir: egap/tmpDir/workDir/92/d2e396449993bd2d15ab600858e3ae

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check 'annotation_out/nextflow.log' file for details

Some extra information: I am using apptainer 1.3.3 and nextflow 23.10.1. I am using the latest version of EGAPx with a patch from here. I defined the directory "egap/tmpDir" as the TMPDIR and defined other directories for the output, and I am sure that this directory is recursively writable. I have also tried the same run on another HPC, and the pipeline terminates with an error on the same process: 'egapx:target_proteins_plane:run_align_sort'.

Kindly help to resolve this error.

Thank you.

Temitayo

victzh commented 3 weeks ago

Thank you, Temitayo. I checked this report and found suspicious code that may be responsible. Since we can't reproduce this bug directly, I will fix the suspicious code in the upcoming release 0.3 and ask you to test it then. This is similar to issue #39, but this time it is not just mktemp, which obeys TMPDIR and so can be fixed by setting that variable.
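For readers unfamiliar with the TMPDIR convention mentioned above, here is a minimal sketch of how a temp-file helper picks up that variable. It uses Python's standard `tempfile` module as a stand-in for `mktemp`; it is not EGAPx code, and the helper name `temp_dir_for` is hypothetical.

```python
import os
import tempfile


def temp_dir_for(tmpdir_value):
    """Return the directory tempfile would use when TMPDIR is set to tmpdir_value."""
    old = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = tmpdir_value
    tempfile.tempdir = None  # drop the cached choice so TMPDIR is read again
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the previous environment so the caller is not affected.
        if old is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = old
        tempfile.tempdir = None
```

A tool that builds its temporary path some other way, as suspected here, would not react to TMPDIR at all, which is why setting the variable alone may not help. Note also that Python's `tempfile` quietly falls back to other candidate directories when the TMPDIR value is unusable, so a missing directory does not always surface as an error.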

CEPHAS-01 commented 3 weeks ago

Alright @victzh , thank you for your feedback. I will be on the lookout for the release update.

victzh commented 3 weeks ago

Please try our new release, 0.3.0-alpha, which was completed today. I'd appreciate your feedback on whether it solved the issue.

CEPHAS-01 commented 3 weeks ago

Hi @victzh Thanks so much for this prompt update!

I have tried the new release out and encountered the following:

  1. On the example data

python3 ui/egapx.py ./examples/input_D_farinae_small_readlist.yaml -e singularity -w tmpDir/workDir -o small_farinae

!!WARNING!! This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.

Could not find path for ortho taxid 7227 (take note that the taxid in the example file was 6954)

The examples/input_D_farinae_small_readlist.yaml file contained:

genome: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/020/809/275/GCA_020809275.1_ASM2080927v1/GCA_020809275.1_ASM2080927v1_genomic.fna.gz
taxid: 6954
reads: /egapx2/egapx/examples/input_D_farinae_small_reads.txt
annotation_provider: GenBank submitter
annotation_name_prefix: GCA_020809275.1
locus_tag_prefix: egapxtmp

  2. On my data

!!WARNING!! This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.

Could not find path for ortho taxid 9606 (take note that the taxid I specified was 37174)

The yml file I used for this is:

genome: Bighorn.fasta
taxid: 37174
reads:

For both instances, I kept the singularity.config file simple, as follows:

singularity {
    enabled = true
    autoMounts = true
}

The main observations were:

a. It took over 15 minutes for that output to be written, and there was no sign that nextflow was launched at all. To confirm that my applications were correctly loaded, I ran the previous release and within 20 seconds had nextflow launched.

b. It seems that taxids are being sourced from somewhere different from those specified in the yml file.

c. There was no log file produced to get an idea of what may have gone wrong. Only the "work_dir_singularity.last" file containing the path to the work directory was produced.

pstrope commented 3 weeks ago

Hi, can you please try one run with a pre-downloaded db? To download:

python3 ui/egapx.py -dl -lc local_cache

Then run with -lc local_cache added to the end of your egapx.py command.

CEPHAS-01 commented 3 weeks ago

Alright, I have started the run and I will give you feedback on this. Thanks!

CEPHAS-01 commented 3 weeks ago

@pstrope

python3 ui/egapx.py -dl -lc local_cache

!!WARNING!! This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.

Downloading gnomon/2

Traceback (most recent call last):
  File "egapx/ui/egapx.py", line 1179, in
    sys.exit(main(sys.argv))
  File "egapx/ui/egapx.py", line 1017, in main
    download_egapx_ftp_data(args.local_cache)
  File "egapx/ui/egapx.py", line 203, in download_egapx_ftp_data
    ftpd.connect(FTP_EGAP_SERVER)
  File "egapx/ui/egapx.py", line 78, in connect
    self.reconnect()
  File "egapx/ui/egapx.py", line 81, in reconnect
    self.ftp = FTP(self.host)
  File "/usr/lib64/python3.9/ftplib.py", line 121, in __init__
    self.connect(host)
  File "/usr/lib64/python3.9/ftplib.py", line 158, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout,
  File "/usr/lib64/python3.9/socket.py", line 844, in create_connection
    raise err
  File "/usr/lib64/python3.9/socket.py", line 832, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

After I got this, I confirmed that I could connect to the internet to download by using wget to download a file, and it was successful.

pstrope commented 3 weeks ago

To be clear, you were unable to download the db?

CEPHAS-01 commented 3 weeks ago

Yes, I was unable to download the db.

pstrope commented 3 weeks ago

thanks for trying. We'll get back to you.

CEPHAS-01 commented 3 weeks ago

Thanks, looking forward to it.

victzh commented 2 weeks ago

The HPC cluster you're trying this on most likely has two kinds of nodes: login nodes with access to the Internet and worker nodes without. They usually share a filesystem, so you need to run ui/egapx.py -dl -lc local_cache on a node with Internet access and then run the actual command on a node intended for workload execution. The local_cache directory mentioned above should be accessible from both kinds of nodes.
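To tell which kind of node you are on, a small probe along these lines may help. It is a sketch, not part of EGAPx: `can_reach` and `probe` are hypothetical helpers that attempt the same `socket.create_connection` call that timed out in the traceback earlier in this thread. A successful wget over HTTP(S) does not show that the FTP port is open, so it is worth checking each port separately.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, and DNS failures
        return False


def probe(host, ports, timeout=5.0):
    """Map each port to whether it is reachable from the current node."""
    return {port: can_reach(host, port, timeout) for port in ports}
```

For example, `probe("ftp.ncbi.nlm.nih.gov", [21, 443])` could be run on both the login node and a compute node. That host name is an assumption taken from the genome URL quoted above; the thread does not show the value of FTP_EGAP_SERVER. If port 21 is blocked on the compute node while 443 is open, that would be consistent with wget working there while the db download times out.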

victzh commented 2 weeks ago

I renamed the ticket because its substance changed: one kind of error was reported in the beginning, another when it was reopened. In the future, please reopen only when the issue is the same as initially reported, and open a new ticket otherwise.

CEPHAS-01 commented 2 weeks ago

Thanks for the update @victzh I will provide updates ASAP.

CEPHAS-01 commented 2 weeks ago

@victzh and @pstrope

First, I was able to download the db to the local cache from the login node, as you advised. This was needed even though the compute node where I was running egapx in an interactive session could fetch files with wget.

The pipeline went past the initial crash point "egapx:target_proteins_plane:run_align_sort" and encountered another one at "bam_strandedness:rnaseq_divide_by_strandedness".

The command I ran:

python3 ui/egapx.py examples/input_D_farinae_small.yaml -e singularity -w testWork -o testOut -lc local_cache

executor > local (26)
[53/fe032d] process > egapx:setup_genome:get_genome_info [100%] 1 of 1 ✔
[65/85de5a] process > egapx:setup_proteins:convert_proteins [100%] 1 of 1 ✔
[8e/311680] process > egapx:target_proteins_plane:miniprot:split_proteins [100%] 1 of 1 ✔
[34/40e391] process > egapx:target_proteins_plane:miniprot:run_miniprot (1) [100%] 1 of 1 ✔
[16/86e18e] process > egapx:target_proteins_plane:paf2asn:run_paf2asn (1) [100%] 1 of 1 ✔
[81/f6802e] process > egapx:target_proteins_plane:best_aligned_prot:run_best_aligned_prot [100%] 1 of 1 ✔
[68/1052c2] process > egapx:target_proteins_plane:align_filter_sa:run_align_filter_sa [100%] 1 of 1 ✔
[34/0ed301] process > egapx:target_proteins_plane:align_sort_sa:run_align_sort [100%] 1 of 1 ✔
[f5/865796] process > egapx:rnaseq_short_plane:star_index:build_index [100%] 1 of 1 ✔
[69/d3610f] process > egapx:rnaseq_short_plane:star:run_star (2) [100%] 2 of 2 ✔
[cc/bdb2df] process > egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness [100%] 4 of 4, failed: 4..
[f3/f84da4] process > egapx:rnaseq_short_plane:bam_bin_and_sort:calc_assembly_sizes [100%] 1 of 1 ✔
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:bam_bin (1) -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:merge_prepare -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:merge -
[- ] process > egapx:rnaseq_short_plane:bam2asn:convert -
[- ] process > egapx:rnaseq_short_plane:rnaseq_collapse:generate_jobs -
[- ] process > egapx:rnaseq_short_plane:rnaseq_collapse:run_rnaseq_collapse -
[- ] process > egapx:rnaseq_short_plane:rnaseq_collapse:run_gpx_make_outputs -
[- ] process > egapx:gnomon_plane:chainer:run_align_sort -
[- ] process > egapx:gnomon_plane:chainer:generate_jobs -
[- ] process > egapx:gnomon_plane:chainer:run_chainer -
[- ] process > egapx:gnomon_plane:chainer:run_gpx_make_outputs -
[- ] process > egapx:gnomon_plane:gnomon_wnode:gpx_qsubmit -
[- ] process > egapx:gnomon_plane:gnomon_wnode:annot -
[- ] process > egapx:gnomon_plane:gnomon_wnode:gpx_qdump -
[48/b839c4] process > egapx:annot_proc_plane:fetch_swiss_prot_asn [100%] 1 of 1 ✔
[7c/e709cf] process > egapx:annot_proc_plane:get_swiss_prot_ids [100%] 1 of 1 ✔
[- ] process > egapx:annot_proc_plane:prot_gnomon_prepare:prot_gnomon_prepare_p -
[- ] process > egapx:annot_proc_plane:diamond_worker:run_diamond_egap -
[- ] process > egapx:annot_proc_plane:best_protein_hits:run_protein_filter_replacement -
[- ] process > egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype -
[78/cc269f] process > egapx:annot_proc_plane:annot_builder:annot_builder_main [100%] 1 of 1 ✔
[- ] process > egapx:annot_proc_plane:annot_builder:annot_builder_input -
[- ] process > egapx:annot_proc_plane:annot_builder:annot_builder_run -
[74/3947c0] process > egapx:annot_proc_plane:print_fake_lxr_data [100%] 1 of 1 ✔
[28/55c4ff] process > egapx:annot_proc_plane:orthology_plane:fetch_ortholog_references [100%] 1 of 1 ✔
[f2/a2d5e9] process > egapx:annot_proc_plane:orthology_plane:setup_ext_genome:get_genome_info [100%] 1 of 1 ✔
[c7/736ba8] process > egapx:annot_proc_plane:orthology_plane:setup_ext_proteins:convert_proteins [100%] 1 of 1 ✔
[ca/e72b22] process > egapx:annot_proc_plane:orthology_plane:get_prot_ref_ids [100%] 1 of 1 ✔
[- ] process > egapx:annot_proc_plane:orthology_plane:extract_products_from_models:run_e... -
[- ] process > egapx:annot_proc_plane:orthology_plane:diamond_orthology:run_diamond_egap -
[- ] process > egapx:annot_proc_plane:orthology_plane:find_orthologs:run_find_orthologs -
[- ] process > egapx:annot_proc_plane:locus_track:run_locus_track -
[- ] process > egapx:annot_proc_plane:locus_link:run_locus_link -
[- ] process > egapx:annot_proc_plane:final_asn_markup:final_asn -
[- ] process > egapx:annot_proc_plane:annotwriter:run_annotwriter -
[- ] process > egapx:convert_annotations:run_converter -
[- ] process > export -
[0b/c0bffd] NOTE: Process egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness terminated with an error exit status (3) -- Execution is retried (1)
[98/4e2f96] NOTE: Process egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness terminated with an error exit status (3) -- Execution is retried (2)
[34/1d9722] NOTE: Process egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness terminated with an error exit status (3) -- Execution is retried (3)
ERROR ~ Error executing process > 'egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness'

Caused by: Process egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness terminated with an error exit status (3)

Command executed:

mkdir -p output
samtools=$(which samtools)
echo "GCA_020809275.1_ASM2080927v1_genomic-SRR8506572-Aligned.out.Sorted.bam GCA_020809275.1_ASM2080927v1_genomic-SRR9005248-Aligned.out.Sorted.bam" > bam_list.mft
rnaseq_divide_by_strandedness -align-manifest bam_list.mft -metadata egapx_reads_metadata_ga8t4wc9.tsv -min-aligned 1000000 -min-unambiguous 200 -min-unambiguous-pct 2 -max-unambiguous-pct 100 -percentage-threshold 98 -samtools-executable $samtools -stranded-output output/stranded.list -strandedness-output output/run.strandedness -unstranded-output output/unstranded.list

Command exit status: 3

Command output: (empty)

Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Error: (CFileErrnoException::eFileIO) mkstemp() failed for 'egapx3/egapTmpDir/540624880960XXXXXX' (errno = 2: No such file or directory)
Error: (106.16) Application's execution failed (CFileErrnoException::eFileIO) mkstemp() failed for 'egapx3/egapTmpDir/540624880960XXXXXX' (errno = 2: No such file or directory)

Work dir: egapx/testWork/cc/bdb2df74b16afe55d70d69c0aab850

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check 'testOut/nextflow.log' file for details
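One plausible reading of the mkstemp() message above: errno 2 (ENOENT) means the directory part of the template does not exist from the point of view of the process, for instance because the path is not visible inside the container or is resolved relative to the task's work directory. The thread does not confirm the cause. The sketch below only reproduces the same errno with Python's `tempfile.mkstemp`; `try_mkstemp` is a hypothetical helper, not EGAPx code.

```python
import os
import tempfile


def try_mkstemp(directory):
    """Create and remove a temp file in directory.

    Returns None on success, or the errno of the failure.
    """
    try:
        fd, path = tempfile.mkstemp(dir=directory)
    except OSError as exc:
        return exc.errno
    os.close(fd)
    os.remove(path)
    return None
```

Calling `try_mkstemp` on the configured temp directory from inside the task's work dir (or inside the container) shows whether that directory is actually visible there.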

pstrope commented 2 weeks ago

@CEPHAS-01 Please look at https://github.com/ncbi/egapx/issues/47 for the solution. This issue will be fixed in the next release, but there is a config modification you can do to proceed for now.

victzh commented 2 weeks ago

At this time it's probably easier to wait for 0.3.1 which is being tested and fixes many bugs caught by our brave alpha users, including you Temitayo! Thanks for reporting back!

CEPHAS-01 commented 2 weeks ago

Hi @victzh. After setting the environment variables directly in the terminal (and not through the singularity.config file), I successfully ran the input_D_farinae_small.yaml data. Thanks so much to you guys for guiding this to a successful completion. The D. farinae test is a small genome, so I am going to run this on my real genome (about 2.8 Gb) and see what we have.
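For anyone scripting the same workaround, here is a minimal sketch of preparing a temp directory and an environment before launching the pipeline. The thread does not say which variables were set, so the names below are assumptions: TMPDIR is discussed earlier in the thread, and APPTAINERENV_TMPDIR is the name the Apptainer INFO lines in the log say is preferred. `tmp_env` is a hypothetical helper.

```python
import os


def tmp_env(tmp_dir, base_env=None):
    """Create tmp_dir if needed and return an environment dict pointing at it."""
    abs_dir = os.path.abspath(tmp_dir)  # avoid paths relative to a task work dir
    os.makedirs(abs_dir, exist_ok=True)
    if not os.access(abs_dir, os.W_OK | os.X_OK):
        raise PermissionError(f"{abs_dir} is not writable")
    env = dict(os.environ if base_env is None else base_env)
    # Assumed variable names; the thread does not list the ones actually used.
    env["TMPDIR"] = abs_dir
    env["APPTAINERENV_TMPDIR"] = abs_dir
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` when starting `ui/egapx.py`, which has the same effect as exporting the variables in the terminal first.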

CEPHAS-01 commented 2 weeks ago

I successfully ran the pipeline with my assembly, many thanks to you guys @victzh , @pstrope, and the rest of the team for this tool and for holding my hand while walking down this journey to get the pipeline to work! I am closing this issue as completed!