CEPHAS-01 closed this issue 2 weeks ago.
Thank you, Temitayo. I checked this report and found suspicious code that may be responsible. As we can't reproduce this bug directly, I will fix this code in the upcoming release 0.3 and ask you to test it then. This is similar to issue #39, but this time it is not just mktemp, which obeys TMPDIR and can be fixed by setting it.
Alright @victzh, thank you for your feedback. I will be on the lookout for the release.
Please try our new release, 0.3.0-alpha, which was completed today. I'd appreciate your feedback on whether it solved the issue.
Hi @victzh, thanks so much for this prompt update!
I have tried the new release out and encountered the following:
python3 ui/egapx.py ./examples/input_D_farinae_small_readlist.yaml -e singularity -w tmpDir/workDir -o small_farinae
!!WARNING!! This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.
Could not find path for ortho taxid 7227 (note that the taxid in the example file was 6954)
The examples/input_D_farinae_small_readlist.yaml file contained:
genome: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/020/809/275/GCA_020809275.1_ASM2080927v1/GCA_020809275.1_ASM2080927v1_genomic.fna.gz
taxid: 6954
reads: /egapx2/egapx/examples/input_D_farinae_small_reads.txt
annotation_provider: GenBank submitter
annotation_name_prefix: GCA_020809275.1
locus_tag_prefix: egapxtmp
Could not find path for ortho taxid 9606 (note that the taxid I specified was 37174)
The YAML file I used for this was:
genome: Bighorn.fasta
taxid: 37174
reads:
For both runs, I kept the singularity.config file simple:
singularity {
    enabled = true
    autoMounts = true
}
The main observations were:
a. It took over 15 minutes for that output to be written, and there was no sign that Nextflow was launched at all. To confirm that my applications were correctly loaded, I ran the previous release, and Nextflow launched within 20 seconds.
b. The taxids seem to be sourced from somewhere other than the YAML file.
c. No log file was produced that would show what went wrong. The only output was the "work_dir_singularity.last" file, containing the path to the work directory.
Hi,
Can you please try one run with a pre-downloaded db?
To download it:
python3 ui/egapx.py -dl -lc local_cache
Then run with -lc local_cache added to the end of your egapx.py command.
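Before starting the run, it can be worth confirming that the download actually populated the cache directory. A minimal Python sketch; it assumes only that the download writes files somewhere under `local_cache` and knows nothing about what egapx expects inside it. `cache_looks_ready` is a hypothetical helper, not part of egapx:

```python
import os
from pathlib import Path


def cache_looks_ready(cache_dir: str) -> bool:
    """Hypothetical check: True if cache_dir exists, is readable, and holds at least one file.

    This does not validate the cache contents egapx needs; it only catches
    the obvious failures (missing, unreadable, or empty directory).
    """
    path = Path(cache_dir)
    if not path.is_dir() or not os.access(path, os.R_OK | os.X_OK):
        return False
    # Any regular file anywhere below the directory counts as "populated".
    return any(p.is_file() for p in path.rglob("*"))
```

For example, `cache_looks_ready("local_cache")` returning False on the node that will run the workload is a sign the cache is missing or not shared with that node.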
Alright, I have started the run and I will give you feedback on this. Thanks!
@pstrope
python3 ui/egapx.py -dl -lc local_cache
!!WARNING!! This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.
Downloading gnomon/2
Traceback (most recent call last):
File "egapx/ui/egapx.py", line 1179, in
After I got this, I confirmed that I could reach the Internet by downloading a file with wget, which succeeded.
To be clear, you were unable to download the db?
Yes, I was unable to download the db.
Thanks for trying. We'll get back to you.
Thanks, looking forward to it.
The HPC cluster you're trying this on most likely has two kinds of nodes: the login node, with access to the Internet, and worker nodes without it. They usually share a filesystem, so you need to run ui/egapx.py -dl -lc local_cache on the node with Internet access and then run the actual command on the node intended for workload execution. The local_cache directory mentioned above should be accessible to both kinds of nodes.
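To see which nodes can reach NCBI before choosing where to run the download, a small reachability probe can be run on each node. This is a minimal sketch, assuming an outbound HTTPS connection to ftp.ncbi.nlm.nih.gov (the host in the genome URL above) is a reasonable proxy for what the download needs. As later comments in this thread show, wget can succeed on a node where the egapx download still fails, so a positive result is a hint, not proof:

```python
import socket


def can_reach(host: str = "ftp.ncbi.nlm.nih.gov", port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections, and timeouts.
        return False
```

Running `print(socket.gethostname(), can_reach())` on the login node and on a worker node shows which side has outbound access.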
I renamed the ticket because its substance changed: one kind of error was reported at the beginning, and another when it was reopened. In the future, please reopen an issue only when the problem is the same as initially reported; otherwise, open a new ticket.
Thanks for the update @victzh I will provide updates ASAP.
@victzh and @pstrope
First, following your advice, I was able to download the db to the local cache from the login node, even though I had been able to fetch files with wget from the compute node where I was running egapx in an interactive session.
The pipeline got past the initial crash point, "egapx:target_proteins_plane:run_align_sort", but failed at another one, "bam_strandedness:rnaseq_divide_by_strandedness".
The command I ran: python3 ui/egapx.py examples/input_D_farinae_small.yaml -e singularity -w testWork -o testOut -lc local_cache
executor > local (26)
[53/fe032d] process > egapx:setup_genome:get_genome_info [100%] 1 of 1 ✔
[65/85de5a] process > egapx:setup_proteins:convert_proteins [100%] 1 of 1 ✔
[8e/311680] process > egapx:target_proteins_plane:miniprot:split_proteins [100%] 1 of 1 ✔
[34/40e391] process > egapx:target_proteins_plane:miniprot:run_miniprot (1) [100%] 1 of 1 ✔
[16/86e18e] process > egapx:target_proteins_plane:paf2asn:run_paf2asn (1) [100%] 1 of 1 ✔
[81/f6802e] process > egapx:target_proteins_plane:best_aligned_prot:run_best_aligned_prot [100%] 1 of 1 ✔
[68/1052c2] process > egapx:target_proteins_plane:align_filter_sa:run_align_filter_sa [100%] 1 of 1 ✔
[34/0ed301] process > egapx:target_proteins_plane:align_sort_sa:run_align_sort [100%] 1 of 1 ✔
[f5/865796] process > egapx:rnaseq_short_plane:star_index:build_index [100%] 1 of 1 ✔
[69/d3610f] process > egapx:rnaseq_short_plane:star:run_star (2) [100%] 2 of 2 ✔
[cc/bdb2df] process > egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness [100%] 4 of 4, failed: 4..
[f3/f84da4] process > egapx:rnaseq_short_plane:bam_bin_and_sort:calc_assembly_sizes [100%] 1 of 1 ✔
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:bam_bin (1) -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:merge_prepare -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:merge -
[- ] process > egapx:rnaseq_short_plane:bam2asn:convert -
[- ] process > egapx:rnaseq_short_plane:rnaseq_collapse:generate_jobs -
[- ] process > egapx:rnaseq_short_plane:rnaseq_collapse:run_rnaseq_collapse -
[- ] process > egapx:rnaseq_short_plane:rnaseq_collapse:run_gpx_make_outputs -
[- ] process > egapx:gnomon_plane:chainer:run_align_sort -
[- ] process > egapx:gnomon_plane:chainer:generate_jobs -
[- ] process > egapx:gnomon_plane:chainer:run_chainer -
[- ] process > egapx:gnomon_plane:chainer:run_gpx_make_outputs -
[- ] process > egapx:gnomon_plane:gnomon_wnode:gpx_qsubmit -
[- ] process > egapx:gnomon_plane:gnomon_wnode:annot -
[- ] process > egapx:gnomon_plane:gnomon_wnode:gpx_qdump -
[48/b839c4] process > egapx:annot_proc_plane:fetch_swiss_prot_asn [100%] 1 of 1 ✔
[7c/e709cf] process > egapx:annot_proc_plane:get_swiss_prot_ids [100%] 1 of 1 ✔
[- ] process > egapx:annot_proc_plane:prot_gnomon_prepare:prot_gnomon_prepare_p -
[- ] process > egapx:annot_proc_plane:diamond_worker:run_diamond_egap -
[- ] process > egapx:annot_proc_plane:best_protein_hits:run_protein_filter_replacement -
[- ] process > egapx:annot_proc_plane:gnomon_biotype:run_gnomon_biotype -
[78/cc269f] process > egapx:annot_proc_plane:annot_builder:annot_builder_main [100%] 1 of 1 ✔
[- ] process > egapx:annot_proc_plane:annot_builder:annot_builder_input -
[- ] process > egapx:annot_proc_plane:annot_builder:annot_builder_run -
[74/3947c0] process > egapx:annot_proc_plane:print_fake_lxr_data [100%] 1 of 1 ✔
[28/55c4ff] process > egapx:annot_proc_plane:orthology_plane:fetch_ortholog_references [100%] 1 of 1 ✔
[f2/a2d5e9] process > egapx:annot_proc_plane:orthology_plane:setup_ext_genome:get_genome_info [100%] 1 of 1 ✔
[c7/736ba8] process > egapx:annot_proc_plane:orthology_plane:setup_ext_proteins:convert_proteins [100%] 1 of 1 ✔
[ca/e72b22] process > egapx:annot_proc_plane:orthology_plane:get_prot_ref_ids [100%] 1 of 1 ✔
[- ] process > egapx:annot_proc_plane:orthology_plane:extract_products_from_models:run_e... -
[- ] process > egapx:annot_proc_plane:orthology_plane:diamond_orthology:run_diamond_egap -
[- ] process > egapx:annot_proc_plane:orthology_plane:find_orthologs:run_find_orthologs -
[- ] process > egapx:annot_proc_plane:locus_track:run_locus_track -
[- ] process > egapx:annot_proc_plane:locus_link:run_locus_link -
[- ] process > egapx:annot_proc_plane:final_asn_markup:final_asn -
[- ] process > egapx:annot_proc_plane:annotwriter:run_annotwriter -
[- ] process > egapx:convert_annotations:run_converter -
[- ] process > export -
[0b/c0bffd] NOTE: Process egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness
terminated with an error exit status (3) -- Execution is retried (1)
[98/4e2f96] NOTE: Process egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness
terminated with an error exit status (3) -- Execution is retried (2)
[34/1d9722] NOTE: Process egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness
terminated with an error exit status (3) -- Execution is retried (3)
ERROR ~ Error executing process > 'egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness'
Caused by:
Process egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness
terminated with an error exit status (3)
Command executed:
mkdir -p output
samtools=$(which samtools)
echo "GCA_020809275.1_ASM2080927v1_genomic-SRR8506572-Aligned.out.Sorted.bam GCA_020809275.1_ASM2080927v1_genomic-SRR9005248-Aligned.out.Sorted.bam" > bam_list.mft
rnaseq_divide_by_strandedness -align-manifest bam_list.mft -metadata egapx_reads_metadata_ga8t4wc9.tsv -min-aligned 1000000 -min-unambiguous 200 -min-unambiguous-pct 2 -max-unambiguous-pct 100 -percentage-threshold 98 -samtools-executable $samtools -stranded-output output/stranded.list -strandedness-output output/run.strandedness -unstranded-output output/unstranded.list
Command exit status: 3
Command output: (empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Error: (CFileErrnoException::eFileIO) mkstemp() failed for 'egapx3/egapTmpDir/540624880960XXXXXX' (errno = 2: No such file or directory)
Error: (106.16) Application's execution failed (CFileErrnoException::eFileIO) mkstemp() failed for 'egapx3/egapTmpDir/540624880960XXXXXX' (errno = 2: No such file or directory)
Work dir: egapx/testWork/cc/bdb2df74b16afe55d70d69c0aab850
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
-- Check 'testOut/nextflow.log' file for details
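The `mkstemp()` error above (errno 2, No such file or directory) means the directory part of the temporary-file template did not exist from the point of view of the process that tried to use it. Python's `tempfile.mkstemp` fails the same way (an `OSError` with errno 2) when its target directory is missing, so a quick test of a candidate TMPDIR, ideally run from the same container and working directory as the task, could look like the sketch below. `check_tmpdir` is a hypothetical helper. Note also that 'egapx3/egapTmpDir' appears in the log as a relative path; if TMPDIR really was set that way (and was not just shortened for this post), it would be resolved against each task's work directory, where it most likely does not exist.

```python
import errno
import os
import tempfile


def check_tmpdir(path: str) -> str:
    """Try to create and remove one temporary file in `path`.

    Returns "ok" on success, otherwise a short description of the failure.
    A missing directory yields errno 2 (ENOENT), the same code as in the log above.
    """
    try:
        fd, name = tempfile.mkstemp(dir=path)
    except OSError as exc:
        code = errno.errorcode.get(exc.errno, "?")
        return f"failed: errno {exc.errno} ({code}) for {os.path.abspath(path)}"
    os.close(fd)
    os.remove(name)
    return "ok"
```

For example, `print(check_tmpdir(os.environ.get("TMPDIR", "/tmp")))` reports whether the current TMPDIR is usable; the printed absolute path also shows what a relative value resolves to.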
@CEPHAS-01 Please look at https://github.com/ncbi/egapx/issues/47 for the solution. This issue will be fixed in the next release, but there is a config modification you can make to proceed for now.
At this point, it's probably easier to wait for 0.3.1, which is being tested and fixes many bugs caught by our brave alpha users, including you, Temitayo! Thanks for reporting back!
Hi @victzh, after setting the environment variables directly in the terminal (rather than through the singularity.config file), I successfully ran the input_D_farinae_small.yaml data. Thanks so much to all of you for guiding this to a successful completion. This D. farinae test is a small genome, so I am going to run the pipeline on my real genome, about 2.8 Gb, and see what we get.
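This comment does not list which environment variables were set, and the config workaround itself lives in issue #47. Purely as an illustration, the sketch below builds an environment in which TMPDIR and its Apptainer counterparts all point at one absolute, existing directory; the Apptainer variable names come from the INFO lines in the logs in this thread. The variable set and the helper `build_env` are assumptions, not the fix from #47, and the directory must also be visible inside the container (for example through bind mounts) for this to help.

```python
import os
from pathlib import Path

# Assumption: the temp-dir variables worth setting. The logs above show that
# Apptainer prefers the APPTAINER_* / APPTAINERENV_* names over SINGULARITY*.
TMP_VARS = ("TMPDIR", "APPTAINER_TMPDIR", "APPTAINERENV_TMPDIR")


def build_env(tmp_root: str) -> dict:
    """Return a copy of os.environ with the temp-dir variables set to one absolute path.

    The directory is created if missing, so none of the variables names a
    nonexistent directory on the host (the errno 2 case in the log above).
    """
    root = Path(tmp_root).expanduser().resolve()
    root.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)
    for var in TMP_VARS:
        env[var] = str(root)
    return env
```

It could then be passed to the run, e.g. `subprocess.run(["python3", "ui/egapx.py", "examples/input_D_farinae_small.yaml", "-e", "singularity", "-w", "testWork", "-o", "testOut", "-lc", "local_cache"], env=build_env("/path/to/egapTmpDir"), check=True)`, where `/path/to/egapTmpDir` is a placeholder.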
I successfully ran the pipeline with my assembly. Many thanks to @victzh, @pstrope, and the rest of the team for this tool and for holding my hand along the way to getting the pipeline to work! I am closing this issue as completed.
Hi, I encountered this problem while running the EGAPx pipeline with Singularity. Let me provide as much information as possible. First, my singularity.config file is as follows:
singularity {
    enabled = true
    autoMounts = true
    singularity.cacheDir = 'egap/tmpDir'
}
Run command:
python3 ui/egapx.py bighorn_reads.yml -e singularity -w egap/tmpDir/workDir -o annotation_out
Output and error:
executor > local (20)
[36/f52493] process > egapx:setup_genome:get_genome_info [100%] 1 of 1 ✔
[41/0e2552] process > egapx:setup_proteins:convert_proteins [100%] 1 of 1 ✔
[78/07dbb1] process > egapx:target_proteins_plane:miniprot:split_proteins [100%] 1 of 1 ✔
[f0/367797] process > egapx:target_proteins_plane:miniprot:run_miniprot (1) [100%] 4 of 4 ✔
[70/b7c4ba] process > egapx:target_proteins_plane:paf2asn:run_paf2asn (4) [100%] 4 of 4 ✔
[f3/fec9e3] process > egapx:target_proteins_plane:best_aligned_prot:run_best_aligned_prot [100%] 1 of 1 ✔
[c3/91046d] process > egapx:target_proteins_plane:align_filter_sa:run_align_filter_sa [100%] 1 of 1 ✔
[92/d2e396] process > egapx:target_proteins_plane:run_align_sort [100%] 4 of 4, failed: 4..
[91/ab3824] process > egapx:rnaseq_short_plane:star_index:build_index [100%] 1 of 1 ✔
[- ] process > egapx:rnaseq_short_plane:star:run_star (1) -
[- ] process > egapx:rnaseq_short_plane:bam_strandedness:rnaseq_divide_by_strandedness -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:calc_assembly_sizes -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:bam_bin -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:merge_prepare -
[- ] process > egapx:rnaseq_short_plane:bam_bin_and_sort:merge -
[- ] process > egapx:rnaseq_short_plane:bam2asn:convert -
... ... ...
[- ] process > egapx:gnomon_plane:gnomon_wnode:gpx_qsubmit -
[- ] process > egapx:gnomon_plane:gnomon_wnode:annot -
[- ] process > egapx:gnomon_plane:gnomon_wnode:gpx_qdump -
[31/2aaf2a] process > egapx:annot_builder:annot_builder_main [100%] 1 of 1 ✔
[- ] process > egapx:annot_builder:annot_builder_input -
[- ] process > egapx:annot_builder:annot_builder_run -
[- ] process > egapx:annotwriter:run_annotwriter -
[- ] process > export -
[f9/d12672] NOTE: Process egapx:target_proteins_plane:run_align_sort
terminated with an error exit status (3) -- Execution is retried (3)
ERROR ~ Error executing process > 'egapx:target_proteins_plane:run_align_sort'
Caused by:
Process egapx:target_proteins_plane:run_align_sort
terminated with an error exit status (3)
Command executed:
mkdir -p output
mkdir -p LDS_Index
lds2_indexer -source LDS_Index
echo "align.asn" > alignments.mft
align_sort -k subject,subject_start,-subject_end,subject_strand,query,query_start,-query_end,query_strand,-num_ident,gap_count -input-manifest alignments.mft -o output/sorted_aligns.asn -lds2 LDS_Index/lds2.db
Command exit status: 3
Command output: (empty)
Command error:
INFO: Environment variable SINGULARITY_TMPDIR is set, but APPTAINER_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Error: (CException::eUnknown) failed to create temporary path
Error: (106.16) Application's execution failed (CException::eUnknown) failed to create temporary path
Work dir: egap/tmpDir/workDir/92/d2e396449993bd2d15ab600858e3ae
Tip: you can replicate the issue by changing to the process work dir and entering the command
bash .command.run
-- Check 'annotation_out/nextflow.log' file for details
Some extra information: I am using Apptainer 1.3.3 and Nextflow 23.10.1, with the latest version of EGAPx plus a patch from here. I defined the directory "egap/tmpDir" as TMPDIR and as the location of the other output directories, and I am sure that this directory is recursively writable. I have also tried the same run on another HPC cluster, and the pipeline terminates with an error in the same process, 'egapx:target_proteins_plane:run_align_sort'.
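To double-check that a directory really is recursively writable, a short walk over it can list every subdirectory the current user cannot write into. A minimal sketch; `unwritable_dirs` is a hypothetical helper, and `os.access` reflects the permissions of the user running it on the host, which may differ from what a containerized task sees (or whether the path is mounted inside the container at all):

```python
import os


def unwritable_dirs(root: str) -> list:
    """Return every directory under root (including root) that is not writable and searchable.

    If root itself cannot be listed (for example, it does not exist),
    it is reported via os.walk's onerror callback.
    """
    problems = []

    def report(err: OSError) -> None:
        problems.append(err.filename)

    for dirpath, _dirnames, _filenames in os.walk(root, onerror=report):
        if not os.access(dirpath, os.W_OK | os.X_OK):
            problems.append(dirpath)
    return problems
```

For example, `print(unwritable_dirs("egap/tmpDir"))` printing `[]` means no unwritable or unreadable directory was found from the current user's point of view.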
Kindly help me resolve this error.
Thank you.
Temitayo