ncbi / egapx

Eukaryotic Genome Annotation Pipeline-External caller scripts and documentation
Other

mktemp: failed to create directory via template 'tmp.XXXXXXXXXX': Read-only file system #39

Closed CEPHAS-01 closed 3 weeks ago

CEPHAS-01 commented 1 month ago

Hello EGAP Team,

I encountered the following error while running the EGAPx pipeline on a sheep genome (~2.8 GB). It indicates that a temporary directory cannot be created:

"mktemp: failed to create directory via template '/tmp.XXXXXXXXXX': Read-only file system"

I first ran into this problem with the test input "input_D_farinae_small.yaml" shipped with the pipeline, but after I pointed my $TMPDIR to a writable directory, that example completed successfully. I am using the same $TMPDIR location for the sheep run, so I am sure the location is writable.

I am using Apptainer 1.3.3 and Nextflow 23.10.1.

Kindly assist in resolving this. Thank you!

Temitayo
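One way to double-check a scratch location before launching, as was done here with the test data, is a small preflight function like the one below. It is a sketch, not part of EGAPx. `check_tmpdir` is a hypothetical name, and the function assumes bash and GNU `mktemp` (for `-p`). It only proves that the directory is writable on the host. Inside an Apptainer container the same path must also exist, which depends on the bind mounts.

```shell
# Hypothetical preflight helper, not part of EGAPx: check that a directory
# exists, is writable, and accepts a real mktemp call before using it as TMPDIR.
check_tmpdir() {
  local dir="${1:-${TMPDIR:-}}"
  if [ -z "$dir" ]; then
    echo "check_tmpdir: no directory given and TMPDIR is unset" >&2
    return 1
  fi
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "check_tmpdir: not a writable directory: $dir" >&2
    return 1
  fi
  # Create and remove a scratch directory, as the pipeline's `mktemp -d` would.
  local probe
  probe=$(mktemp -d -p "$dir") || return 1
  rmdir "$probe"
}

# Usage (placeholder path):
#   export TMPDIR=/path/to/writable/scratch
#   check_tmpdir "$TMPDIR" && echo "TMPDIR looks usable on this host"
```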

victzh commented 1 month ago

Hi Temitayo,

Can you provide more details about the error, such as which process it appears in? It may be that on your system the Apptainer configuration points TMPDIR to the root directory by default. We can stop calling mktemp without specifying which directory to use, but to be sure I'd like to see the actual error.
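The point about TMPDIR can be illustrated with a small sketch. With GNU coreutils, `mktemp -d` with no template and no directory argument creates `tmp.XXXXXXXXXX` under `$TMPDIR`, or under `/tmp` when TMPDIR is unset. Whatever TMPDIR the container inherits therefore decides whether the call succeeds. A TMPDIR of `/` should produce the `/tmp.XXXXXXXXXX` template seen in the original report. The helper name `mktemp_with_tmpdir` and the path `/nonexistent-egapx-scratch` are hypothetical. The sketch demonstrates shell behavior only and is not EGAPx code.

```shell
# Sketch of how GNU `mktemp -d` chooses its parent directory: with no template
# and no directory argument it uses $TMPDIR (or /tmp when TMPDIR is unset).
# Returns 0 if a directory could be created under the given TMPDIR, 1 otherwise.
mktemp_with_tmpdir() {
  local created
  created=$(TMPDIR="$1" mktemp -d) || return 1
  rmdir "$created"
}

# Usage with a hypothetical path that does not exist. This reproduces the
# failure mode and prints an error of the form:
#   mktemp: failed to create directory via template
#     '/nonexistent-egapx-scratch/tmp.XXXXXXXXXX': No such file or directory
#   mktemp_with_tmpdir /nonexistent-egapx-scratch
```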

CEPHAS-01 commented 1 month ago

Hello @victzh

Here are some more details

N E X T F L O W ~ version 23.10.1
Launching /packages/egapx/ui/../nf/ui.nf [elated_almeida] DSL2 - revision: c134f40af5
WARN: Nextflow variables must be defined in the launching environment - The following variable set in the config file is going to be ignored: 'NXF_TEMP'
WARN: Nextflow variables must be defined in the launching environment - The following variable set in the config file is going to be ignored: 'NXF_WORK'
WARN: Nextflow variables must be defined in the launching environment - The following variable set in the config file is going to be ignored: 'NXF_SINGULARITY_CACHEDIR'
in egapx block
executor > local (2)
[ac/485ff2] process > egapx:setup_genome:get_genome_info [100%] 1 of 1, cached: 1 ✔
[60/78fb62] process > egapx:setup_proteins:convert_proteins [100%] 1 of 1, cached: 1 ✔
[59/b75a1f] process > egapx:miniprot:run_miniprot [100%] 1 of 1, cached: 1 ✔
[- ] process > egapx:paf2asn:run_paf2asn -
[- ] process > egapx:best_aligned_prot:run_best_aligned_prot -
[- ] process > egapx:align_filter_sa:run_align_filter_sa -
[- ] process > egapx:run_align_sort -
[8c/ccad9b] process > egapx:star_index:build_index [100%] 1 of 1, cached: 1 ✔
[cf/5f4b7a] process > egapx:star_simplified:exec (2) [100%] 2 of 2, cached: 2 ✔
[71/b61f7c] process > egapx:bam_strandedness:exec (2) [100%] 2 of 2, cached: 2 ✔
[34/d583e8] process > egapx:bam_strandedness:merge [100%] 1 of 1, cached: 1 ✔
[c3/bb7a79] process > egapx:bam_bin_and_sort:calc_assembly_sizes [100%] 1 of 1, cached: 1 ✔
[39/70b682] process > egapx:bam_bin_and_sort:bam_bin (2) [100%] 2 of 2, cached: 2 ✔
[6b/957b67] process > egapx:bam_bin_and_sort:merge_prepare [100%] 1 of 1, cached: 1 ✔
[66/05fa29] process > egapx:bam_bin_and_sort:merge (1) [100%] 1 of 1, cached: 1 ✔
[b9/041ec2] process > egapx:bam2asn:convert (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > egapx:rnaseq_collapse:generate_jobs -
[- ] process > egapx:rnaseq_collapse:run_rnaseq_collapse -
[- ] process > egapx:rnaseq_collapse:run_gpx_make_outputs -
[1a/fc5514] process > egapx:get_hmm_params:run_get_hmm [100%] 1 of 1, cached: 1 ✔
[- ] process > egapx:chainer:run_align_sort -
[- ] process > egapx:chainer:generate_jobs -
[- ] process > egapx:chainer:run_chainer -
[- ] process > egapx:chainer:run_gpx_make_outputs -
[- ] process > egapx:gnomon_wnode:gpx_qsubmit -
[- ] process > egapx:gnomon_wnode:annot -
[- ] process > egapx:gnomon_wnode:gpx_qdump -
[c6/40c30d] process > egapx:annot_builder:annot_builder_main [100%] 1 of 1, cached: 1 ✔
[- ] process > egapx:annot_builder:annot_builder_input -
[- ] process > egapx:annot_builder:annot_builder_run -
[- ] process > egapx:annotwriter:run_annotwriter -
[- ] process > export -
ERROR ~ Error executing process > 'egapx:bam2asn:convert (1)'

Caused by: Process egapx:bam2asn:convert (1) terminated with an error exit status (1)

Command executed:

tmpdir=`mktemp -d`
samtools=`which samtools`
lds2_indexer -source genome/ -db LDS2
#EXCEPTION_STACK_TRACE_LEVEL=Warning DEBUG_STACK_TRACE_LEVEL=Warning DIAG_POST_LEVEL=Trace
sam2asn -filter 'pct_identity_gap >= 95' -ofmt seq-align-compressed -collapse-identical -no-scores -ifmt bam -refs-local-by-default -nogenbank -lds2 LDS2 -tmp-dir $tmpdir -align-counts "GCF_020809275.1_ASM2080927v1_genomic-bin1.align_counts.txt" -o "GCF_020809275.1_ASM2080927v1_genomic-bin1.align.asnb.gz" -strandedness run.strandedness -input GCF_020809275.1_ASM2080927v1_genomic-bin1.bam -samtools-path $samtools

Command exit status: 1

Command output: (empty)

Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
INFO: Environment variable SINGULARITYENV_https_proxy is set, but APPTAINERENV_https_proxy is preferred
INFO: Environment variable SINGULARITYENV_http_proxy is set, but APPTAINERENV_http_proxy is preferred
INFO: Environment variable SINGULARITYENV_SLURM_JOB_ID is set, but APPTAINERENV_SLURM_JOB_ID is preferred
mktemp: failed to create directory via template '/local/scratch/temitayo.olagunju/15275394/tmp.XXXXXXXXXX': No such file or directory

Work dir: /egapTmp/b9/041ec2475977996ae293b06d49457f

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh

-- Check '/testRun_out/nextflow.log' file for details

victzh commented 1 month ago

OK, this will be fixed in the next release. If you don't want to wait, find the line

tmpdir=`mktemp -d`

in the file nf/subworkflows/ncbi/rnaseq_short/convert_from_bam/main.nf and replace it with

tmpdir=`mktemp -d --tmpdir=.`

If you do this, please report the results here. Thanks!
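The suggested change makes mktemp create its directory under the current directory instead of wherever TMPDIR points inside the container. For a Nextflow task, the current directory is normally the task work directory. In GNU mktemp, an explicit `--tmpdir=DIR` takes precedence over `$TMPDIR`. If you prefer to script the edit rather than make it by hand, you could wrap a sed one-liner in a hypothetical helper, as shown below. The sketch assumes that sed accepts `-i` with an attached backup suffix, which both GNU and BSD sed do. Its default path is the file named above.

```shell
# Sketch: apply the suggested one-line change with sed, run from the root of an
# EGAPx checkout. Assumes the line reads exactly as quoted in the comment above.
# A .bak copy of the original file is kept next to it.
patch_convert_from_bam() {
  local nf="${1:-nf/subworkflows/ncbi/rnaseq_short/convert_from_bam/main.nf}"
  if [ ! -f "$nf" ]; then
    echo "patch_convert_from_bam: file not found: $nf" >&2
    return 1
  fi
  # The pattern requires "-d" to be followed directly by the closing backtick,
  # so running the function twice does not add a second --tmpdir.
  sed -i.bak 's/tmpdir=`mktemp -d`/tmpdir=`mktemp -d --tmpdir=.`/' "$nf"
  # Print the mktemp line(s) so the change can be checked by eye.
  grep -n 'mktemp' "$nf"
}
```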

CEPHAS-01 commented 1 month ago

Thanks @victzh! I will try this out and get back to you.

CEPHAS-01 commented 1 month ago

@victzh

The initial problem appears to be solved, but I encountered another one:

[de/fffd8f] process > egapx:setup_genome:get_genome_info [100%] 1 of 1 ✔
[cb/5db6a0] process > egapx:setup_proteins:convert_proteins [100%] 1 of 1 ✔
[ec/3548f1] process > egapx:miniprot:run_miniprot [100%] 1 of 1 ✔
[77/5afc69] process > egapx:paf2asn:run_paf2asn [100%] 1 of 1 ✔
[41/61db8f] process > egapx:best_aligned_prot:run_best_aligned_prot [100%] 1 of 1 ✔
[a2/8758ae] process > egapx:align_filter_sa:run_align_filter_sa [100%] 1 of 1 ✔
[4e/d1ed74] process > egapx:run_align_sort [100%] 1 of 1, failed: 1 ✘
[6c/d77eab] process > egapx:star_index:build_index [100%] 1 of 1 ✔
[- ] process > egapx:star_simplified:exec (1) -
[- ] process > egapx:bam_strandedness:exec -
[- ] process > egapx:bam_strandedness:merge -
[- ] process > egapx:bam_bin_and_sort:calc_assembly_sizes -
[- ] process > egapx:bam_bin_and_sort:bam_bin -
[- ] process > egapx:bam_bin_and_sort:merge_prepare -
[- ] process > egapx:bam_bin_and_sort:merge -
[- ] process > egapx:bam2asn:convert -
[- ] process > egapx:rnaseq_collapse:generate_jobs -
[- ] process > egapx:rnaseq_collapse:run_rnaseq_collapse -
[- ] process > egapx:rnaseq_collapse:run_gpx_make_outputs -
[9d/1f82d8] process > egapx:get_hmm_params:run_get_hmm [100%] 1 of 1 ✔
[- ] process > egapx:chainer:run_align_sort -
[- ] process > egapx:chainer:generate_jobs -
[- ] process > egapx:chainer:run_chainer -
[- ] process > egapx:chainer:run_gpx_make_outputs -
[- ] process > egapx:gnomon_wnode:gpx_qsubmit -
[- ] process > egapx:gnomon_wnode:annot -
[- ] process > egapx:gnomon_wnode:gpx_qdump -
[da/3e9a37] process > egapx:annot_builder:annot_builder_main [100%] 1 of 1 ✔
[- ] process > egapx:annot_builder:annot_builder_input -
[- ] process > egapx:annot_builder:annot_builder_run -
[- ] process > egapx:annotwriter:run_annotwriter -
[- ] process > export -
ERROR ~ Error executing process > 'egapx:run_align_sort'

Caused by: Process egapx:run_align_sort terminated with an error exit status (3)

Command executed:

mkdir -p output
mkdir -p LDS_Index
lds2_indexer -source LDS_Index
echo "align.asn" > alignments.mft
align_sort -k subject,subject_start,-subject_end,subject_strand,query,query_start,-query_end,query_strand,-num_ident,gap_count -input-manifest alignments.mft -o output/sorted_aligns.asn -lds2 LDS_Index/lds2.db

Command exit status: 3

Command output: (empty)

Command error:
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
Error: (CException::eUnknown) failed to create temporary path
Error: (106.16) Application's execution failed (CException::eUnknown) failed to create temporary path

Work dir: /work/4e/d1ed7451a369eb41071529d9d2ec2e

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '/out/nextflow.log' file for details
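A full filesystem is one possible cause of "failed to create temporary path", and a storage shortage is the cause suggested later in this thread. It can therefore help to check the free space where the work directory and TMPDIR live before resuming. The following is a minimal sketch based on POSIX `df -P -k`. The function names and any threshold are hypothetical choices, not EGAPx requirements.

```shell
# Sketch: free space (GiB) on the filesystem that holds a directory, read from
# POSIX `df -P -k` output (column 4 = available 1024-byte blocks).
free_gib() {
  df -P -k "${1:-.}" | awk 'NR == 2 { printf "%.1f\n", $4 / 1048576 }'
}

# Fail (and say so) when less than the requested number of GiB is available.
require_free_gib() {
  local dir="$1" need="$2" have
  have=$(free_gib "$dir")
  [ -n "$have" ] || return 1
  if awk -v h="$have" -v n="$need" 'BEGIN { exit !(h + 0 >= n + 0) }'; then
    return 0
  fi
  echo "require_free_gib: only ${have} GiB free under $dir (wanted ${need})" >&2
  return 1
}

# Usage (the threshold is an arbitrary example, not an EGAPx requirement):
#   require_free_gib "$TMPDIR" 100 && require_free_gib /egapAnnotation/work 100
```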

CEPHAS-01 commented 1 month ago

@victzh, the last error may have been caused by a shortage of storage space. I freed up more space and restarted, and the run got past the point where that error occurred. However, I then hit another error:

  echo "tmpdwb5rgmk" > metadata.mft

  # HACK: derive start_job_id from job file extension
  filename=$(basename -- "job.007")
  extension="${filename##*.}"
  # NB: for successful gather phase all job id should be unique,
  # so we must supply start_job_id.
  (( start_job_id = ((10#$extension) * 136) + 1 ))

  # make the local LDS of the genomic sequences
  lds2_indexer -source ./genome -db ./genome_lds  

  # When running multiple jobs on the cluster there is a chance that
  # several jobs will run on the same node and thus generate files
  # with the same filename. We need to avoid that to be able to stage
  # the output files for gpx_make_outputs. We add the job file numeric
  # extension as a prefix to the filename.
  mkdir interim
  rnaseq_collapse -backlog 1 -max-jobs 1 -rank-counts-precalculated -O interim -nogenbank -lds2 ./genome_lds -sorted-vols align.mft -scaffold-list scaffold_list.mft -sra-metadata-manifest metadata.mft -start-job-id $start_job_id -input-jobs job.007 -workers $threads
  mkdir output
  for f in interim/*; do
      if [ -f $f ]; then
          mv $f output/${extension}_$(basename $f)
      fi
  done

Command exit status: 3

Command output: (empty)

... ... ... ...

00802/001/0195/RB 62970322715557A1 1018/0039 2024-10-20T19:10:45.401578 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse request-start internal_thread=monitor&cpu_count=64&normalized_load=118&total_ram=1082138755072&effective_ram=1079991271424&pct_ram_used=0&mem_total=922816512&mem_peak=922869760&mem_self=922816512&mem_available=1079068454912&ncbi_app_version=0.0.26311&ncbi_app_sc_version=28&ncbi_app_vcs_revision=685465&ncbi_app_revision=685465
00802/001/0195/RE 62970322715557A1 1019/0040 2024-10-20T19:10:45.401646 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse request-stop 200 0.000071048 0 0
00802/007/0179/R 62970322715557A1 1020/0027 2024-10-20T19:10:45.405357 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse extra num_skipped=476385
00802/007/0179/R 62970322715557A1 1021/0028 2024-10-20T19:10:45.406063 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse Error: RNASEQ(CException::eUnknown) "collapse_group.cpp", line 185: SMemberAlignment::SMemberAlignment() --- Unknown run: Reads
00802/007/0179/R 62970322715557A1 1022/0029 2024-10-20T19:10:45.406105 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse Error: LIB "wn_worker_thread.cpp", line 267: ncbi::CWorkerThread::x_DoJob() --- error processing job: (CException::eUnknown) Unknown run: Reads
00802/007/0179/RE 62970322715557A1 1023/0030 2024-10-20T19:10:45.406407 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse request-stop 500 11.857090950 0 40
00802/010/0181/R 62970322715557A1 1024/0051 2024-10-20T19:10:46.997726 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse extra num_skipped=503956
00802/010/0181/R 62970322715557A1 1025/0052 2024-10-20T19:10:46.998339 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse Error: RNASEQ(CException::eUnknown) "collapse_group.cpp", line 185: SMemberAlignment::SMemberAlignment() --- Unknown run: Reads
00802/010/0181/R 62970322715557A1 1026/0053 2024-10-20T19:10:46.998409 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse Error: LIB "wn_worker_thread.cpp", line 267: ncbi::CWorkerThread::x_DoJob() --- error processing job: (CException::eUnknown) Unknown run: Reads
00802/010/0181/RE 62970322715557A1 1027/0054 2024-10-20T19:10:46.998713 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse request-stop 500 12.116225004 0 40
00802/003/0196/RB 62970322715557A1 1028/0013 2024-10-20T19:10:46.999331 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse request-start internal_thread=output&processing_cycle=4&ncbi_app_version=0.0.26311&ncbi_app_sc_version=28&ncbi_app_vcs_revision=685465&ncbi_app_revision=685465
00802/003/0196/R 62970322715557A1 1029/0014 2024-10-20T19:10:46.999409 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse extra success_jobs=0&fail_jobs=16
00802/003/0196/R 62970322715557A1 1030/0015 2024-10-20T19:10:46.999703 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse extra attempts=1
00802/003/0196/RE 62970322715557A1 1031/0016 2024-10-20T19:10:46.999735 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse request-stop 200 0.000416040 0 0
00802/000/0000/P 62970322715557A1 1032/0088 2024-10-20T19:10:48.427646 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse Error: LIB(CException::eUnknown) "wn_app_core.cpp", line 256: ncbi::CGPX_WorkerAppCore::Run() --- 121 jobs failed
00802/000/0000/P 62970322715557A1 1033/0089 2024-10-20T19:10:48.427728 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse Error: CORELIB(106.16) "ncbiapp.cpp", line 700: ncbi::CNcbiApplicationAPI::x_TryMain() --- Application's execution failed (CException::eUnknown) 121 jobs failed
00802/000/0000/PE 62970322715557A1 1034/0090 2024-10-20T19:10:48.506753 .hpc.domain.edu UNK_CLIENT UNK_SESSION rnaseq_collapse stop 3 62.275131940

Work dir: /egapAnnotation/work/62/a9f3e4922641276d07105aa756125a

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

Oct-20 12:10:49.115 [Task monitor] DEBUG nextflow.Session - Session aborted -- Cause: Process egapx:rnaseq_collapse:run_rnaseq_collapse (8) terminated with an error exit status (3)
Oct-20 12:10:49.251 [Task monitor] DEBUG nextflow.Session - The following nodes are still active:
[process] egapx:rnaseq_collapse:run_gpx_make_outputs
  status=ACTIVE
  port 0: (value) OPEN ; channel: files
  port 1: (value) bound ; channel: params
  port 2: (cntrl) - ; channel: $

Oct-20 12:10:49.261 [main] DEBUG nextflow.Session - Session await > all processes finished
Oct-20 12:10:49.261 [main] DEBUG nextflow.Session - Session await > all barriers passed
Oct-20 12:10:49.429 [main] WARN n.processor.TaskPollingMonitor - Killing running tasks (9)
Oct-20 12:10:49.846 [Task monitor] DEBUG n.processor.TaskPollingMonitor - <<< barrier arrives (monitor: local) - terminating tasks monitor poll loop
Oct-20 12:10:49.941 [main] DEBUG n.trace.WorkflowStatsObserver - Workflow completed > WorkflowStats[succeededCount=0; failedCount=1; ignoredCount=0; cachedCount=295; pendingCount=0; submittedCount=1; runningCount=-1; retriesCount=0; abortedCount=9; succeedDuration=0ms; failedDuration=11m 21s; cachedDuration=26d 17h 45m 39s;loadCpus=-7; loadMemory=0; peakRunning=9; peakCpus=63; peakMemory=540 GB; ]
Oct-20 12:10:49.942 [main] DEBUG nextflow.trace.TraceFileObserver - Workflow completed -- saving trace file
Oct-20 12:10:49.945 [main] DEBUG nextflow.trace.ReportObserver - Workflow completed -- rendering execution report
Oct-20 12:10:50.920 [main] DEBUG nextflow.trace.TimelineObserver - Workflow completed -- rendering execution timeline
Oct-20 12:10:51.151 [main] DEBUG nextflow.cache.CacheDB - Closing CacheDB done
Oct-20 12:10:51.281 [main] DEBUG nextflow.script.ScriptRunner - > Execution complete -- Goodbye
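The "HACK" in the rnaseq_collapse command above derives a unique start_job_id from the numeric extension of the job file. A small standalone sketch makes the arithmetic easy to check. `start_job_id_for` is a hypothetical helper. The multiplier 136 is simply the value that appears in this run's script, not a constant documented by EGAPx. The `10#` prefix matters because without it bash treats a zero-padded extension such as `008` as octal and rejects it.

```shell
# Sketch of the job-id arithmetic in the rnaseq_collapse command above.
# start_job_id_for is a hypothetical helper; 136 is the multiplier that appears
# in this run's script and is used only as a default here.
start_job_id_for() {
  local job_file="$1" per_file="${2:-136}"
  local filename extension
  filename=$(basename -- "$job_file")
  extension="${filename##*.}"
  # 10# forces base 10; otherwise bash parses "008" or "009" as invalid octal.
  echo $(( (10#$extension) * per_file + 1 ))
}

start_job_id_for job.007   # prints 953
start_job_id_for job.009   # prints 1225
```

Each job file therefore starts its ids at extension × 136 + 1. This keeps the ids unique across files only as long as no file holds more than 136 jobs.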

CEPHAS-01 commented 3 weeks ago

@victzh To add more information: I subset the genome to just 2 of the 26 chromosomes to make sure the whole genome was not taking too much space, but the pipeline still failed at the same point. I also tried another HPC system with a large amount of storage (to rule out storage as the issue), and it failed at exactly the same spot.
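For a test like the one described here, you can restrict the genome to a couple of chromosomes by filtering the FASTA on sequence name. The thread does not say which tool was actually used. The sketch below uses awk and matches the first word of each header. `subset_fasta` and the example names are hypothetical. If the FASTA is indexed, `samtools faidx genome.fa NAME1 NAME2 > subset.fa` is an equivalent.

```shell
# Sketch: write to stdout only the FASTA records whose ID (first word of the
# header, without ">") is one of the given names. subset_fasta is hypothetical.
subset_fasta() {
  local fasta="$1"
  shift
  awk -v keep="$*" '
    BEGIN { n = split(keep, ids, " "); for (i = 1; i <= n; i++) want[ids[i]] = 1 }
    /^>/  { name = substr($1, 2); printing = (name in want) }
    printing
  ' "$fasta"
}

# Usage (placeholder names; use the sequence IDs from your own assembly):
#   subset_fasta genome.fna CHR_A CHR_B > genome.subset.fna
```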

victzh commented 3 weeks ago

I'm closing this issue as the original problem is resolved. If you still have problems running it, please report them in another issue.