ncbi / egapx

Eukaryotic Genome Annotation Pipeline-External caller scripts and documentation

Error message in test run (-e docker) #6

Closed xo2003 closed 5 months ago

xo2003 commented 5 months ago

Hi EGAPx team,

I am looking forward to the public release of EGAPx. Recently, I tried a test run with the following command:

python3 ui/egapx.py ./examples/input_D_farinae_small.yaml -e docker -w ./temp_datapath/D_farinae -o example_out

The Docker version is 24.0.2. Below is the error message from the Nextflow report (run.report.html):

The full error message was:
Error executing process > 'egapx:rnaseq_collapse:run_rnaseq_collapse (6)'

Caused by:
  Process `egapx:rnaseq_collapse:run_rnaseq_collapse (6)` terminated with an error exit status (3)

Command executed:

  njobs=`wc -l  scaffold_list.mft
  echo "GCF_020809275.1_ASM2080927v1_genomic-bin1.align.asnb.gz" > align.mft
  echo "tmpxnhuwi9_" > metadata.mft

  # HACK: derive start_job_id from job file extension
  filename=$(basename -- "job.005")
  extension="${filename##*.}"
  # NB: for successful gather phase all job id should be unique,
  # so we must supply start_job_id.
  (( start_job_id = ((10#$extension) * 3) + 1 ))

  # make the local LDS of the genomic sequences
  lds2_indexer -source ./genome -db ./genome_lds  

  # When running multiple jobs on the cluster there is a chance that
  # several jobs will run on the same node and thus generate files
  # with the same filename. We need to avoid that to be able to stage
  # the output files for gpx_make_outputs. We add the job file numeric
  # extension as a prefix to the filename.
  mkdir interim
  rnaseq_collapse -backlog 1 -max-jobs 1 -rank-counts-precalculated -O interim -nogenbank -lds2 ./genome_lds -sorted-vols align.mft -scaffold-list scaffold_list.mft -sra-metadata-manifest metadata.mft -start-job-id $start_job_id -input-jobs job.005 -workers $threads
  mkdir output
  for f in interim/*; do
      if [ -f $f ]; then
          mv $f output/${extension}_$(basename $f)
      fi
  done

Command exit status:
  3

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  00090/000/0000/PB 1636005A6330F1C1 0001/0001 2024-05-02T03:57:16.796135 6b0fbe659621    UNK_CLIENT      UNK_SESSION              rnaseq_collapse start         /img/gp/bin/rnaseq_collapse -backlog 1 -max-jobs 1 -rank-counts-precalculated -O interim -nogenbank -lds2 ./genome_lds -sorted-vols align.mft -scaffold-list scaffold_list.mft -sra-metadata-manifest metadata.mft -start-job-id 16 -input-jobs job.005 -workers 3
  00090/000/0000/PB 1636005A6330F1C1 0002/0002 2024-05-02T03:57:16.797491 6b0fbe659621    UNK_CLIENT      UNK_SESSION              rnaseq_collapse extra         ncbi_app_username=root&ncbi_app_path=/img/gp/bin/rnaseq_collapse&ncbi_app_build_date=Mar+28+2024+06:42:12&ncbi_app_tc_project=Software+Compilation+%26+Artifact+Generation&ncbi_app_tc_conf=Release&ncbi_app_tc_build=25790&ncbi_app_build_id=49258646&ncbi_app_built_as=rnaseq_collapse&ncbi_app_version=0.0.25790&ncbi_app_sc_version=28&ncbi_app_vcs_revision=680859&ncbi_app_revision=680859
  00090/000/0000/P  1636005A6330F1C1 0003/0003 2024-05-02T03:57:16.822871 6b0fbe659621    UNK_CLIENT      UNK_SESSION              rnaseq_collapse Error: UTIL(CException::eUnknown) "stream_source.cpp", line 368: ncbi::CInputStreamSource::x_OpenOwnedStream() --- CInputStreamSource: File is not accessible: tmpxnhuwi9_
  00090/000/0000/P  1636005A6330F1C1 0004/0004 2024-05-02T03:57:16.822971 6b0fbe659621    UNK_CLIENT      UNK_SESSION              rnaseq_collapse Error: CORELIB(106.16) "ncbiapp.cpp", line 700: ncbi::CNcbiApplicationAPI::x_TryMain() --- Application's execution failed (CException::eUnknown) CInputStreamSource: File is not accessible: tmpxnhuwi9_
  00090/000/0000/PE 1636005A6330F1C1 0005/0005 2024-05-02T03:57:16.827458 6b0fbe659621    UNK_CLIENT      UNK_SESSION              rnaseq_collapse stop          3 0.042212009

Work dir:
  /home/scwang2023/NGStools/egapx/temp_datapath/D_farinae/12/add11c3c8cc7ae9f346f8a8819fd07

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
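
For readers following the log, the start_job_id derivation in the executed command can be reproduced on its own. This is a minimal bash sketch that copies the relevant lines from the command above; nothing else about the pipeline is assumed. The `10#` prefix forces base-10 parsing, since bash would otherwise read a zero-padded extension as octal.

```shell
# Reproduce the job-id derivation from the logged command for job.005.
filename=$(basename -- "job.005")
extension="${filename##*.}"   # drop everything up to the last dot, leaving 005
# 10# forces decimal; without it a leading 0 means octal in bash,
# and an extension such as 008 would be an arithmetic error.
(( start_job_id = ((10#$extension) * 3) + 1 ))
echo "$start_job_id"
```

For job.005 this prints 16, which matches the `-start-job-id 16` argument in the rnaseq_collapse start line of the error output.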

I ran this test on a local machine with 96 cores and 1.5 TB of RAM, so computational resources should be sufficient according to the 'Prerequisites' section. How can I solve this problem? Thank you!
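
The error output reports "File is not accessible: tmpxnhuwi9_", and that name is the single entry written to metadata.mft. A quick way to confirm this kind of failure is to check every entry of a manifest from inside the process work dir. The sketch below assumes a manifest holds one path per line, which is what the `echo ... > metadata.mft` lines in the log suggest; `check_manifest` is a hypothetical helper, not part of EGAPx.

```shell
# check_manifest is a hypothetical helper, not part of EGAPx.
# It prints every path listed in a manifest that cannot be read
# and returns non-zero if at least one such path was found.
check_manifest() {
    manifest=$1
    missing=0
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue   # skip blank lines
        if [ ! -r "$path" ]; then
            echo "not accessible: $path"
            missing=1
        fi
    done < "$manifest"
    return "$missing"
}
```

Run from the work dir listed in the report, `check_manifest metadata.mft` would list any entries that do not exist or cannot be read there.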

victzh commented 5 months ago

This problem is already fixed in our current code. Can you apply the patch sra_metadata_tmp.patch and try again?

xo2003 commented 5 months ago

Thanks for the prompt reply!

Unfortunately, I couldn't apply the patch file to the egapx repository. When I tried using git apply, I encountered the following error:

error: patch failed: ui/egapx.py:291
error: ui/egapx.py: patch does not apply

Is there anything I missed?

victzh commented 5 months ago

Don't use git to apply the patch. In the egapx directory, run patch -p1 < /path/to/patch/sra_metadata_tmp.patch instead.
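
A cautious way to run the `patch -p1` step is to test the patch first with `--dry-run`, which checks every hunk without modifying any file. A possible reason for the difference in behavior, though not confirmed in this thread, is that `git apply` requires the context lines to match exactly by default, while `patch` tolerates line offsets and a small amount of fuzz. The wrapper name `apply_patch_checked` below is hypothetical and only for illustration.

```shell
# apply_patch_checked is a hypothetical wrapper for illustration.
# Call it from the top of the egapx checkout, for example:
#   apply_patch_checked /path/to/patch/sra_metadata_tmp.patch
apply_patch_checked() {
    patch_file=$1
    # --dry-run reports whether all hunks apply, without touching files.
    if patch -p1 --dry-run < "$patch_file"; then
        patch -p1 < "$patch_file"
    else
        echo "patch does not apply cleanly; no files were changed" >&2
        return 1
    fi
}
```

The `-p1` option strips the first path component (typically `a/` or `b/`) from the file names inside the patch, which is why the command has to be run from the repository root.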

xo2003 commented 5 months ago

After applying the patch, the test run completed successfully. Thanks!

victzh commented 5 months ago

Glad it helped.