ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

WARNING Final process status is permanentFail #214

Closed tbazilegith closed 2 years ago

tbazilegith commented 2 years ago

I have pgap installed as a module:

module load pgap/20220414
pgap.py -h
usage: pgap.py [-h] [-V] [-v] [--prod] [--taxcheck | --taxcheck-only] [-l | -u] [-r | -n] [--container-name CONTAINER_NAME] [--container-path CONTAINER_PATH] [--ignore-all-errors] [--no-internet] [--auto-correct-tax] [-D path] [-o path] [-q] [--no-self-update] [-c CPUS] [-m MEMORY] [-d] [input]

Then I ran the test like this:

pgap.py --cpus $SLURM_CPUS_ON_NODE \
    -n -o pgap_testDir/test_output \
    --container-path /apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif \
    --docker singularity \
    pgap_testDir/MG37_input/input.yaml

The cwltool.log shows this:

Original command: /apps/pgap/20220414/pgap/scripts/pgap.py --docker /apps/singularity/latest/singularity --container-path /apps/pgap/20220414/pgap/scripts/pgap_2022-04-14.build6021.sif --no-self-update --report-usage-false --cpus 16 -n -o pgap_testDir/test_output --container-path /apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif --docker singularity pgap_testDir/MG37_input/input.yaml
[2022-07-19 09:10:57] WARNING Final process status is permanentFail

Could anybody help figure out what's wrong with the command? Thanks! TJ

azat-badretdin commented 2 years ago

Thanks, TJ!

I see that it fails immediately after execution starts, and the parameters are definitely duplicated, which does not seem right (though it does not necessarily matter in this case).
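To make the duplication concrete, here is a minimal sketch that counts repeated option names in the "Original command" line quoted from cwltool.log. The helper name `duplicated_options` is made up for illustration and is not part of pgap.py; it treats any token starting with `-` as an option.

```python
from collections import Counter

def duplicated_options(argv):
    """Return option names (tokens starting with '-') that occur more than once."""
    counts = Counter(tok for tok in argv if tok.startswith("-"))
    return sorted(name for name, n in counts.items() if n > 1)

# Tokens copied from the "Original command" line in cwltool.log.
argv = [
    "--docker", "/apps/singularity/latest/singularity",
    "--container-path", "/apps/pgap/20220414/pgap/scripts/pgap_2022-04-14.build6021.sif",
    "--no-self-update", "--report-usage-false", "--cpus", "16", "-n",
    "-o", "pgap_testDir/test_output",
    "--container-path", "/apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif",
    "--docker", "singularity",
    "pgap_testDir/MG37_input/input.yaml",
]
print(duplicated_options(argv))  # ['--container-path', '--docker']
```

The repeats presumably come from the module wrapper adding its own `--docker` and `--container-path` before the ones typed on the command line, though the thread does not confirm that.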

Just wanted to verify that you did not have the portion of cwltool.log that looks like this:


--- Start Runtime Report ---
{
    "CPU cores": 4,
    "Docker image": "ncbi/pgap-test:2022-07-18.build6234",
    "cpu flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke",
    "cpu model": "Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz",
    "max user processes": "unlimited",
    "memory (GiB)": 31.0,
    "memory per CPU core (GiB)": 7.8,
    "open files": 32768,
    "tmp disk space (GiB)": 235.3,
    "virtual memory": "unlimited",
    "work disk space (GiB)": 235.3
}
--- End Runtime Report ---

azat-badretdin commented 2 years ago

Could you please also post the result of /bin/ls -laR while in the directory you have been running this command?

tbazilegith commented 2 years ago

Hi Azat,

--- Start Runtime Report ---
{
    "CPU cores": 64,
    "Docker image": "/apps/staphb-toolkit/containers/pgap_2022-04-14.build6021.sif",
    "cpu flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl xtopology nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 invpcid_single hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq overflow_recov succor smca",
    "cpu model": "AMD EPYC 75F3 32-Core Processor",
    "max user processes": 2057544,
    "memory (GiB)": 502.5,
    "memory per CPU core (GiB)": 7.9,
    "open files": 131072,
    "tmp disk space (GiB)": 7.4,
    "virtual memory": "unlimited",
    "work disk space (GiB)": 3869634.7
}
--- End Runtime Report ---

The second request:

/bin/ls -laR
./pgap_testDir/test_output.3:
total 1976
drwxr-sr-x 2 username bphl-state 4096 Jul 19 09:10 .
drwxr-sr-x 11 username bphl-state 4096 Jul 19 10:13 ..
-rw-r--r-- 1 username bphl-state 588460 Jul 14 14:15 ASM2732v1.annotation.nucleotide.1.fasta
-rw-r--r-- 1 username bphl-state 0 Jul 19 09:10 calls.tab
-rw-r--r-- 1 username bphl-state 1418150 Jul 19 09:10 cwltool.log
-rw------- 1 username bphl-state 1500 Jul 19 09:07 pgap_submol_qxg5zfgu.yaml

Thanks

azat-badretdin commented 2 years ago

Thanks. I see that cwltool.log has way more data than you posted; could you please attach the whole file? Thanks!

tbazilegith commented 2 years ago

Sorry, I wanted to show what I thought was essential.

azat-badretdin commented 2 years ago

Understood. The essential part is a bit tricky. Sure, the permanentFail line is important, but the essential details are in the "error" messages preceding the first occurrence of permanentFail in the log.
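As a rough illustration of that advice, the sketch below collects lines mentioning "error" from a window just before the first permanentFail line. The function name, the 20-line window, and the sample lines are all assumptions made for the example; real cwltool.log content will look different.

```python
def errors_before_first_fail(lines, marker="permanentFail", context=20):
    """Return lines mentioning 'error' among the `context` lines before the first marker."""
    for i, line in enumerate(lines):
        if marker in line:
            window = lines[max(0, i - context):i]
            return [l for l in window if "error" in l.lower()]
    return []  # marker never seen

# Hypothetical log lines, for illustration only.
sample = [
    "INFO starting step",
    "Error: something went wrong in step A",
    "INFO cleaning up",
    "WARNING Final process status is permanentFail",
    "Error: a later message that is not examined",
]
print(errors_before_first_fail(sample))
```

On a real run, the list of lines could be read with `open("cwltool.log", errors="replace").read().splitlines()`, and the window may need to be widened if the relevant message sits further up.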

tbazilegith commented 2 years ago

working on that

tbazilegith commented 2 years ago

Hello Azat, I attached the cwltool.log file as a text file: cwltool.txt

azat-badretdin commented 2 years ago

ncbi::CObjectIStreamJson::UnexpectedMember() --- line 1: "taxon": unexpected member, should be one of: "strain" "genus_species"  ( at JsonValue.organism)

Something is wrong with your input YAML file; could you please attach it? (Please do not copy-paste it: quite often, copy-pasting into GitHub comments breaks the indentation, which is paramount for YAML input.)
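The error message itself says which members are accepted under organism. A minimal sketch of that check, run on an already-parsed mapping, could look like this; the allowed set is taken from the message, while the helper name and the sample values are hypothetical, since the real submol file was attached rather than quoted.

```python
# Members named in the error message: "strain" and "genus_species".
ALLOWED_ORGANISM_KEYS = {"genus_species", "strain"}

def unexpected_organism_keys(submol):
    """Return keys under 'organism' that are outside the allowed set."""
    organism = submol.get("organism", {})
    return sorted(set(organism) - ALLOWED_ORGANISM_KEYS)

# Hypothetical parsed submol content with placeholder values.
old_style = {"organism": {"genus_species": "Genus species", "taxon": "12345"}}
new_style = {"organism": {"genus_species": "Genus species", "strain": "X1"}}

print(unexpected_organism_keys(old_style))  # ['taxon']
print(unexpected_organism_keys(new_style))  # []
```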

BTW, I do not understand why the file pgap_testDir/MG37_input/input.yaml does not appear in the output of the ls -alR I requested.

Did you edit it out?

tbazilegith commented 2 years ago

No, I didn't edit it. Here it is. I have also attached the submol file: input-yaml.txt submol-yaml.txt. Thanks

azat-badretdin commented 2 years ago

Thanks, your submol file seems very outdated; could you please review the Quick Start/Input Files section of the Wiki?

tbazilegith commented 2 years ago

I updated the submol with the content from Quick Start/Input Files, and I ran this in the terminal:

./pgap.py -n -o mg37_results pgap_testDir/MG37_input/input.yaml

While it was still running, this message was triggered:

WARN[0000] "/run/user/6210" directory set by $XDG_RUNTIME_DIR does not exist. Either create the directory or unset $XDG_RUNTIME_DIR.: stat /run/user/6210: no such file or directory: Trying to pull image in the event that it is a public image.
WARNING: tmp disk space (GiB) is less than the recommended value of 10
PGAP failed, docker exited with rc = 1
Unable to find error in log file.

The process stopped with this error:

WARNING Final process status is permanentFail

The cwltool.log shows this:

--- Start Runtime Report ---
{
    "CPU cores": 64,
    "Docker image": "ncbi/pgap:2022-04-14.build6021",
    "cpu flags": "fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl xtopology nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 invpcid_single hw_pstate sme retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq overflow_recov succor smca",
    "cpu model": "AMD EPYC 75F3 32-Core Processor",
    "max user processes": 2057511,
    "memory (GiB)": 502.5,
    "memory per CPU core (GiB)": 7.9,
    "open files": 131072,
    "tmp disk space (GiB)": 7.3,
    "virtual memory": "unlimited",
    "work disk space (GiB)": 3868065.8
}
--- End Runtime Report ---
........
WARNING Final process status is permanentFail

Updated submol: submol-yaml.txt

Thanks

azat-badretdin commented 2 years ago

Thanks!

This

WARN[0000] "/run/user/6210" directory set by $XDG_RUNTIME_DIR does not exist. Either create the directory or unset $XDG_RUNTIME_DIR.: stat /run/user/6210: no such file or directory: Trying to pull image in the event that it is a public image.

Looks like something on your side, please consult your local system folks to figure this out.
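For a quick local check before contacting the system administrators, a small sketch like the following reports whether XDG_RUNTIME_DIR is set to a directory that does not exist, which is the condition the warning describes. The function name is made up for this example.

```python
import os

def xdg_runtime_dir_problem(environ=None):
    """Return a message if XDG_RUNTIME_DIR is set but is not an existing directory, else None."""
    environ = os.environ if environ is None else environ
    path = environ.get("XDG_RUNTIME_DIR")
    if path and not os.path.isdir(path):
        return f"XDG_RUNTIME_DIR points to a missing directory: {path}"
    return None

print(xdg_runtime_dir_problem())
```

The warning text offers two remedies: create the directory or unset the variable. Which one is appropriate on a shared cluster is a question for the local administrators.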

This

WARNING: tmp disk space (GiB) is less than the recommended value of 10

This also requires the attention of someone who can increase the tmp disk space for you. The tmp directory is the one specified in the "TMPDIR" environment variable, or "/tmp" by default.
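A minimal sketch for checking this locally is below. It resolves the directory the same way as described (TMPDIR, falling back to /tmp) and compares the space against the 10 GiB figure from the warning. Whether PGAP measures free or total space is not stated in this thread, so using free space here is an assumption, and the function name is hypothetical.

```python
import os
import shutil

RECOMMENDED_GIB = 10  # value quoted in the PGAP warning

def tmp_free_gib(environ=None):
    """Return (tmp_dir, free space in GiB) for TMPDIR, or /tmp if TMPDIR is unset."""
    environ = os.environ if environ is None else environ
    tmp_dir = environ.get("TMPDIR") or "/tmp"
    free_bytes = shutil.disk_usage(tmp_dir).free  # assumption: free space is what matters
    return tmp_dir, free_bytes / 2**30

if __name__ == "__main__":
    tmp_dir, gib = tmp_free_gib()
    status = "OK" if gib >= RECOMMENDED_GIB else "below the recommended value"
    print(f"{tmp_dir}: {gib:.1f} GiB free ({status})")
```

If a larger scratch area exists on the cluster, pointing TMPDIR at it before launching the run is one way to use it without resizing /tmp.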

The new submol input file you attached looks better taxonomy-wise.

Could you please post the cwltool.log file again from your latest run?

Thanks

tbazilegith commented 2 years ago

Hello Azat, sorry for the delay; the file took time to download because of a poor network connection. I attached the cwltool.log file as a text file: cwltool_log.txt

Thanks

azat-badretdin commented 2 years ago

Thanks, Tassy, this is becoming interesting:


Processing sequences...
  processing lcl|L43967.2
terminate called after throwing an instance of 'ncbi::CException'
  what():  NCBI C++ Exception:
    Error: ASN_PROC(CException::eUnknown) "/export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/internal/gpipe/asn_proc/markup_tool.cpp", line 168: ncbi::objects::CMarkupTool::x_FixDBLink() --- Invalid project 'PRJ9999999'
     Stack trace:

The first idea that comes to mind is that your setup includes an implicit internet connection blackout, which reveals that we are attempting to connect to our central resources for this...

We will explore our code for this...

azat-badretdin commented 2 years ago

Tassy, it looks like the problem is much easier to fix: please ditch the fake biosample and bioproject lines from your submol.txt.
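As a sketch of that fix, the snippet below drops top-level biosample and bioproject lines from submol text. It assumes those keys sit at the top level, one per line; the sample content is hypothetical apart from the 'PRJ9999999' value seen in the log, and the helper name is made up. Editing the file by hand works just as well.

```python
def drop_keys(yaml_text, keys=("biosample", "bioproject")):
    """Remove unindented 'key: value' lines whose key is in `keys`."""
    kept = []
    for line in yaml_text.splitlines():
        key = line.split(":", 1)[0].strip()
        if not line.startswith((" ", "\t")) and key in keys:
            continue  # skip the placeholder entry
        kept.append(line)
    return "\n".join(kept) + "\n"

# Hypothetical submol content; only 'PRJ9999999' comes from the log.
sample = """organism:
    genus_species: 'Genus species'
bioproject: 'PRJ9999999'
biosample: 'SAMN99999999'
"""
print(drop_keys(sample), end="")
```

Indented lines are left untouched so that nested entries under organism keep their indentation, which matters for YAML.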