Thank you for your report, Taruna! It looks like you do not have an internet connection. You might try running /home/taruna/pkgs/pgap/pgap.py with the --no-internet option added.
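As a rough illustration of the suggestion above, here is a minimal Python sketch that assembles such a command line. Only the script path and --no-internet come from this thread; the -o output flag, the input.yaml file name, and the build_pgap_command helper are assumptions made for the example, so check them against the pgap.py help output before relying on them.

```python
import shlex


def build_pgap_command(pgap_script, input_yaml, output_dir, no_internet=True):
    """Assemble a pgap.py argument list (hypothetical helper, not part of PGAP)."""
    cmd = [pgap_script]
    if no_internet:
        # Flag suggested in the thread for machines without internet access.
        cmd.append("--no-internet")
    # Assumption: "-o" selects the output directory and the YAML is positional.
    cmd += ["-o", output_dir, input_yaml]
    return cmd


cmd = build_pgap_command("/home/taruna/pkgs/pgap/pgap.py", "input.yaml", "results")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd) unchanged, which avoids shell quoting problems with paths that contain spaces.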
Ah okay, thank you. I tried to run it with --no-internet but it fails again. I think it's failing because I don't have a specific taxa... the most specific I can get is at the class level. I got the following error today.
PGAP version 2022-12-13.build6494 is up to date.
Output will be placed in: /home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/results
WARNING: open files is less than the recommended value of 8000
PGAP failed, docker exited with rc = 1
Printing log starting from failed job:
[2023-04-19 15:22:54] DEBUG [step pgapx_yaml_ctl] job input {
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/ignore_all_errors": null,
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/input": {
"location": "file:///tmp/n37tqmrd/submol.json",
"basename": "submol.json",
"nameroot": "submol",
"nameext": ".json",
"class": "File",
"checksum": "sha1$671a0c2166cbe26d5313169b8ebfbee313b532d6",
"size": 1082,
"http://commonwl.org/cwltool#generation": 0
},
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/input_fasta": {
"class": "File",
"location": "file:///home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/with_coverage.3.fa",
"size": 3376319,
"basename": "with_coverage.3.fa",
"nameroot": "with_coverage.3",
"nameext": ".fa"
},
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/no_internet": null,
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/taxon_db": {
"class": "File",
"location": "file:///pgap/input/uniColl_path/taxonomy.sqlite3",
"basename": "taxonomy.sqlite3",
"size": 1083080704,
"nameroot": "taxonomy",
"nameext": ".sqlite3"
}
}
[2023-04-19 15:22:54] DEBUG [step pgapx_yaml_ctl] evaluated job input to {
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/ignore_all_errors": null,
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/input": {
"location": "file:///tmp/n37tqmrd/submol.json",
"basename": "submol.json",
"nameroot": "submol",
"nameext": ".json",
"class": "File",
"checksum": "sha1$671a0c2166cbe26d5313169b8ebfbee313b532d6",
"size": 1082,
"http://commonwl.org/cwltool#generation": 0
},
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/input_fasta": {
"class": "File",
"location": "file:///home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/with_coverage.3.fa",
"size": 3376319,
"basename": "with_coverage.3.fa",
"nameroot": "with_coverage.3",
"nameext": ".fa"
},
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/no_internet": null,
"file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/taxon_db": {
"class": "File",
"location": "file:///pgap/input/uniColl_path/taxonomy.sqlite3",
"basename": "taxonomy.sqlite3",
"size": 1083080704,
"nameroot": "taxonomy",
"nameext": ".sqlite3"
}
}
[2023-04-19 15:22:54] INFO [step pgapx_yaml_ctl] start
[2023-04-19 15:22:54] DEBUG [job pgapx_yaml_ctl] initializing from file:///pgap/pgap/progs/pgapx_yaml_ctl.cwl as part of step pgapx_yaml_ctl
[2023-04-19 15:22:54] DEBUG [job pgapx_yaml_ctl] {
"ignore_all_errors": null,
"input": {
"location": "file:///tmp/n37tqmrd/submol.json",
"basename": "submol.json",
"nameroot": "submol",
"nameext": ".json",
"class": "File",
"checksum": "sha1$671a0c2166cbe26d5313169b8ebfbee313b532d6",
"size": 1082,
"http://commonwl.org/cwltool#generation": 0
},
"input_fasta": {
"class": "File",
"location": "file:///home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/with_coverage.3.fa",
"size": 3376319,
"basename": "with_coverage.3.fa",
"nameroot": "with_coverage.3",
"nameext": ".fa"
},
"no_internet": null,
"taxon_db": {
"class": "File",
"location": "file:///pgap/input/uniColl_path/taxonomy.sqlite3",
"basename": "taxonomy.sqlite3",
"size": 1083080704,
"nameroot": "taxonomy",
"nameext": ".sqlite3"
},
"ifmt": "JSON",
"ofmt": "JSON",
"output_annotation_name": "input.asn",
"output_input_asn_type_name": "input_asn_type.txt",
"output_ltp_name": "genome.ltp.txt",
"output_taxid_name": "taxid.txt"
}
[2023-04-19 15:22:54] DEBUG [job pgapx_yaml_ctl] path mappings is {
"file:///tmp/n37tqmrd/submol.json": [
"/tmp/n37tqmrd/submol.json",
"/tmp/gnkc2aez/stg57b95de1-faea-4142-90f6-b2e15e148801/submol.json",
"File",
true
],
"file:///home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/with_coverage.3.fa": [
"/home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/with_coverage.3.fa",
"/tmp/gnkc2aez/stg151904f6-e0f3-4a2d-9d65-b7f51a6cdc9b/with_coverage.3.fa",
"File",
true
],
"file:///pgap/input/uniColl_path/taxonomy.sqlite3": [
"/pgap/input/uniColl_path/taxonomy.sqlite3",
"/tmp/gnkc2aez/stg3c44c53a-a4da-4a25-a101-390a1b9af642/taxonomy.sqlite3",
"File",
true
]
}
[2023-04-19 15:22:54] DEBUG [job pgapx_yaml_ctl] command line bindings is [
{
"position": [
-1000000,
0
],
"datum": "pgapx_yaml_ctl"
},
{
"prefix": "-ifmt",
"position": [
0,
"ifmt"
],
"datum": "JSON"
},
{
"prefix": "-input",
"position": [
0,
"input"
],
"datum": {
"location": "file:///tmp/n37tqmrd/submol.json",
"basename": "submol.json",
"nameroot": "submol",
"nameext": ".json",
"class": "File",
"checksum": "sha1$671a0c2166cbe26d5313169b8ebfbee313b532d6",
"size": 1082,
"http://commonwl.org/cwltool#generation": 0,
"path": "/tmp/gnkc2aez/stg57b95de1-faea-4142-90f6-b2e15e148801/submol.json",
"dirname": "/tmp/gnkc2aez/stg57b95de1-faea-4142-90f6-b2e15e148801"
}
},
{
"prefix": "-input-fasta",
"position": [
0,
"input_fasta"
],
"datum": {
"class": "File",
"location": "file:///home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/with_coverage.3.fa",
"size": 3376319,
"basename": "with_coverage.3.fa",
"nameroot": "with_coverage.3",
"nameext": ".fa",
"path": "/tmp/gnkc2aez/stg151904f6-e0f3-4a2d-9d65-b7f51a6cdc9b/with_coverage.3.fa",
"dirname": "/tmp/gnkc2aez/stg151904f6-e0f3-4a2d-9d65-b7f51a6cdc9b"
}
},
{
"prefix": "-ofmt",
"position": [
0,
"ofmt"
],
"datum": "JSON"
},
{
"prefix": "-output-annotation",
"position": [
0,
"output_annotation_name"
],
"datum": "input.asn"
},
{
"prefix": "-output-asn-type",
"position": [
0,
"output_input_asn_type_name"
],
"datum": "input_asn_type.txt"
},
{
"prefix": "-output-ltp",
"position": [
0,
"output_ltp_name"
],
"datum": "genome.ltp.txt"
},
{
"prefix": "-output-taxid",
"position": [
0,
"output_taxid_name"
],
"datum": "taxid.txt"
},
{
"prefix": "-taxon-db",
"position": [
0,
"taxon_db"
],
"datum": {
"class": "File",
"location": "file:///pgap/input/uniColl_path/taxonomy.sqlite3",
"basename": "taxonomy.sqlite3",
"size": 1083080704,
"nameroot": "taxonomy",
"nameext": ".sqlite3",
"path": "/tmp/gnkc2aez/stg3c44c53a-a4da-4a25-a101-390a1b9af642/taxonomy.sqlite3",
"dirname": "/tmp/gnkc2aez/stg3c44c53a-a4da-4a25-a101-390a1b9af642"
}
}
]
[2023-04-19 15:22:54] DEBUG [job pgapx_yaml_ctl] initial work dir {}
[2023-04-19 15:22:54] INFO [job pgapx_yaml_ctl] /tmp/pi0nsvfh$ pgapx_yaml_ctl \
-ifmt \
JSON \
-input \
/tmp/gnkc2aez/stg57b95de1-faea-4142-90f6-b2e15e148801/submol.json \
-input-fasta \
/tmp/gnkc2aez/stg151904f6-e0f3-4a2d-9d65-b7f51a6cdc9b/with_coverage.3.fa \
-ofmt \
JSON \
-output-annotation \
input.asn \
-output-asn-type \
input_asn_type.txt \
-output-ltp \
genome.ltp.txt \
-output-taxid \
taxid.txt \
-taxon-db \
/tmp/gnkc2aez/stg3c44c53a-a4da-4a25-a101-390a1b9af642/taxonomy.sqlite3
ignoring failure of pgapx_yaml_input.GetFasta
terminate called after throwing an instance of 'ncbi::CException'
what(): NCBI C++ Exception:
Error: CWL(CException::eUnknown) "/export/home/gpipe/TeamCity/Agent2/work/427aceaa834ecbb6/ncbi_cxx/src/internal/gpipe/app/cloud/cwl/pgapx_yaml_ctl.cpp", line 246: CPgapxYamlCtlApplication::Run() --- Unknown organism Gammaproteobacteria OAbin3
Stack trace:
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-12-13.build6494/arch/x86_64/bin/pgapx_yaml_ctl :0 offset=0x0 addr=0x41a83e
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-12-13.build6494/arch/x86_64/lib/libxncbi.so /export/home/gpipe/TeamCity/Agent2/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp:702 ncbi::CNcbiApplicationAPI::x_TryMain(ncbi::EAppDiagStream, char const*, int*, bool*) offset=0x0 addr=0x7fbe173d7472
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-12-13.build6494/arch/x86_64/lib/libxncbi.so /export/home/gpipe/TeamCity/Agent2/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp:1014 ncbi::CNcbiApplicationAPI::AppMain(int, char const* const*, char const* const*, ncbi::EAppDiagStream, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) offset=0x0 addr=0x7fbe173daa3c
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-12-13.build6494/arch/x86_64/bin/pgapx_yaml_ctl :0 offset=0x0 addr=0x40ca49
/usr/lib64/libc-2.17.so :0 offset=0x0 addr=0x7fbe15e03554
/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-12-13.build6494/arch/x86_64/bin/pgapx_yaml_ctl :0 offset=0x0 addr=0x40cc19
:0 offset=0x0 addr=0xffffffffffffffff
Stack trace (most recent call last):
#12 Object "", at 0xffffffffffffffff, in
#11 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-12-13.build6494/arch/x86_64/bin/pgapx_yaml_ctl", at 0x40cc19, in _start
#10 Object "/usr/lib64/libc-2.17.so", at 0x7fbe15e03554, in __libc_start_main
#9 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-12-13.build6494/arch/x86_64/bin/pgapx_yaml_ctl", at 0x40ca49, in main
#8 Source "/export/home/gpipe/TeamCity/Agent2/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp", line 1014, in AppMain [0x7fbe173daa3c]
#7 Source "/export/home/gpipe/TeamCity/Agent2/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp", line 702, in x_TryMain [0x7fbe173d7472]
#6 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-12-13.build6494/arch/x86_64/bin/pgapx_yaml_ctl", at 0x41a8ae, in CPgapxYamlCtlApplication::Run()
#5 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_throw.cc", line 93, in __cxa_throw [0x7fbe16985222]
#4 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc", line 57, in terminate [0x7fbe16984fe0]
#3 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc", line 47, in __cxa_begin_catch [0x7fbe16984f95]
#2 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/vterminate.cc", line 95, in __verbose_terminate_handler [0x7fbe169871a4]
#1 Object "/usr/lib64/libc-2.17.so", at 0x7fbe15e18a77, in abort
#0 Object "/usr/lib64/libc-2.17.so", at 0x7fbe15e17387, in raise
Aborted (Signal sent by tkill() 94298 2660)
[2023-04-19 15:23:41] INFO [job pgapx_yaml_ctl] Max memory used: 0MiB
[2023-04-19 15:23:41] WARNING [job pgapx_yaml_ctl] was terminated by signal: SIGABRT
[2023-04-19 15:23:41] ERROR [job pgapx_yaml_ctl] Job error:
("Error collecting output for parameter 'input_asn_type': pgap/progs/pgapx_yaml_ctl.cwl:75:13: Did not find output file with glob pattern: '['input_asn_type.txt']'.", {})
[2023-04-19 15:23:41] WARNING [job pgapx_yaml_ctl] completed permanentFail
[2023-04-19 15:23:41] DEBUG [job pgapx_yaml_ctl] outputs {}
[2023-04-19 15:23:41] ERROR [step pgapx_yaml_ctl] Output is missing expected field file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/output_annotation
[2023-04-19 15:23:41] ERROR [step pgapx_yaml_ctl] Output is missing expected field file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/output_ltp
[2023-04-19 15:23:41] ERROR [step pgapx_yaml_ctl] Output is missing expected field file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/input_asn_type
[2023-04-19 15:23:41] ERROR [step pgapx_yaml_ctl] Output is missing expected field file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/taxid
[2023-04-19 15:23:41] DEBUG [step pgapx_yaml_ctl] produced output {}
[2023-04-19 15:23:41] WARNING [step pgapx_yaml_ctl] completed permanentFail
[2023-04-19 15:23:41] DEBUG [job pgapx_yaml_ctl] Removing input staging directory /tmp/gnkc2aez
[2023-04-19 15:23:41] DEBUG [job pgapx_yaml_ctl] Removing temporary directory /tmp/fr5lmrk_
[2023-04-19 15:23:41] INFO [workflow prepare_input_template] completed permanentFail
[2023-04-19 15:23:41] DEBUG [workflow prepare_input_template] outputs {
"input_asn_type": null,
"locus_tag_prefix": null,
"output_entries": null,
"output_seq_submit": null,
"submol_block_json": {
"location": "file:///tmp/n37tqmrd/submol.json",
"basename": "submol.json",
"nameroot": "submol",
"nameext": ".json",
"class": "File",
"checksum": "sha1$671a0c2166cbe26d5313169b8ebfbee313b532d6",
"size": 1082,
"http://commonwl.org/cwltool#generation": 0
},
"taxid": null
}
[2023-04-19 15:23:41] DEBUG [step prepare_input_template] produced output {
"file:///pgap/pgap/pgap.cwl#prepare_input_template/output_seq_submit": null,
"file:///pgap/pgap/pgap.cwl#prepare_input_template/output_entries": null,
"file:///pgap/pgap/pgap.cwl#prepare_input_template/locus_tag_prefix": null,
"file:///pgap/pgap/pgap.cwl#prepare_input_template/submol_block_json": {
"location": "file:///tmp/n37tqmrd/submol.json",
"basename": "submol.json",
"nameroot": "submol",
"nameext": ".json",
"class": "File",
"checksum": "sha1$671a0c2166cbe26d5313169b8ebfbee313b532d6",
"size": 1082,
"http://commonwl.org/cwltool#generation": 0
},
"file:///pgap/pgap/pgap.cwl#prepare_input_template/taxid": null
}
[2023-04-19 15:23:41] WARNING [step prepare_input_template] completed permanentFail
[2023-04-19 15:23:41] INFO [workflow ] completed permanentFail
[2023-04-19 15:23:41] DEBUG [workflow ] outputs {
"calls": null,
"cds_nucleotide_fasta": null,
"cds_protein_fasta": null,
"checkm_raw": null,
"final_asndisc_error_diag": null,
"final_asnval_error_diag": null,
"gbk": null,
"gff": null,
"gff_enhanced": null,
"initial_asndisc_error_diag": null,
"initial_asnval_error_diag": null,
"input_fasta": {
"class": "File",
"location": "file:///home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/with_coverage.3.fa",
"size": 3376319,
"basename": "with_coverage.3.fa",
"nameroot": "with_coverage.3",
"nameext": ".fa"
},
"input_submol": {
"class": "File",
"location": "file:///pgap/user_input/pgap_submol_9g_om9e1.yaml",
"size": 1211,
"basename": "pgap_submol_9g_om9e1.yaml",
"nameroot": "pgap_submol_9g_om9e1",
"nameext": ".yaml"
},
"nucleotide_fasta": null,
"protein_fasta": null,
"sqn": null
}
[2023-04-19 15:23:41] DEBUG Copying /home/taruna/sargassum/metagenomes/clean_data/pgap/OA/3/with_coverage.3.fa to /pgap/output/with_coverage.3.fa
[2023-04-19 15:23:41] DEBUG Copying /pgap/user_input/pgap_submol_9g_om9e1.yaml to /pgap/output/pgap_submol_9g_om9e1.yaml
[2023-04-19 15:23:41] DEBUG Removing intermediate output directory /tmp/01mhm4hz
[2023-04-19 15:23:41] DEBUG Removing intermediate output directory /tmp/pi0nsvfh
[2023-04-19 15:23:41] DEBUG Removing intermediate output directory /tmp/n37tqmrd
[2023-04-19 15:23:41] DEBUG Removing intermediate output directory /tmp/i_hn51ln
[2023-04-19 15:23:41] DEBUG Removing intermediate output directory /tmp/cqxw_ubv
{
"calls": null,
"cds_nucleotide_fasta": null,
"cds_protein_fasta": null,
"checkm_raw": null,
"final_asndisc_error_diag": null,
"final_asnval_error_diag": null,
"gbk": null,
"gff": null,
"gff_enhanced": null,
"initial_asndisc_error_diag": null,
"initial_asnval_error_diag": null,
"input_fasta": {
"class": "File",
"location": "file:///pgap/output/with_coverage.3.fa",
"size": 3376319,
"basename": "with_coverage.3.fa",
"checksum": "sha1$68213496aa7ca8f53bbc94c986b01f02bf6079b8",
"path": "/pgap/output/with_coverage.3.fa"
},
"input_submol": {
"class": "File",
"location": "file:///pgap/output/pgap_submol_9g_om9e1.yaml",
"size": 1211,
"basename": "pgap_submol_9g_om9e1.yaml",
"checksum": "sha1$f29598be95c9f12e597829d6c915bf36d473b2e8",
"path": "/pgap/output/pgap_submol_9g_om9e1.yaml"
},
"nucleotide_fasta": null,
"protein_fasta": null,
"sqn": null
}
[2023-04-19 15:23:41] WARNING Final process status is permanentFail
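The actual cause in a log like the one above is a single line buried among the DEBUG output: the message after "NCBI C++ Exception:". The following is a minimal Python sketch for pulling that message out. The function name is hypothetical, and the "Stack trace:" marker and " --- " separator are taken from this one log only, so other failures may be laid out differently.

```python
def extract_exception_message(log_text):
    """Return the text after ' --- ' in the first NCBI C++ exception block, or None."""
    start = log_text.find("NCBI C++ Exception:")
    if start == -1:
        return None
    # The exception text ends where the stack trace begins (layout seen in this log).
    end = log_text.find("Stack trace:", start)
    if end == -1:
        end = len(log_text)
    block = log_text[start:end]
    # Re-join hard-wrapped lines. Assumption: wraps fall mid-word, as in this log,
    # so joining without a space restores the original text.
    joined = "".join(line.strip() for line in block.splitlines())
    _, sep, message = joined.rpartition(" --- ")
    return message.strip() if sep else joined


# Abbreviated sample modeled on the log above (source path shortened for the example).
sample = (
    "terminate called after throwing an instance of 'ncbi::CException'\n"
    "what(): NCBI C++ Exception:\n"
    'Error: CWL(CException::eUnknown) "pgapx_yaml_ctl.cpp", line 246: '
    "CPgapxYamlCtlApplication::Run() --- Unknown organism Gammaproteobacteria OAbin3\n"
    "Stack trace:\n"
)
print(extract_exception_message(sample))  # Unknown organism Gammaproteobacteria OAbin3
```

Here the message points at the organism name, not at the network or at Docker, which matches the diagnosis in the replies.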
I think it's failing because I don't have a specific taxa
Correct!
Gammaproteobacteria OAbin3
That's a strange looking taxname...
Hahaha, strange looking indeed! The OAbin3 is an identifier that I added. I'm working with MAGs, so I don't have much taxonomic resolution. Does this mean I can't use PGAP for annotating my MAGs?
It needs to be a registered taxonomy
Gammaproteobacteria is in the NCBI taxonomy list, but it's just not specific enough for PGAP. I will try to BLAST marker genes from this MAG and see if I can get a more specific taxonomy. Thank you!
It needs to be a genus or below.
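One way to keep a custom bin identifier without putting it in the organism name is to store it in a separate field. The Python sketch below splits a label such as "Gammaproteobacteria OAbin3" and writes an organism block. The helper names are hypothetical, and the genus_species and strain field names are assumptions about the submol YAML layout that should be checked against the PGAP documentation. The sketch does not check taxonomic rank: the taxon part still has to be replaced with a registered name at genus level or below before PGAP will accept it.

```python
def split_taxon_and_label(name, label):
    """Strip a trailing custom label (e.g. a bin ID) from an organism name."""
    name = name.strip()
    if label and name.endswith(label):
        taxon = name[: -len(label)].strip()
    else:
        taxon = name
    return taxon, label


def organism_block(taxon, label):
    """Build the organism section as a dict (field names are assumptions)."""
    return {"organism": {"genus_species": taxon, "strain": label}}


def to_yaml(block):
    """Serialize the small two-level dict without an external YAML library."""
    lines = []
    for section, fields in block.items():
        lines.append(f"{section}:")
        for key, value in fields.items():
            lines.append(f"    {key}: '{value}'")
    return "\n".join(lines) + "\n"


taxon, label = split_taxon_and_label("Gammaproteobacteria OAbin3", "OAbin3")
# taxon is still class-level here; swap in a genus-or-below name once one is known.
print(to_yaml(organism_block(taxon, label)))
```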
Hello,
I installed PGAP and ran it on the test genomes successfully. But now I'm having Python-related issues when trying to run PGAP on my own data using a script. Might you please help me resolve the errors below? Thank you!
Contents of my script: