ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

Running PGAP without the wrapper script #208

Status: Closed (closed by npcooley 2 years ago)

npcooley commented 2 years ago

Hi,

We're trying to run PGAP without the wrapper script because that fits our compute scheme, but I'm unable to get past an error we're encountering.

We spin up a docker container as normal:

docker run -i -t --rm \
  --volume /Users/nicholascooley/input-2022-04-14.build6021.tgz:/pgap/input-2022-04-14.build6021.tgz \
  --volume /Users/nicholascooley/submol_MEGAHIT000001.yaml:/pgap/submol_MEGAHIT000001.yaml \
  --volume /Users/nicholascooley/Controller.yaml:/pgap/Controller.yaml \
  --volume /Users/nicholascooley/MEGAHIT000001.fna:/pgap/MEGAHIT000001.fna \
  ncbi/pgap:2022-04-14.build6021

Untar our input data:

tar xzvf /pgap/input-2022-04-14.build6021.tgz -C /pgap/input --strip-components 1

and call cwltool:

cwltool --timestamps --debug --disable-color --preserve-entire-environment --outdir /pgap/output /pgap/pgap/pgap.cwl /pgap/Controller.yaml

Our YAML files are:

Controller:

fasta:
  class: File
  location: /pgap/MEGAHIT000001.fna
submol:
  class: File
  location: /pgap/submol_MEGAHIT000001.yaml
supplemental_data: { class: Directory, location: /pgap/input }
report_usage: true
ignore_all_errors: true

Submol:

topology: 'linear'
organism:
  genus_species: 'Metamycoplasma hominis'
authors:
- author:
    last_name: 'Cooley'
    first_name: 'Nicholas'
    middle_initial: 'P'

What seems like the relevant part of CWLTool's output:

[2022-06-07 17:00:18] DEBUG [job pgapx_yaml_ctl] initial work dir {}
[2022-06-07 17:00:18] INFO [job pgapx_yaml_ctl] /tmp/wpvacsny$ pgapx_yaml_ctl \
    -ifmt \
    JSON \
    -ignore-all-errors \
    -input \
    /tmp/c44venwr/stg1370c1a6-bed6-4ca7-9d98-c9cadd42566d/submol.json \
    -input-fasta \
    /tmp/c44venwr/stg1c7efd95-3d89-4aa5-937a-faf5f7904c58/MEGAHIT000001.fna \
    -ofmt \
    JSON \
    -output-annotation \
    input.asn \
    -output-asn-type \
    input_asn_type.txt \
    -output-ltp \
    genome.ltp.txt \
    -output-taxid \
    taxid.txt \
    -taxon-db \
    /tmp/c44venwr/stg3c47d88a-5db0-478c-bd74-32ad017b545f/taxonomy.sqlite3
terminate called after throwing an instance of 'ncbi::CSerialException'
  what(): NCBI C++ Exception:
    Error: SERIAL(CSerialException::eFormatError) "/export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/objistr.cpp", line 1018: ncbi::CObjectIStream::ExpectedMember() --- line 1: member contact_info expected ( at JsonValue)
    Stack trace:
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libtaxon1.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/include/serial/exception.hpp:67 ncbi::CSerialException::CSerialException(ncbi::CDiagCompileInfo const&, ncbi::CException const, ncbi::CSerialException::EErrCode, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, ncbi::EDiagSev) offset=0x0 addr=0x7f500b55d115
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libxser.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/objistr.cpp:832 ncbi::CObjectIStream::ThrowError1(ncbi::CDiagCompileInfo const&, int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) offset=0x0 addr=0x7f500336e5a3
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libxser.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/objistr.cpp:1017 ncbi::CObjectIStream::ExpectedMember(ncbi::CMemberInfo const) offset=0x0 addr=0x7f50033707df
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libxser.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/member.cpp:891 ncbi::CMemberInfoFunctions::ReadMissingSimpleMember(ncbi::CObjectIStream&, ncbi::CMemberInfo const, void) offset=0x0 addr=0x7f500335f4a0
      /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/objistr.cpp:1372 ReadMissingMember offset=0x0 addr=(nil)
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libxser.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/include/serial/impl/member.inl:111 ncbi::CObjectIStream::ReadClassRandom(ncbi::CClassTypeInfo const, void) offset=0x0 addr=0x7f5003370cba
      /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/include/serial/impl/objistr.inl:95 ReadData offset=0x0 addr=(nil)
      /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/objistr.cpp:944 ReadObject offset=0x0 addr=(nil)
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libxser.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/include/serial/impl/typeinfo.inl:69 ncbi::CObjectIStream::Read(void, ncbi::CTypeInfo const, ncbi::CObjectIStream::ENoFileHeader) offset=0x0 addr=0x7f5003377569
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libxser.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/serialobject.cpp:781 ncbi::ReadObject(std::istream&, void, ncbi::CTypeInfo const) offset=0x0 addr=0x7f50033f1b85
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/bin/pgapx_yaml_ctl :0 offset=0x0 addr=0x418a2c
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libxncbi.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp:701 ncbi::CNcbiApplicationAPI::x_TryMain(ncbi::EAppDiagStream, char const, int, bool) offset=0x0 addr=0x7f5002930d92
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/lib/libxncbi.so /export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp:1001 ncbi::CNcbiApplicationAPI::AppMain(int, char const const, char const const, ncbi::EAppDiagStream, char const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) offset=0x0 addr=0x7f50029343c4
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/bin/pgapx_yaml_ctl :0 offset=0x0 addr=0x40d499
      /usr/lib64/libc-2.17.so :0 offset=0x0 addr=0x7f5001144554
      /panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/bin/pgapx_yaml_ctl :0 offset=0x0 addr=0x40d669
      :0 offset=0x0 addr=0xffffffffffffffff

Stack trace (most recent call last):

14 Object "", at 0xffffffffffffffff, in

13 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/bin/pgapx_yaml_ctl", at 0x40d669, in _start

12 Object "/usr/lib64/libc-2.17.so", at 0x7f5001144554, in __libc_start_main

11 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/bin/pgapx_yaml_ctl", at 0x40d499, in main

10 Source "/export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp", line 1001, in AppMain [0x7f50029343c4]

9 Source "/export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/corelib/ncbiapp.cpp", line 701, in x_TryMain [0x7f5002930d92]

8 Object "/panfs/pan1.be-md.ncbi.nlm.nih.gov/gpipe/bacterial_pipeline/system/2022-04-14.build6021/arch/x86_64/bin/pgapx_yaml_ctl", at 0x418a2c, in CPgapxYamlCtlApplication::Run()

7 Source "/export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/serialobject.cpp", line 781, in ReadObject [0x7f50033f1b85]

6 Source "/export/home/gpipe/TeamCity/Agent4/work/427aceaa834ecbb6/ncbi_cxx/src/serial/objistr.cpp", line 948, in Read [0x7f5003377712]

5 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_throw.cc", line 131, in __cxa_rethrow [0x7f5001cc6275]

4 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc", line 57, in terminate [0x7f5001cc5fe0]

3 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/eh_terminate.cc", line 47, in __cxa_begin_catch [0x7f5001cc5f95]

2 Source "../../../../gcc-7.3.0/libstdc++-v3/libsupc++/vterminate.cc", line 95, in __verbose_terminate_handler [0x7f5001cc81a4]

1 Object "/usr/lib64/libc-2.17.so", at 0x7f5001159a77, in abort

0 Object "/usr/lib64/libc-2.17.so", at 0x7f5001158387, in raise

Aborted (Signal sent by tkill() 168 0)
[2022-06-07 17:00:19] INFO [job pgapx_yaml_ctl] Max memory used: 39MiB
[2022-06-07 17:00:19] WARNING [job pgapx_yaml_ctl] was terminated by signal: SIGABRT
[2022-06-07 17:00:19] ERROR [job pgapx_yaml_ctl] Job error: ("Error collecting output for parameter 'input_asn_type': pgap/progs/pgapx_yaml_ctl.cwl:75:13: Did not find output file with glob pattern: '['input_asn_type.txt']'.", {})
[2022-06-07 17:00:19] WARNING [job pgapx_yaml_ctl] completed permanentFail
[2022-06-07 17:00:19] DEBUG [job pgapx_yaml_ctl] outputs {}
[2022-06-07 17:00:19] ERROR [step pgapx_yaml_ctl] Output is missing expected field file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/output_annotation
[2022-06-07 17:00:19] ERROR [step pgapx_yaml_ctl] Output is missing expected field file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/output_ltp
[2022-06-07 17:00:19] ERROR [step pgapx_yaml_ctl] Output is missing expected field file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/input_asn_type
[2022-06-07 17:00:19] ERROR [step pgapx_yaml_ctl] Output is missing expected field file:///pgap/pgap/prepare_user_input2.cwl#pgapx_yaml_ctl/taxid
[2022-06-07 17:00:19] DEBUG [step pgapx_yaml_ctl] produced output {}
[2022-06-07 17:00:19] WARNING [step pgapx_yaml_ctl] completed permanentFail
[2022-06-07 17:00:19] DEBUG [job pgapx_yaml_ctl] Removing input staging directory /tmp/c44venwr
[2022-06-07 17:00:19] DEBUG [job pgapx_yaml_ctl] Removing temporary directory /tmp/efxutwx8
[2022-06-07 17:00:19] INFO [workflow prepare_input_template] completed permanentFail
[2022-06-07 17:00:19] DEBUG [workflow prepare_input_template] outputs {
    "input_asn_type": null,
    "locus_tag_prefix": null,
    "output_entries": null,
    "output_seq_submit": null,
    "submol_block_json": {
        "location": "file:///tmp/_rpnvrnq/submol.json",
        "basename": "submol.json",
        "nameroot": "submol",
        "nameext": ".json",
        "class": "File",
        "checksum": "sha1$25771aafbf13ec7b8068a0e78d54496c6c158393",
        "size": 178,
        "http://commonwl.org/cwltool#generation": 0
    },
    "taxid": null
}
[2022-06-07 17:00:19] DEBUG [step prepare_input_template] produced output {
    "file:///pgap/pgap/pgap.cwl#prepare_input_template/output_seq_submit": null,
    "file:///pgap/pgap/pgap.cwl#prepare_input_template/output_entries": null,
    "file:///pgap/pgap/pgap.cwl#prepare_input_template/locus_tag_prefix": null,
    "file:///pgap/pgap/pgap.cwl#prepare_input_template/submol_block_json": {
        "location": "file:///tmp/_rpnvrnq/submol.json",
        "basename": "submol.json",
        "nameroot": "submol",
        "nameext": ".json",
        "class": "File",
        "checksum": "sha1$25771aafbf13ec7b8068a0e78d54496c6c158393",
        "size": 178,
        "http://commonwl.org/cwltool#generation": 0
    },
    "file:///pgap/pgap/pgap.cwl#prepare_input_template/taxid": null
}
[2022-06-07 17:00:19] WARNING [step prepare_input_template] completed permanentFail
[2022-06-07 17:00:19] INFO [workflow ] completed permanentFail
[2022-06-07 17:00:19] DEBUG [workflow ] outputs {
    "calls": null,
    "final_asndisc_error_diag": null,
    "final_asnval_error_diag": null,
    "gbk": null,
    "gff": null,
    "initial_asndisc_error_diag": null,
    "initial_asnval_error_diag": null,
    "input_fasta": {
        "class": "File",
        "location": "file:///pgap/MEGAHIT000001.fna",
        "size": 700681,
        "basename": "MEGAHIT000001.fna",
        "nameroot": "MEGAHIT000001",
        "nameext": ".fna"
    },
    "input_submol": {
        "class": "File",
        "location": "file:///pgap/submol_MEGAHIT000001.yaml",
        "size": 165,
        "basename": "submol_MEGAHIT000001.yaml",
        "nameroot": "submol_MEGAHIT000001",
        "nameext": ".yaml"
    },
    "nucleotide_fasta": null,
    "protein_fasta": null,
    "sqn": null
}
[2022-06-07 17:00:19] DEBUG Copying /pgap/MEGAHIT000001.fna to /pgap/output/MEGAHIT000001.fna
[2022-06-07 17:00:19] DEBUG Copying /pgap/submol_MEGAHIT000001.yaml to /pgap/output/submol_MEGAHIT000001.yaml
[2022-06-07 17:00:19] DEBUG Removing intermediate output directory /tmp/s7y1bj16
[2022-06-07 17:00:19] DEBUG Removing intermediate output directory /tmp/vtrztaun
[2022-06-07 17:00:19] DEBUG Removing intermediate output directory /tmp/_rpnvrnq
[2022-06-07 17:00:19] DEBUG Removing intermediate output directory /tmp/wpvacsny
[2022-06-07 17:00:19] DEBUG Removing intermediate output directory /tmp/vwo_7udf
{
    "calls": null,
    "final_asndisc_error_diag": null,
    "final_asnval_error_diag": null,
    "gbk": null,
    "gff": null,
    "initial_asndisc_error_diag": null,
    "initial_asnval_error_diag": null,
    "input_fasta": {
        "class": "File",
        "location": "file:///pgap/output/MEGAHIT000001.fna",
        "size": 700681,
        "basename": "MEGAHIT000001.fna",
        "checksum": "sha1$0948177d0ea86109cb9e5c8761b69719fc608002",
        "path": "/pgap/output/MEGAHIT000001.fna"
    },
    "input_submol": {
        "class": "File",
        "location": "file:///pgap/output/submol_MEGAHIT000001.yaml",
        "size": 165,
        "basename": "submol_MEGAHIT000001.yaml",
        "checksum": "sha1$e884dc4092a29b5810d6ca298a817f4d5d6c40b6",
        "path": "/pgap/output/submol_MEGAHIT000001.yaml"
    },
    "nucleotide_fasta": null,
    "protein_fasta": null,
    "sqn": null
}
[2022-06-07 17:00:19] WARNING Final process status is permanentFail

azat-badretdin commented 2 years ago

As it says:

member contact_info expected

You are missing an obligatory field, contact_info, in your submol file. Please consult https://github.com/ncbi/pgap/wiki/Input-Files (section "Metadata YAML file (submol)").
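When running cwltool directly, a small pre-flight check on the submol could catch this before the pipeline starts. The following is only a sketch: `REQUIRED_SUBMOL_KEYS` and `missing_submol_keys` are hypothetical names, and the only required key assumed here is `contact_info`, because that is the one the error message in this thread names. Check the wiki page above for the full set of fields and for what goes inside `contact_info`.

```python
# Top-level submol keys to require. Only "contact_info" is confirmed by the
# error message in this thread; extend the tuple after checking the PGAP wiki.
REQUIRED_SUBMOL_KEYS = ("contact_info",)


def missing_submol_keys(submol, required=REQUIRED_SUBMOL_KEYS):
    """Return the required top-level keys that are absent or empty in `submol`."""
    return [key for key in required if not submol.get(key)]


# The submol from this issue, written as the dict a YAML parser would return.
submol_from_issue = {
    "topology": "linear",
    "organism": {"genus_species": "Metamycoplasma hominis"},
    "authors": [
        {
            "author": {
                "last_name": "Cooley",
                "first_name": "Nicholas",
                "middle_initial": "P",
            }
        }
    ],
}

print(missing_submol_keys(submol_from_issue))  # ['contact_info']
```

In practice the dict would come from a YAML parser (for example PyYAML's `yaml.safe_load`) applied to the submol file, and the check would run before calling cwltool. An empty list means none of the listed keys is missing.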