Closed rfcohen closed 3 years ago
Hi, Rob, always nice to hear from you. Did George contact you with the latest? Is this getting resolved?
I contacted George; it looks like the information here reflects the current status.
Now, on the subject. Initially, all output from a CommandLineTool step (i.e. our Docker binary) went straight into cwltool.log. Because of the diverse nature of our 150-200 binaries, some of them produce quite voluminous output. That was one of the reasons we redirected the output into individual worker subdirectories for each such node, into ncbiapp.log.
The solution for users is to rerun pgap with the --debug option to preserve these subdirectories, and to look for the output in these individual ncbiapp.log files.
Would you mind doing that?
For this particular purpose ("give me the diagnostics!") --ignore-all-errors is of no use either, for the same reason.
Hi Azat,
Thanks. I’ll run with this option later today and send you the output from vecscreen. Where do I find the ncbiapp.log files?
-Rob
Where do I find the ncbiapp.log files
The --debug option will give you something like debug-extra. cwltool.log will give you a line like path> "binary application". The last two parts of that path are the path under debug-extra. The file could be called ncbi.log as well; I do not remember exactly. Once you reach the directory, it will be clear.
Hope this helps, Rob.
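For anyone following along, the lookup described above can be sketched in Python. This is only a rough sketch, assuming the job lines in cwltool.log look like the `[job NAME] /some/path$ binary \` lines quoted elsewhere in this thread; the regular expression, the function name `job_workdirs`, and the sample line (including the directory name `abc123`) are made up for illustration and are not part of PGAP.

```python
import re
from pathlib import PurePosixPath

# Assumed shape of a cwltool job line, based on the log excerpts in this thread:
# [timestamp] INFO [job NAME] /work/dir$ binary \
JOB_LINE = re.compile(
    r"\[job (?P<job>[^\]]+)\]\s+(?P<workdir>/\S+)\$\s+(?P<binary>\S+)"
)

def job_workdirs(log_text):
    """Return (job, binary, last two path components of the work dir) per job line."""
    found = []
    for line in log_text.splitlines():
        match = JOB_LINE.search(line)
        if match:
            parts = PurePosixPath(match.group("workdir")).parts[-2:]
            found.append((match.group("job"), match.group("binary"), "/".join(parts)))
    return found

# Hypothetical sample line; "abc123" is an invented directory name.
sample = (
    "[2021-02-15 14:50:00] INFO [job Find_Best_Evidence_Alignments] "
    "/pgap/output/debug/tmp-outdir/abc123$ bact_best_evidence_alignments \\\n"
)
print(job_workdirs(sample))
```

Reading the real log with `open("cwltool.log").read()` and passing it to `job_workdirs` should list one candidate subdirectory per step, which is where the per-step log would be expected.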
Well... I ran with --debug turned on and --ignore-all-errors turned off.
I now have these directories:
debug:
    log: (this directory is empty)
    tmp-outdir: 25 directories with names like: 2y3tqxcq, 21avowft, etc.
    tmpdir: 17 directories with names like: bz1afa9c, 54fxwylf, etc.
There's nothing here that's obvious. I'm gonna need more clues.
Your run should have a non-empty output/calls.tab file, as indicated here:
"calls": {
"location": "file:///pgap/output/calls.tab",
"basename": "calls.tab",
"class": "File",
"checksum": "sha1$9606ff835d1c4d5276e9bbad8386ba172ce0c9e9",
"size": 68,
"path": "/pgap/output/calls.tab"
},
It contains references to the contaminated pieces.
Hi Azat-
The calls.tab file has exactly one entry in it:
lcl|Scaffold_1 M 4067647..4067673 adaptor:NGB00839.1 Adaptor
Doesn’t seem too useful.
What’s your availability Monday to do some screen sharing?
-Rob
Hi Rob - Glad to hear that the output directory is not empty. The line in calls.tab indicates that the span 4067647..4067673 in sequence Scaffold_1 is suspected of being an adaptor sequence (NGB00839.1). See the description of this file on our wiki (https://github.com/ncbi/pgap/wiki/Output-Files). Your choices are to clean this span in your input fasta (most likely by replacing it with Ns), or to run pgap with the --ignore-all-errors flag. In your first post, you wrote that pgap runs with this option 'but produces no output.' Do you mean the output directory is empty, or that it contains files but no annotation? Running with both --ignore-all-errors and --debug is the next thing to try. Thanks!
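The "replace with Ns" option can be sketched as below. This is a minimal illustration, not a PGAP tool: it assumes calls.tab columns are whitespace-separated as in the entry quoted in this thread, and that the coordinates are 1-based and inclusive (an assumption worth checking against the wiki). The helper names `parse_call` and `mask_span` are invented here.

```python
def parse_call(line):
    """Split one calls.tab row into (sequence id, start, end).

    Assumes whitespace-separated columns with the span in the third column.
    """
    fields = line.split()
    seq_id = fields[0].split("|")[-1]  # "lcl|Scaffold_1" -> "Scaffold_1"
    start, end = (int(x) for x in fields[2].split(".."))
    return seq_id, start, end

def mask_span(sequence, start, end):
    """Replace the span start..end with N, treating coordinates as 1-based inclusive.

    The coordinate convention is an assumption, not something stated in the thread.
    """
    return sequence[:start - 1] + "N" * (end - start + 1) + sequence[end:]

seq_id, start, end = parse_call(
    "lcl|Scaffold_1 M 4067647..4067673 adaptor:NGB00839.1 Adaptor"
)
print(seq_id, start, end, end - start + 1)

# Toy sequence to show the masking itself.
print(mask_span("ACGTACGTAC", 3, 6))
```

Masking keeps the sequence length and downstream coordinates unchanged, which is usually why Ns are preferred over deleting the span outright.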
Hi Françoise -
When I ran with ignore-all-errors it did produce output but no annotation. Again the calls.tab file has only one entry in it - an adaptor.
There are other files in there - calls.tab, cwltool.log, initial_asndisc_diag.xml (empty), initial_asnval_diag.xml (empty).
I’ll try it with --ignore-all-errors and --debug.
Thanks.
-Rob
Hello NCBI friends-
Well, I ran with --ignore-all-errors and --debug and it still failed with permanentFail.
Although it looks like it’s in a different place now...
[2021-02-15 14:53:46] WARNING [step Find_Best_Evidence_Alignments] completed permanentFail
[2021-02-15 14:53:46] DEBUG [job Find_Best_Evidence_Alignments] Removing input staging directory /pgap/output/debug/tmpdir/qzkfcyk2
[2021-02-15 14:53:46] INFO [workflow bacterial_annot_3] completed permanentFail
And in multiple places.
[2021-02-15 14:53:46] WARNING [step bacterial_annot_3] completed permanentFail
[2021-02-15 14:53:46] INFO [workflow standard_pgap] completed permanentFail
The calls.tab file still has only one entry - that adapter.
Also, the one genome I have that ran to completion and produced output (don't know if it's any good) has the same single adapter entry in its calls.tab file.
Not sure what the next step is/should be.
Thanks.
-Rob
And in multiple places.
Right. It diligently reports all the onion layers of the workflow hierarchy that failed.
[step Find_Best_Evidence_Alignments] completed permanentFail
That would be harder to diagnose without the actual input/output. Have you run it with the --debug option? If you did, you can locate the output directory under debug-extra/tmp-outdir. The actual tmp-outdir/xxxxxx directory relevant to you is named in the CWL log file.
There will be a line:
..../debug/tmp-outdir/xxxxxx $ bact_best_evidence_alignments
-parameter1 value2 \
-parameter3 value4 \
That tmp-outdir/xxxx is the directory name given in your cwltool.log. That's the directory you need, the one with the data. The ncbiapp.log might show you the actual error, which you can disclose to the public without sharing trade secrets.
As for the call, sure, maybe Thursday?
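If it helps, hunting for those per-step logs can be scripted. The sketch below is only an illustration: the file names ncbiapp.log and ncbi.log come from this thread, while the starting path output/debug/tmp-outdir and the helper names `find_app_logs` and `tail` are assumptions made for the example.

```python
from pathlib import Path

def find_app_logs(debug_dir, names=("ncbiapp.log", "ncbi.log")):
    """Return sorted paths of candidate per-step log files anywhere under debug_dir."""
    root = Path(debug_dir)
    return sorted(p for p in root.rglob("*.log") if p.name in names)

def tail(path, n=20):
    """Return the last n lines of a text file, tolerating odd bytes."""
    return Path(path).read_text(errors="replace").splitlines()[-n:]

if __name__ == "__main__":
    # Assumed location of the preserved working directories after a --debug run.
    for log in find_app_logs("output/debug/tmp-outdir"):
        print(f"== {log}")
        print("\n".join(tail(log)))
```

Printing only the tail keeps the output manageable, since some of the binaries are described above as producing quite voluminous logs.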
Finally got this worked out, and it was a couple of issues. However, the support from the PGAP team was great. Granted, I have a long relationship with these people, and they're really good at this. They want to put out a good product and deliver tools to advance the science, and they're doing that. Keep up the great work; their support, availability, and willingness to help are fantastic.
Thanks Rob. Using --ignore-all-errors AND removing the contigs shorter than 200 bases allowed the process to finish successfully, correct? The thing to remember here is that --ignore-all-errors doesn't skip over small contigs, and these may cause issues for PGAP, with or without the flag.
Also, --ignore-all-errors ignores errors unless the error is so fundamental that we can't ignore it.
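For readers hitting the same problem, dropping the short contigs before running PGAP can be sketched like this. It is a plain-Python illustration under the assumption that the input is ordinary FASTA text; the 200-base threshold comes from this thread, and the function names `read_fasta` and `drop_short_contigs` are invented for the example.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def drop_short_contigs(records, min_len=200):
    """Keep only contigs with at least min_len bases."""
    return [(h, s) for h, s in records if len(s) >= min_len]

# Toy input: one 250-base contig and one 50-base contig.
fasta = ">contig_1\n" + "A" * 250 + "\n>contig_2\n" + "C" * 50 + "\n"
kept = drop_short_contigs(read_fasta(fasta))
print([header for header, _ in kept])
```

Only contig_1 survives the filter here; the kept records would then need to be written back out as FASTA before being handed to pgap.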
Good afternoon NCBI Friends.
I have an assembly made up of many contigs in the .fasta file. They are all greater than 200bp.
It dies in just a few seconds.
Here are the sections from the log files that show the failure (not very useful).
[2021-02-10 19:11:23] DEBUG [job screen_evaluate] initial work dir {}
[2021-02-10 19:11:23] INFO [job screenevaluate] /tmp/oflyqnn$ screen_evaluate \
    -ifmt \
    seq-annot \
    -tab \
    /tmp/7ov3j4cm/stgdfcd985a-e840-482c-9cd0-3996d717cb57/calls.tab
[2021-02-10 19:11:23] DEBUG Could not collect memory usage, job ended before monitoring began.
[2021-02-10 19:11:23] WARNING [job screen_evaluate] completed permanentFail
[2021-02-10 19:11:23] DEBUG [job screen_evaluate] outputs { "success": true }
[2021-02-10 19:11:23] DEBUG [step screen_evaluate] produced output { "file:///pgap/pgap/vecscreen/vecscreen.cwl#screen_evaluate/success": true }
[2021-02-10 19:11:23] WARNING [step screen_evaluate] completed permanentFail
[2021-02-10 19:11:23] INFO [workflow vecscreen] completed permanentFail
[2021-02-10 19:11:23] DEBUG [workflow vecscreen] outputs { "adaptor_blastdb_dir": { "location": "file:///tmp/pu7d9kpa/blastdir", "basename": "blastdir", "nameroot": "blastdir",
I've included the entire log file as an attachment. cwltool.log
I can run it with --ignore-all-errors and it completes, but produces no output.