Closed marieleoz closed 2 years ago
Can I trust my results and use the files for further analyses?
At first glance, the immediate answer would be "no" because of the "permanentFail" status. But the presence of final results (flat files, for example) made me curious about what is going on in your case. Let me see...
And thank you for your report, Marie!
When looking at a cwltool.log with the status permanentFail, the most informative message is the FIRST mention of permanentFail, which in your case is:
[2021-12-07 19:34:31] DEBUG [job Final_Bacterial_Package_asndisc_evaluate] initial work dir {}
[2021-12-07 19:34:31] INFO [job Final_Bacterial_Package_asndisc_evaluate] /pgap/output/debug/tmp-outdir/dp8ym6_i$ xml_evaluate \
-input \
/pgap/output/debug/tmpdir/_zh81aht/stg21a8a5ce-02c2-40e0-8522-d49d88b5661a/annot.disc \
-xpath-fail \
'//*[@severity="FATAL"]' > /pgap/output/debug/tmp-outdir/dp8ym6_i/final_asndisc_diag.xml
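For a long cwltool.log, the first mention of permanentFail can be located with a small script. This is only a sketch: the function name `first_permanent_fail` and the number of context lines are illustrative choices and not part of PGAP or cwltool.

```python
from collections import deque
from typing import Iterable, List, Optional, Tuple


def first_permanent_fail(
    lines: Iterable[str], context: int = 5
) -> Optional[Tuple[int, List[str]]]:
    """Find the first line that mentions permanentFail.

    Returns (1-based line number, [preceding context lines..., matching line]),
    or None when no line mentions permanentFail.
    """
    # Sliding window holding the last `context` lines plus the current one.
    recent = deque(maxlen=context + 1)
    for number, line in enumerate(lines, start=1):
        recent.append(line.rstrip("\n"))
        if "permanentFail" in line:
            return number, list(recent)
    return None


def first_permanent_fail_in_file(path: str, context: int = 5):
    """Same as above, reading from a log file (the path is supplied by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return first_permanent_fail(handle, context)
```

Calling `first_permanent_fail_in_file("cwltool.log")` from the directory that holds the log would return the line number together with a few preceding lines, which usually name the job that failed.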
The good news is that, as far as your SLURM environment is concerned, PGAP "worked" in the sense that you got a meaningful result.
The bad news is that the result has a problem diagnosed by our NCBI asndisc tool. The key is the file final_asndisc_diag.xml: it should be either in the /output/ directory or, if not, somewhere in the /debug-extra/ directory. If you do not have this directory, try rerunning pgap.py with the --debug option. If you are running your customized SLURM script, I am guessing (since we are not responsible for this script) that the call to pgap.py must be inside that script.
Feel free to post that file, or examine the messages under the severity=FATAL XML elements to see whether they help you understand what is going on.
Hi Azat!
Thanks for your answer :) Here's what's in the final_asndisc_diag.xml file:
Failer nodes: <?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-8"?>
Hopefully it makes more sense to you than to me :) Thanks!
Not sure you can actually see what I pasted so here's the content in a .txt file:
Thanks, Marie, this is useful:
Failer nodes:
<?xml version="1.0" encoding="UTF-8"?>
<test name="SHOW_HYPOTHETICAL_CDS_HAVING_GENE_NAME" description="Hypothetical CDS with gene names" severity="FATAL" cardinality="1">
<details message="## hypothetical coding regions have a gene name" severity="FATAL" cardinality="1" unit="hypothetical coding region" autofix="true">
<object type="feature" file="/pgap/output/debug/tmpdir/28968fcr/stgea061514-d09a-47fc-b556-40252de6b670/annot-wo-checksum.sqn" feature_type="CDS" product="IS66 family insertion sequence hypothetical protein" location="lcl|contig_00092:695-1084" locus_tag="pgaptmp_004933" label="CDS	IS66 family insertion sequence hypothetical protein	lcl|contig_00092:695-1084	pgaptmp_004933"/>
</details>
</test>
<?xml version="1.0" encoding="UTF-8"?>
<details message="## hypothetical coding regions have a gene name" severity="FATAL" cardinality="1" unit="hypothetical coding region" autofix="true">
<object type="feature" file="/pgap/output/debug/tmpdir/28968fcr/stgea061514-d09a-47fc-b556-40252de6b670/annot-wo-checksum.sqn" feature_type="CDS" product="IS66 family insertion sequence hypothetical protein" location="lcl|contig_00092:695-1084" locus_tag="pgaptmp_004933" label="CDS	IS66 family insertion sequence hypothetical protein	lcl|contig_00092:695-1084	pgaptmp_004933"/>
</details>
It looks to me that we already have enough material to start looking into this.
But before doing that, I noticed that you are still using the July version of PGAPx. It is quite possible that this particular piece of evidence is gone now (we are double-checking this ourselves as well).
Feel free to switch to a newer version in your script and try again. Apart from this particular issue, which has a chance of being resolved, there are other improvements that you might want. Also, using the latest version is generally recommended.
Thanks Azat, I'll ask for the update, try again and let you know.
More evidence in support of the update: I just got a response from one of the curators of biological data indicating that this particular insertion sequence family name has been corrected in newer evidence sources.
Dear Azat,
Sorry it took me a little while to get this done, but I fear I got something similar with pgap_2021-11-29.build5742. I attach the cwltool.log (zipped) and the final_asndisc_diag.xml file (renamed as .txt), but please let me know if there's anything else that could be helpful. cwltool.zip final_asndisc_diag.txt
Thanks a lot! Marie
Thanks, Marie. We will have a look at this.
Apologies for the long gap, Marie.
The message looks like
<?xml version="1.0" encoding="UTF-8"?>
<test name="SHOW_HYPOTHETICAL_CDS_HAVING_GENE_NAME" description="Hypothetical CDS with gene names" severity="FATAL" cardinality="1">
<details message="## hypothetical coding regions have a gene name" severity="FATAL" cardinality="1" unit="hypothetical coding region" autofix="true">
<object type="feature" file="/pgap/output/debug/tmpdir/y1xfp__8/stg03104f5c-13f0-48dc-a275-1070d0f8eed2/annot-wo-checksum.sqn" feature_type="CDS" product="IS66 family insertion sequence hypothetical protein" location="lcl|contig_00144:6485-6865" locus_tag="pgaptmp_004276" label="CDS	IS66 family insertion sequence hypothetical protein	lcl|contig_00144:6485-6865	pgaptmp_004276"/>
</details>
</test>
This indicates that our extensive validators are failing the output because of the error shown: IS66 family insertion sequence hypothetical protein
This issue has been resolved in the new release of PGAPx.
Please feel free to install it and try, Marie.
Hello,
I'm trying to use PGAP on a cluster as described here: http://bioinfo.genotoul.fr/index.php/how-to-use/?software=How_to_use_SLURM_PGAP
My submission script is based on theirs (I just renamed it as .txt because .sh files can't be attached): pgap-2021-07-01.build5508_MLE.txt
It looks like the run is successful because I get my .faa .fna .gbk .gff and .sqn annot files, but the cwltool.log file mentions lots of warnings and ends with: "Final process status is permanentFail" cwltool.zip
Can I trust my results and use the files for further analyses? Thanks a lot.
Best, Marie