ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

[BUG] WARNING Final process status is permanentFail #188

Closed dgutierrezcastillo closed 2 years ago

dgutierrezcastillo commented 2 years ago

Greetings,

I am not able to run PGAP with my assembled genomes. I ran it with the MG37 test genome and it ran successfully, although it gave me the warning of having low space in the tmp outdir (1 Gb). However, I set the outdir from the singularity cache in the scratch folder of our HPC, so I should have more space than that. I can provide our draft genome if necessary.

Thanks, Diego

cwtool.log.zip

azat-badretdin commented 2 years ago

Thank you for your report, Diego.

The first permanentFail message points to the error output, which says:


    Error: CWL(CException::eUnknown) "/export/home/gpipe/TeamCity/Agent3/work/427aceaa834ecbb6/ncbi_cxx/src/internal/gpipe/app/cloud/cwl/pgapx_yaml_ctl.cpp", line 246: CPgapxYamlCtlApplication::Run() --- Unknown organism Xanthomonas tranlucens

I suspect you wanted Xanthomonas translucens, with "s" between N and L.
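For anyone hitting the same "Unknown organism" error, a quick spell-check of the organism name before launching a run can save a failed job. The following is only a minimal sketch: the `KNOWN_ORGANISMS` list and the `suggest_organism` helper are hypothetical and are not part of PGAP, which validates names against its own taxonomy data.

```python
import difflib

# Hypothetical short list of valid names, for illustration only.
# PGAP performs its own organism lookup; this list is an assumption.
KNOWN_ORGANISMS = [
    "Xanthomonas translucens",
    "Bacillus thuringiensis",
    "Mycoplasma genitalium",
]


def suggest_organism(name, known=KNOWN_ORGANISMS):
    """Return `name` if it is known, else the closest known name, else None."""
    if name in known:
        return name
    # cutoff=0.8 keeps only near-identical spellings (one or two letters off).
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    return matches[0] if matches else None


print(suggest_organism("Xanthomonas tranlucens"))  # Xanthomonas translucens
```

A name that is nowhere near any entry in the list returns `None`, so the helper does not silently substitute an unrelated organism.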

dgutierrezcastillo commented 2 years ago

Thank you for your response, Azat! I was able to run one of my genomes successfully after fixing the typo, but the other one keeps giving me an error. It is a Xanthomonas translucens as well.

cwltool (3).log

azat-badretdin commented 2 years ago

> although it gave me the warning of having low space in the tmp outdir (1 Gb).

The output still shows low tmp space. Could you please try to address that in your setup?
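One way to see which directory is actually short on space is to print the free space of the temporary directory the job sees, which may differ from the scratch directory holding the Singularity cache. This is a minimal sketch that assumes the temporary directory is resolved in the usual way through `TMPDIR`; which path PGAP inspects for its warning is not confirmed here, and `free_gib` is a hypothetical helper.

```python
import shutil
import tempfile


def free_gib(path):
    """Free space on the filesystem that holds `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


# tempfile.gettempdir() honours the TMPDIR, TEMP and TMP environment variables.
tmp_dir = tempfile.gettempdir()
print(f"{tmp_dir}: {free_gib(tmp_dir):.1f} GiB free")
```

Running this inside the same Slurm job (and inside the container, if possible) shows whether the temporary directory sits on a small local disk rather than on scratch.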

abdziz commented 2 years ago

Greetings, I am facing a similar problem. The test genome works but my query sequence reports permanent failure. I am annotating plasmids.

cwltool.log

azat-badretdin commented 2 years ago

The report says: Unknown organism Baccillus thuringiensis

You made a typo in Baccillus. It's Bacillus.

dgutierrezcastillo commented 2 years ago

Hi Azat,

Thank you for your response. I have tried different combinations of Slurm nodes, ntasks, and allocated memory, but the program still reports low tmp disk space, even though I should have more than 6.4 TB in the scratch directory where my Singularity cache dir is set.

Diego

abdziz commented 2 years ago

Thank you for the response, Azat! I fixed the name but it still failed. I am using a computer with 16 GB RAM and 8 CPU cores. I also tried the --cpu 4 flag, and that failed too. Attached are the cwltool files.

cwltool.log cwltool.log

azat-badretdin commented 2 years ago

The second of the files:


    choosing min/max for populated genome for species: 1428, ngenomes = 698
    min = 4549000
    max = 7572000
    genome_size = 349600
    verify-genome-size: fail

You can specify pgap.py --ignore-all-errors to ignore this and go on with annotation.
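To make the log excerpt concrete: the check appears to compare the submitted sequence length with the size range seen for genomes of the same species. The sketch below only mirrors what the log lines suggest; `verify_genome_size` is a hypothetical helper, and the real PGAP check may apply additional rules.

```python
def verify_genome_size(genome_size, min_size, max_size):
    """Return "pass" if genome_size lies within [min_size, max_size], else "fail"."""
    return "pass" if min_size <= genome_size <= max_size else "fail"


# Values copied from the log: a 349,600 bp input against a 4,549,000 to 7,572,000 bp range.
print(verify_genome_size(349600, 4549000, 7572000))  # fail
```

An input this far below the minimum is what one would expect when submitting plasmids on their own rather than a whole genome, which is why the check has to be overridden in that case.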

The first cwltool.log shows the same thing:


    min = 4549000
    max = 7572000
    genome_size = 349600
    verify-genome-size: fail
    verify-only-ns: pass
    verify-seqids: pass

abdziz commented 2 years ago

Thank you very much, it worked.

azat-badretdin commented 2 years ago

You are welcome!

jjsanchez22 commented 2 years ago

Hello there,

I am having the same issue. I could run the pipeline on the Mycoplasma genitalium genome provided with the installation, despite using a Linux OS with 8 GB RAM and only 2 cores. However, I can't run the pipeline with my own genome. Any suggestions would be appreciated.

cwltool.log

azat-badretdin commented 2 years ago

@jjsanchez22 from the look of it:

    line 1: '[' expected ( at JsonValue.authors.[])

there is a problem in the input YAML file.
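One possible reading of that message, offered as an assumption rather than a confirmed diagnosis, is that the `authors` entry in the metadata YAML is expected to be a list but was written as a single mapping. The sketch below checks that on an already-parsed document represented as a plain Python dict; `check_authors` is a hypothetical helper, not part of PGAP.

```python
def check_authors(metadata):
    """Return a list of problems found with the 'authors' entry of a parsed YAML mapping."""
    authors = metadata.get("authors")
    if authors is None:
        return []  # nothing to check
    if not isinstance(authors, list):
        # A bare mapping here would explain a "'[' expected" style parser error.
        return [f"'authors' should be a list, got {type(authors).__name__}"]
    return []


# Hypothetical example: 'authors' given as one mapping instead of a list of entries.
bad = {"authors": {"author": {"first_name": "A", "last_name": "B"}}}
print(check_authors(bad))
```

In YAML terms, the usual fix for this kind of problem is to start each entry under `authors:` with a leading `- ` so that the value parses as a sequence.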