Closed adriangeerre closed 1 year ago
The format of your input YAML file is incorrect, see this
line 1: "taxon": unexpected member, should be one of: "strain" "genus_species" ( at JsonValue.organism)
Sorry, I forgot to add my execution line: python ~/programas/PGAP/pgap.py --debug -r -o mg37_results test_genomes/MG37/input.yaml
I used the MG37 input.yaml from the test genomes, which I downloaded yesterday. input.zip
I also tried the file "test_genomes/GCA_000009765/input.yaml" and got the same result.
UnexpectedMember() --- line 1: "taxon": unexpected member, should be one of: "strain" "genus_species"
What can I do? Thanks for the help!
I also tried the file "test_genomes/GCA_000009765/input.yaml" and got the same result.
See https://github.com/ncbi/pgap/wiki/Input-Files#metadata-yaml-file-submol
I see, and I think I get your point. I thought the input and submol files were ready to use. Instead of adapting those files for MG37, I have switched to the genome and files from a previous successful run (I will call it Bact). I am currently testing this. Thank you again. I hope it works!
They are ready to use. I am not sure where you got the file with "taxon:".
It could be old files from previous installations.
I got the link from the installation instructions in the wiki and I ran:
wget https://s3.amazonaws.com/pgap-data/test_genomes.tgz
I just tested this tarball, it does not have any files with the word "taxon" either.
That's weird, I can see the word taxon in the submol of all the test genomes that I just downloaded (again). Here are the steps I just did:
$ wget https://s3.amazonaws.com/pgap-data/test_genomes.tgz
--2023-05-17 22:42:02-- https://s3.amazonaws.com/pgap-data/test_genomes.tgz
Resolving proxy-default (proxy-default)... 10.220.0.1
Connecting to proxy-default (proxy-default)|10.220.0.1|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 19691644 (19M) [binary/octet-stream]
Saving to: ‘test_genomes.tgz’
100%[===========================================>] 19,691,644 13.9MB/s in 1.4s
2023-05-17 22:42:04 (13.9 MB/s) - ‘test_genomes.tgz’ saved [19691644/19691644]
$ ls -l
total 19231
-rw-rw-r-- 1 agomez CCRP_Data 19691644 Mar 8 2019 test_genomes.tgz
$ tar -xzf test_genomes.tgz
$ grep -i taxon test_genomes/*/submol.yaml
test_genomes/GCA_000009765/submol.yaml: taxon: 227882
test_genomes/GCA_000166555/submol.yaml: taxon: 913090
test_genomes/GCA_000167475/submol.yaml: taxon: 307502
test_genomes/GCA_000181555/submol.yaml: taxon: 445983
test_genomes/GCA_000186345/submol.yaml: taxon: 575540
test_genomes/GCA_000710235/submol.yaml: taxon: 623
test_genomes/MG37/submol.yaml: taxon: 243273
test_genomes/SAMN07633424/submol.yaml: taxon: 197
test_genomes/SAMN09729021/submol.yaml: taxon: 283734
test_genomes/SAMN09768125/submol.yaml: taxon: 630
test_genomes/SAMN09783348/submol.yaml: taxon: 197
test_genomes/SAMN09828454/submol.yaml: taxon: 562
test_genomes/SAMN09831750/submol.yaml: taxon: 1354
test_genomes/SAMN09831988/submol.yaml: taxon: 623
test_genomes/SAMN09837224/submol.yaml: taxon: 1639
test_genomes/SAMN09838637/submol.yaml: taxon: 670
test_genomes/SAMN09839044/submol.yaml: taxon: 28901
Nonetheless, using the genome that I previously annotated, I was able to make it run on an HPC in an srun session (it did not finish because I had to end the live session).
However, when I submit a job to the SLURM queue in the same HPC environment, the job crashes within seconds and reports the message taskset: failed to set pid 0's affinity: Invalid argument (which is not the issue I reported, but a step before it). I found that issue #202 already discusses it, and I tend to agree that Singularity has some odd behavior that could be causing multiple strange errors.
Thanks for the help, again, and sorry for the chaotic feedback.
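In case it helps anyone debugging the taskset message above: one possible explanation, not confirmed anywhere in this thread, is that the batch job is allowed fewer CPUs than the pipeline tries to pin itself to. A minimal Python sketch to compare the two from inside the job follows; the function name cpu_affinity_report is made up for illustration and is not part of PGAP.

```python
import os


def cpu_affinity_report():
    """Summarize which CPUs the current process is allowed to run on.

    os.sched_getaffinity only exists on Linux-like systems, so fall back
    to an "unsupported" report elsewhere instead of raising.
    """
    if not hasattr(os, "sched_getaffinity"):
        return {"supported": False, "allowed": [], "total": os.cpu_count()}
    allowed = sorted(os.sched_getaffinity(0))  # 0 means "this process"
    return {"supported": True, "allowed": allowed, "total": os.cpu_count()}


if __name__ == "__main__":
    report = cpu_affinity_report()
    print("CPUs visible on the node:", report["total"])
    print("CPUs this job may use:   ", report["allowed"])
```

Running it once inside the srun session and once inside the queued job would show whether the allowed CPU set differs between the two.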
That's weird, I can see the word taxon in the submol of all the test genomes that I just downloaded (again). Here are the steps I just did:
You are right and I was wrong (I made a typo). Indeed, that tarball contains submol examples with taxon:, which is an outdated format.
That tarball is obsolete and we need to fix our documentation. Meanwhile, the test genomes are part of the installation, which goes to the dedicated PGAP installation directory; see https://github.com/ncbi/pgap/wiki/Quick-Start#quick-start
Install the pipeline. By default it will install in $HOME/.pgap, but this location can be changed by setting an environmental variable PGAP_INPUT_DIR
That's where you will find up-to-date test genomes.
Thanks for patiently pushing this issue, @adriangeerre !
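For anyone else hitting the "taxon": unexpected member error, here is a rough Python equivalent of the grep check used earlier in this thread, plus a helper that resolves the installation directory described above (PGAP_INPUT_DIR if set, otherwise $HOME/.pgap). It is only a sketch: the function names are invented for illustration, the scan is a simple line match and not a real YAML parse, and where exactly the test genomes sit inside the installation directory is not stated here, so the helper returns only the top-level directory.

```python
import os
import re
from pathlib import Path

# Matches a mapping key named "taxon" at any indentation level.
TAXON_KEY = re.compile(r"^\s*taxon\s*:")


def pgap_install_dir(env=None):
    """Return PGAP_INPUT_DIR if it is set, otherwise the default $HOME/.pgap."""
    env = os.environ if env is None else env
    override = env.get("PGAP_INPUT_DIR")
    return Path(override) if override else Path.home() / ".pgap"


def find_outdated_submols(root):
    """Yield (path, line_number, line) for every 'taxon:' key found in
    submol.yaml files anywhere under root."""
    for path in sorted(Path(root).rglob("submol.yaml")):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if TAXON_KEY.match(line):
                    yield path, number, line.strip()


if __name__ == "__main__":
    # Scan the current directory, e.g. where test_genomes.tgz was unpacked.
    for path, number, line in find_outdated_submols("."):
        print(f"{path}:{number}: {line}")
    print("Installation directory:", pgap_install_dir())
```

Any file it reports is using the old taxon: key and would need to be replaced with the copy from the installation directory.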
I got the link from the installation instructions in the wiki and I ran:
wget https://s3.amazonaws.com/pgap-data/test_genomes.tgz
I am having trouble finding the reference to the tarball in the installation instructions. Could you please post a URL?
I found the link in the installation section of the wiki: https://github.com/ncbi/pgap/wiki/Installation
Right at the bottom, in the section "Running the pipeline on a test genome", there is a link (our test genome archive). That is the link I used to download the data.
Thanks for the help and the patience @azat-badretdin
I found the link in the installation section of the wiki https://github.com/ncbi/pgap/wiki/Installation
Thank you! I was looking for the URL, and apparently GitHub does not index URLs inside links. :-(
I fixed the documentation text at the URL you pointed to. Please let me know what else we can do for you here.
Describe the bug: I am currently facing the same issue as in #245. I have a pipeline that used to work perfectly (implemented 6-8 months ago), but it now stops the annotation with the error "WARNING Final process status is permanentFail".
To Reproduce: I have tested the installation on 2 different systems, HPC and laptop, using Singularity and Docker with the MG37 test input and it would:
Expected behavior: I would expect the normal behavior of the test run.
Software versions (please complete the following information):
Log Files: The cwltool.log failed with "permanentFail". cwltool.log cwltool.failed_step.log
Additional context: The folder ".pgap" is a link to another folder which contains the data required by PGAP.