ncbi / pgap

NCBI Prokaryotic Genome Annotation Pipeline

[BUG] Failing to run my own sequence #290

Closed: kfsb closed this issue 6 months ago

kfsb commented 7 months ago

I was able to successfully run the test sequence provided in the installation instructions, but I am currently unable to run my own sequence. This seems to be the main error:

FATAL: input FASTA sequence have the following problems, see ERROR or higher level messages below:
<?xml version="1.0"?>
<fastaval>
  <message tool="fastareader" severity="ERROR" code="eNoDefline" line_num="0">CFastaReader: Input doesn't start with a defline or comment around line 0. One or more sequences either lack a Sequence ID or the ID contains invalid characters.Allowed characters in Sequence ID include letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), and number signs (#).Each sequence must be uniquely identified by a valid Sequence ID.</message>
  <message tool="fastareader" severity="ERROR" code="LINE_PROBLEM_no_sequences">No sequences could be read in input</message>
</fastaval>
Please make sure the input FASTA files meets the criteria in https://github.com/ncbi/pgap/wiki/Input-Files#sequences , or use the flag -ignore-all-errors

I understand what it's saying, but my Sequence ID is just >Test_Genome_1. I'm not very familiar with this program, or with programming in general, so I'm not sure what else it could be.
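One way to check a file against the rule quoted in the eNoDefline message is a small script that reads the first line, requires it to start with ">", and tests the Sequence ID (the text after ">" up to the first whitespace) against the allowed characters: letters, digits, "-", "_", ".", ":", "*" and "#". This is a minimal sketch, not PGAP's own validator: the function name check_first_defline is hypothetical, and it assumes the very first line must be the defline (it is stricter than CFastaReader about leading blank or comment lines).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PGAP: checks that the first line of a FASTA
# file is a defline whose Sequence ID uses only the characters listed in the
# eNoDefline error message (letters, digits, - _ . : * #).
check_first_defline() {
  local fasta="$1" first="" id
  local re='^[A-Za-z0-9_.:*#-]+$'
  # read fails on a last line with no trailing newline, so also accept a non-empty line.
  IFS= read -r first < "$fasta" || [[ -n "$first" ]] || { echo "empty or unreadable file"; return 1; }
  first="${first%$'\r'}"           # drop a Windows carriage return, if any
  if [[ "$first" != '>'* ]]; then
    echo "first line does not start with '>'"
    return 1
  fi
  id="${first#>}"                  # text after '>'
  id="${id%%[[:space:]]*}"         # the Sequence ID ends at the first whitespace
  if [[ -z "$id" ]]; then
    echo "missing Sequence ID"
    return 1
  fi
  if [[ ! "$id" =~ $re ]]; then
    echo "invalid characters in Sequence ID: $id"
    return 1
  fi
  echo "ok: $id"
}

# Example (path taken from the command line later in this thread):
#   check_first_defline "$HOME/fasta_genomes/GCA_019429385.1_ASM1942938v1_genomic.fna"
```

For a file that starts with ">Test_Genome_1" this prints "ok: Test_Genome_1"; if it passes but PGAP still reports eNoDefline, the problem is more likely in how the file reaches the pipeline than in the ID itself.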

azat-badretdin commented 7 months ago

Thank you for your post, user @kfsb !

Could you please post the first line of your input FASTA file?

kfsb commented 7 months ago

>Test_Genome_1
CTTTTTTCGCCGAACTGCTTGCAACCCGATTGAGACCCGTGCTACAGTCACAGGCTCCGCTGATCGACGGTGGTGCTTCG

I also just tried using a FASTA file I got from the NCBI website, and that also didn't work.

azat-badretdin commented 7 months ago

Thanks! Could you please post your command line?

kfsb commented 7 months ago

Yeah, here's the defline (SeqID) and the first sequence line of the FASTA file from NCBI that I was testing:

>JAHYIS010000034.1 MAG: ANME-2 cluster archaeon isolate KA02 NODE_1096_length_23983_cov_23.380282, whole genome shotgun sequence
GAAATATAGGGGTTGGAGAGCTTATGTCATTATCGTAAAGGAATAATCAGTTTGTTTTGTCACAAAGGCGGGAAATTGGT

And then the command line:

./pgap.py -r -o TEST_GENOME_NCBI -g $HOME/fasta_genomes/GCA_019429385.1_ASM1942938v1_genomic.fna -s 'ANME-2 cluster archaeon'

Link to the genome on NCBI: https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_019429385.1/

azat-badretdin commented 7 months ago

Thanks. Just crossing the t's: your input files have UNIX line endings, correct?
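A quick way to answer the line-ending question is to count lines that end in a carriage return (Windows-style CRLF endings) and, if there are any, write a copy without them. This is a minimal sketch; the helper names count_crlf_lines and to_unix_endings are hypothetical, and to_unix_endings removes every carriage return in the file, which is harmless for FASTA data.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of PGAP, for checking line endings.

# Print how many lines end with a carriage return (CRLF endings).
count_crlf_lines() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
  grep -c $'\r$' "$1" || true
}

# Write a copy of $1 to $2 with every carriage return removed.
to_unix_endings() {
  tr -d '\r' < "$1" > "$2"
}

# Example:
#   count_crlf_lines GCA_019429385.1_ASM1942938v1_genomic.fna
#   to_unix_endings GCA_019429385.1_ASM1942938v1_genomic.fna genome.unix.fna
```

A count of 0 means the file already has UNIX line endings.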

azat-badretdin commented 7 months ago

I opened an internal investigation ticket for this case.

azat-badretdin commented 7 months ago

Regarding the FASTA file you got from the NCBI website: we tried to reproduce that failure (courtesy of our Tech Lead @george-coulouris), but we were not able to.

Could you please post your cwltool.log file? Thanks.

kfsb commented 7 months ago

Here is that file for you

cwltool.log

azat-badretdin commented 7 months ago

Thanks.

Note that you were successful with the Quick Start example, as you wrote in your first post.

The error is a generic one about very basic problems with the FASTA input. Could you please post the command line for the run that succeeded?

kfsb commented 7 months ago

The command I used was just the basic command provided on the wiki:

./pgap.py -r -o mg37_results -g $HOME/.pgap/test_genomes/MG37/ASM2732v1.annotation.nucleotide.1.fasta -s 'Mycoplasmoides genitalium'

Should I put my FASTA files in the .pgap directory maybe?

EDIT: I tried putting the FASTA files in the .pgap directory and got the same error as before.

EDIT 2: I tried manually creating input.yaml and submol.yaml files; it did not make a difference.

azat-badretdin commented 7 months ago

As for putting your FASTA files in the .pgap directory: I would definitely try putting the input FASTA file in your local directory instead and seeing what happens.

azat-badretdin commented 7 months ago

I also noticed this snippet in the cwltool.log file you posted:

        "datum": {
            "class": "File",
            "location": "file:///pgap/user_input/GCA_019429385.1_ASM1942938v1_genomic.fna",
            "size": 9,
            "basename": "GCA_019429385.1_ASM1942938v1_genomic.fna",
            "nameroot": "GCA_019429385.1_ASM1942938v1_genomic",
            "nameext": ".fna",
            "path": "/tmp/5q9vdugd/stg33024830-a022-4359-b016-726aec59ae77/GCA_019429385.1_ASM1942938v1_genomic.fna",
            "dirname": "/tmp/5q9vdugd/stg33024830-a022-4359-b016-726aec59ae77"

Notice the size of the file, 9 bytes, which looks incorrect. The other files mentioned, on the Docker container's other mount points, look fine.

Something is peculiar specifically about mounting the directory where you keep your files.
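Before digging into the mount itself, it may help to check on the host whether the file really is larger than the 9 bytes cwltool.log reported. A genome assembly is far larger than 9 bytes, so if the host file is also 9 bytes, the file itself rather than the mount would be the thing to look at. A minimal sketch; the helper name check_size is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PGAP: compare a host file's byte count with
# the "size" value that cwltool.log reported for it.
check_size() {
  local path="$1" expected="$2" actual
  # wc -c may pad its output with spaces (e.g. on macOS), so strip them.
  actual=$(wc -c < "$path" | tr -d '[:space:]')
  if [ "$actual" = "$expected" ]; then
    echo "sizes match: $actual bytes"
  else
    echo "size mismatch: host=$actual bytes, cwltool.log=$expected bytes"
    return 1
  fi
}

# Example, using the size from the cwltool.log snippet above:
#   check_size "$HOME/fasta_genomes/GCA_019429385.1_ASM1942938v1_genomic.fna" 9
```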

azat-badretdin commented 7 months ago

Your first EDIT (putting the FASTA files in the .pgap directory gave the same error as before) is especially enigmatic. In essence, you put your FASTA file "next" to the Mycoplasma input, yet you got different results. You are not using symlinks, correct?
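The symlink question can be checked directly: test whether the input path is itself a symlink, and also whether its parent directory resolves somewhere else, since a symlinked directory is not caught by testing the file alone. A minimal sketch using only test -L, readlink and pwd -P; the helper name describe_path is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PGAP: report whether a path is a symlink,
# a regular file, or missing, and whether its parent directory is symlinked.
describe_path() {
  local p="$1" dir logical physical
  if [ -L "$p" ]; then
    echo "symlink -> $(readlink "$p")"
  elif [ -f "$p" ]; then
    echo "regular file"
  else
    echo "missing or not a regular file"
    return 1
  fi
  # A symlinked parent directory is not caught by -L on the file itself.
  dir=$(dirname "$p")
  logical=$(cd "$dir" && pwd -L)
  physical=$(cd "$dir" && pwd -P)
  if [ "$logical" != "$physical" ]; then
    echo "parent directory resolves to $physical"
  fi
}

# Example:
#   describe_path "$HOME/fasta_genomes/GCA_019429385.1_ASM1942938v1_genomic.fna"
```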

kfsb commented 7 months ago

As for putting the input FASTA file in my local directory: I got the same result.

As for symlinks: nope, no symlinks.

azat-badretdin commented 7 months ago

Thanks!

Could you please try:

/usr/bin/docker run -i --rm --user 1000:1000 \
--volume /home/ubuntu/.pgap/input-2023-10-03.build7061:/pgap/input:ro,z \
--volume /home/ubuntu:/pgap/user_input:z \
--volume /home/ubuntu/pgap_input_r2vjugx7.yaml:/pgap/user_input/pgap_input.yaml:ro,z \
--volume /tmp:/tmp:rw,z \
--volume /home/ubuntu/TEST_GENOME_NCBI:/pgap/output:rw,z \
ncbi/pgap:2023-10-03.build7061 \
ls -l /pgap/user_input/GCA_019429385.1_ASM1942938v1_genomic.fna

and, if the file exists, then:

/usr/bin/docker run -i --rm --user 1000:1000 \
--volume /home/ubuntu/.pgap/input-2023-10-03.build7061:/pgap/input:ro,z \
--volume /home/ubuntu:/pgap/user_input:z \
--volume /home/ubuntu/pgap_input_r2vjugx7.yaml:/pgap/user_input/pgap_input.yaml:ro,z \
--volume /tmp:/tmp:rw,z \
--volume /home/ubuntu/TEST_GENOME_NCBI:/pgap/output:rw,z \
ncbi/pgap:2023-10-03.build7061 \
cat /pgap/user_input/GCA_019429385.1_ASM1942938v1_genomic.fna > retrieve_back.GCA_019429385.1_ASM1942938v1_genomic.fna 
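If the second command succeeds, the copy it writes can be compared byte for byte with the original on the host (the container path /pgap/user_input maps to /home/ubuntu in these commands). A minimal sketch using cmp; the helper name compare_roundtrip is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of PGAP: compare the host file with the copy
# read back through the container, reporting both sizes if they differ.
compare_roundtrip() {
  local original="$1" copy="$2"
  if cmp -s "$original" "$copy"; then
    echo "identical ($(wc -c < "$original" | tr -d '[:space:]') bytes)"
  else
    echo "differ: host=$(wc -c < "$original" | tr -d '[:space:]') bytes, container=$(wc -c < "$copy" | tr -d '[:space:]') bytes"
    return 1
  fi
}

# Example, run from /home/ubuntu:
#   compare_roundtrip GCA_019429385.1_ASM1942938v1_genomic.fna \
#     retrieve_back.GCA_019429385.1_ASM1942938v1_genomic.fna
```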

kfsb commented 7 months ago

This is what I get when I tried that first one:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/home/ubuntu/pgap_input_r2vjugx7.yaml" to rootfs at "/pgap/user_input/pgap_input.yaml": mount /home/ubuntu/pgap_input_r2vjugx7.yaml:/pgap/user_input/pgap_input.yaml (via /proc/self/fd/6), flags: 0x5000: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type.

azat-badretdin commented 7 months ago

I should have foreseen that.

I removed some unnecessary mounts. Could you please try this?

/usr/bin/docker run -i --rm --user 1000:1000 \
--volume /home/ubuntu:/pgap/user_input:z \
ncbi/pgap:2023-10-03.build7061 \
ls -l /pgap/user_input/GCA_019429385.1_ASM1942938v1_genomic.fna

and, if that succeeds, then:

/usr/bin/docker run -i --rm --user 1000:1000 \
--volume /home/ubuntu:/pgap/user_input:z \
ncbi/pgap:2023-10-03.build7061 \
cat /pgap/user_input/GCA_019429385.1_ASM1942938v1_genomic.fna

kfsb commented 7 months ago

The first one worked, but the second one said Not Found.
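One cheap host-side check that may help narrow this down: the text "Not Found" is exactly 9 bytes long, the same size cwltool.log reported for the input file. If the host file itself contained only that text, the 9-byte size and the cat output would both be explained by the file's content. This is only a hypothesis to rule out; the sketch below (hypothetical helper check_not_found_stub, run outside Docker) tests for it.

```bash
#!/usr/bin/env bash
# Hypothetical check, not part of PGAP: does the file contain nothing but the
# text "Not Found"? Prints the byte count and returns 1 in that case.
check_not_found_stub() {
  local bytes
  bytes=$(wc -c < "$1" | tr -d '[:space:]')
  # $(cat ...) drops trailing newlines, so "Not Found" plus a newline also matches.
  if [ "$(cat "$1")" = "Not Found" ]; then
    echo "$bytes bytes: the file contains only 'Not Found', not FASTA data"
    return 1
  fi
  echo "$bytes bytes: not a 'Not Found' stub"
}

# Example:
#   check_not_found_stub /home/ubuntu/GCA_019429385.1_ASM1942938v1_genomic.fna
```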

azat-badretdin commented 7 months ago

The fact that you can ls the file but not cat it points to some idiosyncratic issue with your local Docker setup, IMHO. I would recommend checking with your favorite Docker experts on what is going on with this setup.