1/ Salmonella is not a "species", it is a "genus".
2/ We lost support for the "genus" option in this release, and we are working on restoring it soon.
3/ Case might be important (biologists usually capitalize the genus in binomials, so I am not familiar with this use case).
Please try
genus_species: 'Salmonella enterica'
or another valid Salmonella species.
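If you generate submol.yaml files from a script, a quick pre-flight check on the genus_species value can catch a genus on its own or a lowercase genus before a long run. The sketch below is a rough heuristic and an assumption on my part, not PGAP's own validation: it only checks that the value has the shape "Capitalized-genus lowercase-epithet" (for example 'Salmonella enterica'); the helper name looks_like_binomial is hypothetical.

```python
import re

# Hypothetical helper, NOT part of PGAP: a shape-only heuristic.
# Matches a capitalized genus, one space, then a lowercase epithet;
# anything after that (e.g. subspecies qualifiers) is allowed.
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z][a-z-]*")


def looks_like_binomial(genus_species: str) -> bool:
    """Return True if the value looks like 'Genus species'."""
    return BINOMIAL.match(genus_species.strip()) is not None


if __name__ == "__main__":
    for value in ["salmonella", "Salmonella", "Salmonella enterica"]:
        verdict = "ok" if looks_like_binomial(value) else "check this value"
        print(f"{value!r}: {verdict}")
```

Passing this check does not mean PGAP recognizes the name; it only rules out the two formatting problems discussed in this thread.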
Thank you for the information. It works, but when I run it with location: 'plasmid', it fails with the same error "Final process status is permanentFail".
Please let me know what I should change in the submol.yaml file. Here is my current submol.yaml for the plasmid genome:
topology: 'circular'
location: 'plasmids'
organism:
    genus_species: 'Salmonella enterica'
    strain: 'P1122481'
location: 'plasmids'
This should be strictly 'plasmid' or 'chromosome'.
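A typo such as 'plasmids' is easy to miss by eye. As a minimal sketch (a hypothetical helper, not something pgap.py provides), the rule stated above can be checked before launching a run; the "did you mean" hint for a trailing "s" is my own addition.

```python
# Hypothetical pre-flight check, NOT part of pgap.py.
# Encodes the rule above: location must be exactly 'chromosome' or 'plasmid'.
ALLOWED_LOCATIONS = {"chromosome", "plasmid"}


def check_location(value: str) -> str:
    """Return the value unchanged if allowed, otherwise raise ValueError."""
    if value in ALLOWED_LOCATIONS:
        return value
    hint = ""
    singular = value.rstrip("s")  # e.g. 'plasmids' -> 'plasmid'
    if singular in ALLOWED_LOCATIONS:
        hint = f" (did you mean {singular!r}?)"
    raise ValueError(
        f"location must be one of {sorted(ALLOWED_LOCATIONS)}, got {value!r}{hint}"
    )
```

The check is deliberately case-sensitive, since the advice is to use these exact values.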
You can also try our relatively new way of running pgap.py, described in the quick notes, where all the information comes from the FASTA file plus the species given on the command line:
./pgap.py .... -s 'My species' -g My.fasta
In this case, you can mark plasmid molecules by appending [location=plasmid] to the FASTA definition lines of the corresponding sequences.
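For a multi-sequence FASTA file, this edit can be scripted. The sketch below assumes you already know which SeqIDs are plasmids (the set passed in is yours to supply; the function name mark_plasmids is hypothetical). It appends " [location=plasmid]" to matching definition lines and leaves everything else untouched.

```python
from typing import Iterable, Iterator, Set


def mark_plasmids(lines: Iterable[str], plasmid_ids: Set[str]) -> Iterator[str]:
    """Append ' [location=plasmid]' to deflines whose SeqID is in plasmid_ids."""
    for line in lines:
        if line.startswith(">"):
            header = line.rstrip("\n")
            # The SeqID is everything after '>' up to the first space.
            seq_id = header[1:].split(" ", 1)[0]
            # Skip headers that already carry a location modifier.
            if seq_id in plasmid_ids and "[location=" not in header:
                header += " [location=plasmid]"
            yield header + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line
```

Usage, with a hypothetical output file name: `with open("My.fasta") as src, open("My.marked.fasta", "w") as dst: dst.writelines(mark_plasmids(src, {"2"}))`.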
I tried the following:
python3 /scripts/pgap.py -r -o P1122481_results -s 'Salmonella enterica' -g P1122481.fasta
I am using a FASTA file with the header:
1_length=4998493_depth=1.00xcircular=true[location=chromosome]
But it still fails with "Final process status is permanentFail".
Alternatively, I used the correct location: 'plasmid' in submol.yaml, but it is still unable to run:
topology: 'circular'
location: 'plasmid'
organism:
    genus_species: 'Salmonella enterica'
    strain: 'P1122481'
1_length=4998493_depth=1.00xcircular=true[location=chromosome]
Please review https://github.com/ncbi/pgap/wiki/Input-Files#Genome-assembly-sequence-file. This SeqID contains several characters that are not allowed (the SeqID is everything before the first space). You can try a SeqID of 1 and add modifiers:
1 [topology=circular] [location=chromosome]
Length and depth are not supported modifiers according to: https://www.ncbi.nlm.nih.gov/genbank/mods_fastadefline/
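Assembler headers like the one above can be rewritten into the suggested form with a small script. This is a sketch under my own assumptions, not a PGAP rule: "circular=true" anywhere in the old header becomes [topology=circular], existing [key=value] modifiers are kept (with spaces around "=" trimmed), and everything else, including length and depth, is dropped. The name rebuild_defline is hypothetical, and you choose the new SeqID yourself.

```python
import re

# Matches bracketed modifiers such as [location=chromosome] or [location = plasmid].
MODIFIER = re.compile(r"\[\s*([^=\]]+?)\s*=\s*([^\]]+?)\s*\]")


def rebuild_defline(old_header: str, new_seq_id: str) -> str:
    """Return '>new_seq_id [k=v] ...' built from an assembler-style header."""
    text = old_header.lstrip(">")
    mods = {}
    # Assumption: 'circular=true' from the assembler means a circular molecule.
    if "circular=true" in text:
        mods["topology"] = "circular"
    # Keep any modifiers already present; an explicit topology overrides the guess.
    mods.update(MODIFIER.findall(text))
    parts = [f">{new_seq_id}"] + [f"[{key}={value}]" for key, value in mods.items()]
    return " ".join(parts)
```

For the header discussed here, `rebuild_defline("1_length=4998493_depth=1.00xcircular=true[location=chromosome]", "1")` yields the suggested `>1 [topology=circular] [location=chromosome]`.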
But it still fails with "Final process status is permanentFail".
Could you please post the resulting cwltool.log file? Thanks!
The header line seems to be correct now, but it still shows the error "WARNING Final process status is permanentFail" for the plasmid sequence. However, it works with 'chromosome', even though I did not change anything in the header ">1 length=4998493 depth=1.00x circular=true".
python3 /scripts/pgap.py -r -o P1122481_plasmid input_P1122481_plasmid.yaml
contig001 [location = plasmid] [plasmid-name = pPSU1122481] [topology=circular]
fasta:
    class: File
    location: P1122481_plasmid.fasta
submol:
    class: File
    location: P1122481_plasmid1_submol.yaml
cwltool.log
topology: 'circular'
location: 'plasmid'
organism:
    genus_species: 'Salmonella enterica'
    strain: 'pPSU1122481'
It seems it does not work with small genomes such as plasmids. I used PGAP earlier and it worked perfectly without any concern about specific headers or special characters. Should I download an older version and try that?
Try ./pgap.py --ignore-errors ....
It works when I put both the chromosome and the plasmid in one FASTA file. I think the latest PGAP version has an issue with small genomes such as plasmids.
python3 /scripts/pgap.py -r -o P2226300_results input_P2226300.yaml
Thank you for your help!
It works when I use both chromosome and plasmid in one fasta file.
That is because, with the chromosome included, the total genome size matches the expectation for this particular species.
It does not reject plasmids per se. You can see for yourself by replacing the keyword plasmid with chromosome in that small plasmid FASTA file: the result will be the same, because it rejects by size, not by molecule type.
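Since the rejection is by size, it helps to know the total length of what you are submitting. The minimal sketch below (hypothetical helper name sequence_lengths) sums sequence lengths per SeqID from a FASTA file. It does not reproduce PGAP's own size check, and the expected genome size for your species is something you would need to look up yourself.

```python
from typing import Dict, Iterable, Optional


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each SeqID (text before the first space) to its sequence length."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split(" ", 1)[0]
            lengths[current] = 0
        elif current is not None:
            # Count residues on sequence lines belonging to the current record.
            lengths[current] += len(line)
    return lengths
```

Usage: `with open("P1122481_plasmid.fasta") as fh: lengths = sequence_lengths(fh)`, then compare `sum(lengths.values())` with the expected genome size.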
Have you tried inserting --ignore-errors into the list of command line switches?
@azat-badretdin I have a similar issue. Please find my cwltool.log file attached: cwltool.log
User @vappiah, I am not so sure. It says:
'contig001[location=chromosome]' is not a valid local ID (m_Pos = 1)
which most likely means that you omitted the crucial space delimiter separating the seq-id from the rest of the FASTA definition line.
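This mistake can be caught before a run. As a hypothetical pre-flight check (not part of PGAP), the sketch below flags definition lines whose SeqID, meaning everything before the first space, contains a square bracket, which is what happens when the space before a [modifier] is left out, as in the error above.

```python
from typing import Iterable, List, Tuple


def find_glued_modifiers(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, SeqID) for deflines where a modifier touches the SeqID."""
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        if line.startswith(">"):
            seq_id = line[1:].rstrip("\n").split(" ", 1)[0]
            if "[" in seq_id or "]" in seq_id:
                problems.append((number, seq_id))
    return problems
```

An empty result only means no brackets are glued to a SeqID; other disallowed characters are covered by the Input-Files wiki page mentioned earlier in the thread.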
It is a different error, but from the same ballpark: "things that users do in FASTA definition lines".
Thanks @azat-badretdin . I made the necessary correction and it works now.
Glad to hear that, user @vappiah !
Hi, I am following up on my previous issue #304, which has been closed.
I am already using 'salmonella' in submol.yaml, but I am not able to get the results. When I change genus_species to 'Escherichia coli', PGAP keeps running for a very long time while generating the results.
topology: 'circular'
location: 'chromosome'
organism:
    genus_species: 'salmonella'
    strain: 'P1620800_chr'