Closed semiramisCJ closed 7 years ago
The pyA GFF3 file is a bit unusual. It has the ##sequence-region stuff littered throughout rather than at the top.
I loaded your pyA file into http://genometools.org/cgi-bin/gff3validator.cgi and got this error
Validation unsuccessful!
GenomeTools error: attribute "pseudo=" on line 5 in file "/var/www/servers/genometools.org/htdocs/cgi-bin/gff3/py_A_sp_B1.gff3.txt" has no value
The converter you used has a problem. The /pseudo tag in GenBank is a value-less key. However, GFF3 does not support value-less keys, and the converter is putting pseudo= in the file. This is wrong.
You could try sed -e 's/;pseudo=//g' < old.gff > new.gff
and see if that works.
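If sed is not an option, the same cleanup can be sketched in Python. This is only an illustration, not something taken from the converter or from Roary: the function names strip_empty_attributes and clean_gff are made up here, and the sketch assumes feature lines have exactly nine tab-separated columns with attributes separated by ';'. Unlike the sed one-liner, it drops any key= pair with an empty value, not just pseudo=.

```python
def strip_empty_attributes(line):
    """Drop 'key=' pairs that have no value from column 9 of a GFF3 feature line.

    Hypothetical helper for illustration. Directives, FASTA headers and
    sequence lines are returned unchanged.
    """
    if line.startswith("#") or line.startswith(">"):
        return line
    ending = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        # Not a feature line (e.g. sequence data after ##FASTA).
        return line
    kept = []
    for pair in cols[8].split(";"):
        key, sep, value = pair.partition("=")
        # Skip empty fragments and 'key=' pairs whose value is empty.
        if pair and not (sep and value == ""):
            kept.append(pair)
    # GFF3 uses '.' for an empty column.
    cols[8] = ";".join(kept) if kept else "."
    return "\t".join(cols) + ending


def clean_gff(src_path, dst_path):
    """Copy src_path to dst_path, cleaning each feature line on the way."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_empty_attributes(line))
```

Something like clean_gff("old.gff", "new.gff") would then play the role of the sed command, and the result can be re-checked with the validator.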
Thank you very much for your quick reply!
I fixed the py_* files to resolve all the issues I found via the GFF3 validator, and I put the ##sequence-region lines at the top. However, Roary dies with the same message even though the online GFF3 validator says that the validation was successful for each of the files:
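A_sp_B1.gff3.txt https://github.com/sanger-pathogens/Roary/files/1275756/A_sp_B1.gff3.txt
A_denitrificans_K601.gff3.txt https://github.com/sanger-pathogens/Roary/files/1275757/A_denitrificans_K601.gff3.txt
A_denitrificans_BC.gff3.txt https://github.com/sanger-pathogens/Roary/files/1275758/A_denitrificans_BC.gff3.txt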
Could you please help us to find what else is wrong with the converted files? Thank you very much in advance and best regards.
2017/09/04 20:14:31 Fixing input GFF files
2017/09/04 20:14:42 Extracting proteins from GFF files
MSG: The sequence does not appear to be FASTA format (lacks a descriptor line '>')
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl5/Bio/Root/Root.pm:472
STACK: Bio::SeqIO::fasta::next_seq /usr/local/share/perl5/Bio/SeqIO/fasta.pm:126
STACK: Bio::Roary::FilterUnknownsFromFasta::_filter_fasta_sequences_and_return_new_file /usr/local/share/perl5/Bio/Roary/FilterUnknownsFromFasta.pm:58
STACK: Bio::Roary::FilterUnknownsFromFasta::filtered_fasta_files /usr/local/share/perl5/Bio/Roary/FilterUnknownsFromFasta.pm:28
STACK: Bio::Roary::PrepareInputFiles::_input_fasta_files_filtered /usr/local/share/perl5/Bio/Roary/PrepareInputFiles.pm:58
STACK: Bio::Roary::PrepareInputFiles::fasta_files /usr/local/share/perl5/Bio/Roary/PrepareInputFiles.pm:82
STACK: Bio::Roary::CommandLine::Roary::run /usr/local/share/perl5/Bio/Roary/CommandLine/Roary.pm:277
STACK: /usr/local/bin/roary:14
If the filename ends in '.gff' it is assumed to be a GFF file, otherwise it is assumed to be a FASTA file of genes. So the solution is to rename your file extensions from '.gff3.txt' to '.gff'. I've run your data and it works fine after renaming.
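For anyone with many files to rename, here is a rough Python sketch of that fix. It relies only on the rule quoted above (a name ending in '.gff' is treated as GFF). The helper names gff_name and copy_with_gff_extension, and the list of suffixes it strips, are assumptions made for this example; it copies the files rather than renaming them in place so the originals stay untouched.

```python
import shutil
from pathlib import Path

# Suffixes to replace with '.gff'. This list is an assumption for the example.
KNOWN_SUFFIXES = (".gff3.txt", ".gff.txt", ".gff3")


def gff_name(path):
    """Return a Path for the same file whose name ends in '.gff'."""
    p = Path(path)
    name = p.name
    if name.endswith(".gff"):
        return p
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            return p.with_name(name[: -len(suffix)] + ".gff")
    # Unknown extension: just append '.gff'.
    return p.with_name(name + ".gff")


def copy_with_gff_extension(paths, out_dir):
    """Copy each file into out_dir under a name ending in '.gff'."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in paths:
        target = out / gff_name(path).name
        shutil.copyfile(path, target)
        copied.append(target)
    return copied
```

With this, A_sp_B1.gff3.txt would be copied to A_sp_B1.gff, and Roary can then be pointed at the files in the output directory.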
Additionally I have updated Roary to capture this case and fix it on the fly.
@semiramisCJ to use the auto-detect you will need to upgrade via CPAN.
We don't have problems running Roary with GFF3 files from Prokka, but Roary dies when we try to use different GFF3 files (described at the end). All of these GFF3 files have the nucleotide sequence at the end of the file, include the optional '##FASTA' line, and have the FASTA headers.
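The three properties listed above can be checked mechanically. The snippet below is a minimal sketch using a made-up helper, fasta_section_summary; it is not part of Roary or any validator. It looks only at the file contents (a '##FASTA' line followed by '>' headers) and says nothing about the filename or about the feature lines.

```python
def fasta_section_summary(lines):
    """Return (has_fasta_directive, header_count) for an iterable of GFF3 lines.

    Hypothetical helper: only '>' lines seen after '##FASTA' are counted.
    """
    in_fasta = False
    headers = 0
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        elif in_fasta and line.startswith(">"):
            headers += 1
    return in_fasta, headers
```

To run it on a file, pass the open handle, for example `with open(path) as fh: print(fasta_section_summary(fh))`, where path is whichever GFF3 file you want to inspect.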
Roary gives the following message:
I converted the GBK files to GFF3 in two ways:
a) seqret module + Python to move all the FASTA records to the end of the file [seqret* files]
b) GFF in BCBio and SeqIO in Python 2.7, then SeqIO again to add the nucleotide sequence at the end [py* files]
py_A_sp_B1.gff3.txt py_A_denitrificans_K601.gff3.txt py_A_denitrificans_BC.gff3.txt seqret_A_sp_B1.gbk.gff3.txt seqret_A_denitrificans_K601.gbk.gff3.txt seqret_A_denitrificans_BC.gbk.gff3.txt
Could somebody help us to find out how to solve this issue? Thanks in advance & best regards.
P.S. roary -a gives the following details: