sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

query_pan_genome always fails #451

Open BioSina opened 5 years ago

BioSina commented 5 years ago

I am trying to run query_pan_genome roughly like this:

query_pan_genome -g clustered_proteins -o output.txt -a difference -i 1.gff,2.gff -t 3.gff

I also tried replacing -i and -t with --input_set_one and --input_set_two, removing the -g and -o options, removing 2.gff, and so on.

It always just returns the usage:

Usage: query_pan_genome [options] *.gff
Perform set operations on the pan genome to see the gene differences between groups of isolates.

Options: -g STR    groups filename [clustered_proteins]
         -a STR    action (union/intersection/complement/gene_multifasta/difference) [union]
         -c FLOAT  percentage of isolates a gene must be in to be core [99]
         -o STR    output filename [pan_genome_results]
         -n STR    comma separated list of gene names for use with gene_multifasta action
         -i STR    comma separated list of filenames, comparison set one
         -t STR    comma separated list of filenames, comparison set two
         -v        verbose output to STDOUT
         -h        this help message

Examples: 
Union of genes found in isolates
         query_pan_genome -a union *.gff

Intersection of genes found in isolates (core genes)
         query_pan_genome -a intersection *.gff

Complement of genes found in isolates (accessory genes)
         query_pan_genome -a complement *.gff

Extract the sequence of each gene listed and create multi-FASTA files
         query_pan_genome -a gene_multifasta -n gryA,mecA,abc *.gff

Gene differences between sets of isolates
         query_pan_genome -a difference --input_set_one 1.gff,2.gff --input_set_two 3.gff,4.gff,5.gff

For further info see: http://sanger-pathogens.github.io/Roary/

I have no idea what is wrong to be honest.

puethe commented 5 years ago

Hi @BioSina, as specified in the usage info, you'll need to supply "-a difference" as a parameter if you want to check the differences between two sets of isolates. Hope this helps, Christoph

BioSina commented 5 years ago

But I did that?

KarenGoncalves commented 1 month ago

Wondering if there is any update on this. In my case, if I supply the GFFs via --input_set_one and --input_set_two, I get an error saying the files are not FASTA.

query_pan_genome\
 -a difference\
 -g clustered_proteins\
 --input_set_one 1.gff3,2.gff3,3.gff3,4.gff3,5.gff3\
 --input_set_two 6.gff3,7.gff3,8.gff3,9.gff3\
 -o pan_genome_St42
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: The sequence does not appear to be FASTA format (lacks a descriptor line '>')
STACK: Error::throw
STACK: Bio::Root::Root::throw /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/bioperl/1.7.7/lib/perl5/site_perl/5.30.2/Bio/Root/Root.pm:449
STACK: Bio::SeqIO::fasta::next_seq /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/bioperl/1.7.7/lib/perl5/site_perl/5.30.2/Bio/SeqIO/fasta.pm:137
STACK: Bio::Roary::FilterUnknownsFromFasta::_filter_fasta_sequences_and_return_new_file /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/roary/3.13.0/lib/Bio/Roary/FilterUnknownsFromFasta.pm:69
STACK: Bio::Roary::FilterUnknownsFromFasta::filtered_fasta_files /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/roary/3.13.0/lib/Bio/Roary/FilterUnknownsFromFasta.pm:39
STACK: Bio::Roary::PrepareInputFiles::_input_fasta_files_filtered /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/roary/3.13.0/lib/Bio/Roary/PrepareInputFiles.pm:102
STACK: Bio::Roary::PrepareInputFiles::fasta_files /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/roary/3.13.0/lib/Bio/Roary/PrepareInputFiles.pm:126
STACK: Bio::Roary::CommandLine::QueryRoary::run /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/roary/3.13.0/lib/Bio/Roary/CommandLine/QueryRoary.pm:118
STACK: /cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Compiler/gcc9/roary/3.13.0/bin/query_pan_genome:19
-----------------------------------------------------------

I then tried supplying FASTA files instead, as below:

query_pan_genome\
 -a difference\
 -g clustered_proteins\
 --input_set_one 1.fasta,2.fasta,3.fasta,4.fasta,5.fasta\
 --input_set_two 6.fasta,7.fasta,8.fasta,9.fasta\
 -o pan_genome_St42

No errors, but the generated results are all empty: only the headers (and, for the .dot files, the skeletons) are present.
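
For the "does not appear to be FASTA format" exception with the GFF inputs, one thing that may be worth ruling out is whether each GFF3 file actually carries its sequence. This assumes Roary expects Prokka-style GFF3, where a "##FASTA" line is followed by the nucleotide sequences. The sketch below only checks for that section; the function names has_embedded_fasta and check_gffs are made up for illustration and are not part of Roary, and passing this check does not by itself explain or fix the error.

```shell
#!/usr/bin/env bash
# Sanity check (illustrative, not part of Roary): does each GFF3 file
# contain an embedded sequence section?
# Assumption: Roary wants Prokka-style GFF3, where a "##FASTA" line is
# followed by the nucleotide sequences as FASTA records.

has_embedded_fasta() {
    local gff="$1"
    # Print the first non-empty line after the "##FASTA" marker and
    # check that it is a FASTA header (starts with ">").
    awk 'found && NF { print; exit } /^##FASTA/ { found = 1 }' "$gff" \
        | grep -q '^>'
}

check_gffs() {
    local gff status=0
    for gff in "$@"; do
        if has_embedded_fasta "$gff"; then
            echo "OK       $gff"
        else
            echo "MISSING  $gff"
            status=1
        fi
    done
    # Non-zero exit status if any file lacks the section.
    return "$status"
}
```

After sourcing the functions, something like `check_gffs 1.gff3 2.gff3 3.gff3` lists each file with OK or MISSING. A file reported as MISSING has annotations only, which would at least be consistent with the exception above.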