sanger-pathogens / Roary

Rapid large-scale prokaryote pan genome analysis
http://sanger-pathogens.github.io/Roary

No gene annotation in gene_presence_absence.csv output #428

Closed dinesh1st closed 5 years ago

dinesh1st commented 5 years ago

Can you please advise why I am not getting gene annotations in the gene_presence_absence.csv output file? In my previous run, this file contained the names and annotations of all the genes. This time, however, Roary gave me only group numbers. The difference between the two runs is that I previously used draft genomes and am now using complete genomes. Does that make a difference?

tseemann commented 5 years ago

@dinesh1st can you paste the first 3 lines of the gene_presence_absence.csv here? E.g., run head -n 3 gene_presence_absence.csv

Also, paste the first 3 CDS lines from one of your .gff input files, e.g., run grep CDS YOURFILE.gff | head -n 3

dinesh1st commented 5 years ago

My CSV output looks like this:

Gene,Non-unique Gene name,Annotation,No. isolates,No. sequences,Avg sequences per isolate,Genome Fragment,Order within Fragment,Accessory Fragment,Accessory Order with Fragment,QC,Min group size nuc,Max group size nuc,Avg group size nuc,1,2,3,4
group_1,,,4,4,1,,,,,,,,,PAO1_01930(+),PA14_03260(-),PA34_03444(-),VRFPA01_02925(+)
group_1000,,,4,4,1,,,,,,,,,PAO1_03900(+),PA14_01208(-),PA34_01191(-),VRFPA01_04900(+)
group_1001,,,4,4,1,,,,,,,,,PAO1_03912(-),PA14_01196(+),PA34_01179(+),VRFPA01_04912(-)
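To see how widespread the problem is, it may help to count how many clusters have an empty Annotation column. The following is only a minimal sketch: it assumes the file is a regular comma-separated file with the header names pasted above, and both the function name count_unannotated and the second sample row are hypothetical.

```python
import csv
import io


def count_unannotated(csv_text):
    """Return (total_clusters, clusters_with_empty_annotation)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = 0
    missing = 0
    for row in reader:
        total += 1
        # "Annotation" is the column name from the header pasted above.
        if not (row.get("Annotation") or "").strip():
            missing += 1
    return total, missing


# Tiny inline sample: the first row mirrors the reported output,
# the second row is a hypothetical annotated cluster for contrast.
sample = (
    "Gene,Non-unique Gene name,Annotation,No. isolates\n"
    "group_1,,,4\n"
    "dnaA,,Chromosomal replication initiator protein DnaA,4\n"
)
print(count_unannotated(sample))  # (2, 1)
```

On a real run, the same function could be called with the contents of gene_presence_absence.csv read from disk.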

One of my GFF files looks like this:

##gff-version 3
##sequence-region gnl|Prokka|PAO1_1 1 6264404
gnl|Prokka|PAO1_1 Prodigal:2.6 CDS 483 2027 . + 0 ID=PAO1_00001;Parent=PAO1_00001_gene;Name=dnaA;gene=dnaA;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P03004;locus_tag=PAO1_00001;product=Chromosomal replication initiator protein DnaA;protein_id=gnl|Prokka|PAO1_00001
gnl|Prokka|PAO1_1 prokka gene 483 2027 . + . ID=PAO1_00001_gene;Name=dnaA;gene=dnaA;locus_tag=PAO1_00001
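The CDS record above does carry gene= and product= attributes, so the input file itself appears to be annotated. A quick way to confirm this for every CDS in a file is to parse the ninth GFF column. This is a rough sketch under two assumptions: the real file is tab-separated (the paste above has lost its tabs), and the gene names and annotations in the spreadsheet are derived from the gene and product attributes. The helper name parse_cds_attributes is hypothetical.

```python
def parse_cds_attributes(gff_line):
    """Return the attribute dict of a CDS record, or None for any other line."""
    if gff_line.startswith("#"):
        return None  # header or directive line
    fields = gff_line.rstrip("\n").split("\t")
    if len(fields) != 9 or fields[2] != "CDS":
        return None
    attrs = {}
    for pair in fields[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


# The CDS record from the paste above (shortened), rebuilt with tab separators.
line = "\t".join([
    "gnl|Prokka|PAO1_1", "Prodigal:2.6", "CDS", "483", "2027", ".", "+", "0",
    "ID=PAO1_00001;Name=dnaA;gene=dnaA;locus_tag=PAO1_00001;"
    "product=Chromosomal replication initiator protein DnaA",
])
attrs = parse_cds_attributes(line)
print(attrs["gene"], "|", attrs["product"])
```

If every CDS returns a dict containing product (and usually gene), the missing annotations are more likely to come from the Roary run than from the input files.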

andrewjpage commented 5 years ago

Could you paste the output of roary -a?

dinesh1st commented 5 years ago

2018/10/23 22:13:17 Optional tool 'Rscript' not found in your $PATH
2018/10/23 22:13:17 Looking for 'awk' - found /usr/bin/awk
2018/10/23 22:13:17 Looking for 'bedtools' - found /Users/dinesh/anaconda2/bin/bedtools
2018/10/23 22:13:17 Determined bedtools version is 2.27
2018/10/23 22:13:17 Looking for 'blastp' - found /Users/dinesh/anaconda2/bin/blastp
2018/10/23 22:13:20 Determined blastp version is 2.7.1
2018/10/23 22:13:20 Looking for 'grep' - found /usr/bin/grep
2018/10/23 22:13:20 Optional tool 'kraken' not found in your $PATH
2018/10/23 22:13:20 Optional tool 'kraken-report' not found in your $PATH
2018/10/23 22:13:20 Looking for 'mafft' - found /Users/dinesh/anaconda2/bin/mafft
2018/10/23 22:13:20 Determined mafft version is 7.407
2018/10/23 22:13:20 Looking for 'makeblastdb' - found /Users/dinesh/anaconda2/bin/makeblastdb
2018/10/23 22:13:20 Determined makeblastdb version is 2.7.1
2018/10/23 22:13:20 Looking for 'mcl' - found /Users/dinesh/anaconda2/bin/mcl
2018/10/23 22:13:20 Determined mcl version is 14-137
2018/10/23 22:13:20 Looking for 'parallel' - found /Users/dinesh/anaconda2/bin/parallel
2018/10/23 22:13:21 Determined parallel version is 20160622
2018/10/23 22:13:21 Looking for 'prank' - found /Users/dinesh/anaconda2/bin/prank
2018/10/23 22:13:21 Looking for 'sed' - found /usr/bin/sed
2018/10/23 22:13:21 Looking for 'cd-hit' - found /Users/dinesh/anaconda2/bin/cd-hit
2018/10/23 22:13:21 Determined cd-hit version is 4.7
2018/10/23 22:13:21 Looking for 'FastTree' - found /Users/dinesh/anaconda2/bin/FastTree
2018/10/23 22:13:21 Determined FastTree version is 2.1
2018/10/23 22:13:21 Roary version 3.7.0
2018/10/23 22:13:21 Error: You need to provide at least 2 files to build a pan genome
Usage: roary [options] *.gff
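The version numbers are the useful part of this log. As a small illustration, they can be pulled out with two regular expressions. This sketch assumes the log keeps the exact wording shown above ("Determined X version is Y" and "Roary version Z"); the function name tool_versions is hypothetical.

```python
import re


def tool_versions(log_text):
    """Map each tool name to the version reported in the dependency-check log."""
    versions = {}
    # Matches entries such as "Determined blastp version is 2.7.1".
    for name, version in re.findall(r"Determined (\S+) version is (\S+)", log_text):
        versions[name] = version
    # Roary reports its own version with a slightly different wording.
    match = re.search(r"Roary version (\S+)", log_text)
    if match:
        versions["roary"] = match.group(1)
    return versions


# A few entries copied from the log above.
log = (
    "2018/10/23 22:13:17 Determined bedtools version is 2.27\n"
    "2018/10/23 22:13:20 Determined blastp version is 2.7.1\n"
    "2018/10/23 22:13:21 Determined cd-hit version is 4.7\n"
    "2018/10/23 22:13:21 Roary version 3.7.0\n"
)
print(tool_versions(log))
```

Because the patterns do not depend on line breaks, the function also works on a log pasted as a single line.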

andrewjpage commented 5 years ago

It looks like you are running an older version of Roary (released 2 years ago). Please upgrade to the latest version and try again.