dinesh1st closed 5 years ago
@dinesh1st can you paste the first 3 lines of the gene_presence_absence.csv here?
e.g. run `head -n 3 gene_presence_absence.csv`
Also, paste the first 3 CDS lines from one of your .gff input files,
e.g. run `grep CDS YOURFILE.gff | head -n 3`
My csv output looks like
Gene | Non-unique Gene name | Annotation | No. isolates | No. sequences | Avg sequences per isolate | Genome Fragment | Order within Fragment | Accessory Fragment | Accessory Order with Fragment | QC | Min group size nuc | Max group size nuc | Avg group size nuc | 1 | 2 | 3 | 4
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
group_1 | | | 4 | 4 | 1 | | | | | | | | | PAO1_01930(+) | PA14_03260(-) | PA34_03444(-) | VRFPA01_02925(+)
group_1000 | | | 4 | 4 | 1 | | | | | | | | | PAO1_03900(+) | PA14_01208(-) | PA34_01191(-) | VRFPA01_04900(+)
group_1001 | | | 4 | 4 | 1 | | | | | | | | | PAO1_03912(-) | PA14_01196(+) | PA34_01179(+) | VRFPA01_04912(-)
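For a quick check of how widespread the missing annotations are, something like the following could help. This is only a minimal sketch: it assumes the real gene_presence_absence.csv is a comma-separated file with a header row containing an `Annotation` column, as in the table above. The function name `summarize_annotations` and the in-memory sample (including its `dnaA` row) are made up for illustration and are not part of Roary.

```python
import csv
import io


def summarize_annotations(handle):
    """Return (total clusters, clusters with an empty Annotation field)."""
    reader = csv.DictReader(handle)
    total = 0
    missing = 0
    for row in reader:
        total += 1
        # Treat a missing column, an empty string, or whitespace as "no annotation".
        if not (row.get("Annotation") or "").strip():
            missing += 1
    return total, missing


# Hypothetical sample: two unannotated groups and one annotated cluster.
sample = io.StringIO(
    '"Gene","Non-unique Gene name","Annotation","No. isolates"\n'
    '"group_1","","","4"\n'
    '"group_1000","","","4"\n'
    '"dnaA","","Chromosomal replication initiator protein DnaA","4"\n'
)
total, missing = summarize_annotations(sample)
print(f"{missing} of {total} clusters have no annotation")
```

To run it on a real file, pass an open file handle instead of the sample, for example `open("gene_presence_absence.csv", newline="")`.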
One of my gff files looks like
gnl|Prokka|PAO1_1 Prodigal:2.6 CDS 483 2027 . + 0 ID=PAO1_00001;Parent=PAO1_00001_gene;Name=dnaA;gene=dnaA;inference=ab initio prediction:Prodigal:2.6,similar to AA sequence:UniProtKB:P03004;locus_tag=PAO1_00001;product=Chromosomal replication initiator protein DnaA;protein_id=gnl|Prokka|PAO1_00001
gnl|Prokka|PAO1_1 prokka gene 483 2027 . + . ID=PAO1_00001_gene;Name=dnaA;gene=dnaA;locus_tag=PAO1_00001
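To confirm that the `gene` and `product` attributes are actually present on the CDS records, the ninth GFF column can be split into key/value pairs. The sketch below assumes standard tab-separated GFF3 lines (the paste above shows spaces only because of how it was copied). `parse_gff_attributes` is a hypothetical helper, and the example line is the CDS record above with the `inference` and `protein_id` attributes left out for brevity.

```python
def parse_gff_attributes(line):
    """Split one tab-separated GFF3 feature line into (feature type, attribute dict)."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(columns)}")
    attributes = {}
    for field in columns[8].split(";"):
        if "=" in field:
            # Split on the first '=' only, so values may contain '=' themselves.
            key, value = field.split("=", 1)
            attributes[key] = value
    return columns[2], attributes


# Abbreviated copy of the CDS record above, rebuilt with tabs.
cds_line = "\t".join([
    "gnl|Prokka|PAO1_1", "Prodigal:2.6", "CDS", "483", "2027", ".", "+", "0",
    "ID=PAO1_00001;Parent=PAO1_00001_gene;Name=dnaA;gene=dnaA;"
    "locus_tag=PAO1_00001;"
    "product=Chromosomal replication initiator protein DnaA",
])
feature_type, attrs = parse_gff_attributes(cds_line)
print(feature_type, attrs.get("gene"), attrs.get("product"))
```

This only shows which attributes a record carries; it does not by itself explain how Roary chooses the names it writes to the CSV.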
Could you paste the output of `roary -a`?
2018/10/23 22:13:17 Optional tool 'Rscript' not found in your $PATH
2018/10/23 22:13:17 Looking for 'awk' - found /usr/bin/awk
2018/10/23 22:13:17 Looking for 'bedtools' - found /Users/dinesh/anaconda2/bin/bedtools
2018/10/23 22:13:17 Determined bedtools version is 2.27
2018/10/23 22:13:17 Looking for 'blastp' - found /Users/dinesh/anaconda2/bin/blastp
2018/10/23 22:13:20 Determined blastp version is 2.7.1
2018/10/23 22:13:20 Looking for 'grep' - found /usr/bin/grep
2018/10/23 22:13:20 Optional tool 'kraken' not found in your $PATH
2018/10/23 22:13:20 Optional tool 'kraken-report' not found in your $PATH
2018/10/23 22:13:20 Looking for 'mafft' - found /Users/dinesh/anaconda2/bin/mafft
2018/10/23 22:13:20 Determined mafft version is 7.407
2018/10/23 22:13:20 Looking for 'makeblastdb' - found /Users/dinesh/anaconda2/bin/makeblastdb
2018/10/23 22:13:20 Determined makeblastdb version is 2.7.1
2018/10/23 22:13:20 Looking for 'mcl' - found /Users/dinesh/anaconda2/bin/mcl
2018/10/23 22:13:20 Determined mcl version is 14-137
2018/10/23 22:13:20 Looking for 'parallel' - found /Users/dinesh/anaconda2/bin/parallel
2018/10/23 22:13:21 Determined parallel version is 20160622
2018/10/23 22:13:21 Looking for 'prank' - found /Users/dinesh/anaconda2/bin/prank
2018/10/23 22:13:21 Looking for 'sed' - found /usr/bin/sed
2018/10/23 22:13:21 Looking for 'cd-hit' - found /Users/dinesh/anaconda2/bin/cd-hit
2018/10/23 22:13:21 Determined cd-hit version is 4.7
2018/10/23 22:13:21 Looking for 'FastTree' - found /Users/dinesh/anaconda2/bin/FastTree
2018/10/23 22:13:21 Determined FastTree version is 2.1
2018/10/23 22:13:21 Roary version 3.7.0
2018/10/23 22:13:21 Error: You need to provide at least 2 files to build a pan genome
Usage: roary [options] *.gff
It looks like you are running an older version of Roary (released 2 years ago). Please upgrade to the latest version and try again.
Can you please advise me why I am not getting gene annotations in the gene_presence_absence.csv output file? In my previous experience, I got the names of all the genes and their annotations in this output. This time, however, Roary gave me only group numbers. The difference between the two attempts is that earlier I used draft genomes, and this time I am using complete genomes. Does that make a difference?