I'm running Roary on three genomes that I annotated with Prokka. The BLAST step apparently fails, although the program still finishes, and the combined genes from all three genomes end up as the core genome. Do you have any idea what's going on there?
Cheers, Aaron
roary -f roary_out -e -n -v prokkaout/gff
2017/04/03 15:14:46 Output directory name exists already so adding a timestamp to the end
2017/04/03 15:14:46 Output directory created: roary_out_1491225286
2017/04/03 15:14:46 Fixing input GFF files
2017/04/03 15:14:49 Extracting proteins from GFF files
Extracting proteins from /home/aaron/prokka_out/PROKKA_04032017.gff
Extracting proteins from /home/aaron/prokka_PA14_out/PROKKA_04032017.gff
Extracting proteins from /home/aaron/prokka_PAO_out/PROKKA_04032017.gff
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
BLAST Database error: No alias or index file found for protein database [/home/aaron/roary_out_1491225286/J3RyMjOrGu/output_contigs] in search path [/home/aaron/roary_out_1491225286::]
Cluster with MCL
2017/04/03 15:15:06 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presence_absence.csv -c _clustered.clstr --output_multifasta_files -i /home/aaron/roary_out_1491225286/vhFjmCR9Ey//_gff_files -f /home/aaron/roary_out_1491225286/vhFjmCR9Ey//_fasta_files -t 11 --dont_create_rplots -v --mafft -j Local --processors 1 --group_limit 50000 -cd 99
Use of uninitialized value in require at /usr/lib/perl/5.18/Encode.pm line 60.
2017/04/03 15:15:06 Reinflate clusters
2017/04/03 15:15:06 Split groups with paralogs
2017/04/03 15:15:07 Labelling the groups
2017/04/03 15:15:07 Transfering the annotation to the groups
2017/04/03 15:15:11 Creating accessory binary gene presence and absence fasta
2017/04/03 15:15:11 Creating accessory binary gene presence and absence tree
2017/04/03 15:15:11 The input file is too small so not creating a tree
2017/04/03 15:15:11 Creating accessory gene presence and absence clusters
2017/04/03 15:15:11 Theres no accessory binary file so skipping accessory binary clustering
2017/04/03 15:15:11 Creating the spreadsheet with gene presence and absence
2017/04/03 15:15:19 Creating summary statistics of the spreadsheet
2017/04/03 15:15:24 Creating tab files for R
2017/04/03 15:15:25 Create EMBL files
2017/04/03 15:15:26 Creating files with the nucleotide sequences for every cluster
2017/04/03 15:15:34 Cleaning up files
Aligning each cluster
Use of uninitialized value in require at (eval 2091) line 1.
2017/04/03 15:15:34 Running command: pan_genome_core_alignment -cd 99
2017/04/03 15:15:34 pan_genome_core_alignment -cd 99
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
--------------------- WARNING ---------------------
MSG: Got a sequence without letters. Could not guess alphabet
Output of roary -a is
2017/04/03 15:07:41 Looking for 'Rscript' - found /usr/bin/Rscript
2017/04/03 15:07:41 Determined Rscript version is 3.0
2017/04/03 15:07:41 Looking for 'awk' - found /usr/bin/awk
2017/04/03 15:07:41 Looking for 'bedtools' - found /usr/bin/bedtools
2017/04/03 15:07:41 Determined bedtools version is 2.17
2017/04/03 15:07:41 Looking for 'blastp' - found /usr/bin/blastp
2017/04/03 15:07:41 Determined blastp version is 2.2.28
2017/04/03 15:07:41 Looking for 'grep' - found /bin/grep
2017/04/03 15:07:41 Optional tool 'kraken' not found in your $PATH
2017/04/03 15:07:41 Optional tool 'kraken-report' not found in your $PATH
2017/04/03 15:07:41 Looking for 'mafft' - found /usr/bin/mafft
Use of uninitialized value in concatenation (.) or string at /usr/local/share/perl/5.18.2/Bio/Roary/External/CheckTools.pm line 129.
2017/04/03 15:07:42 Determined mafft version is
2017/04/03 15:07:42 Looking for 'makeblastdb' - found /usr/bin/makeblastdb
2017/04/03 15:07:42 Determined makeblastdb version is 2.2.28
2017/04/03 15:07:42 Looking for 'mcl' - found /usr/bin/mcl
2017/04/03 15:07:42 Determined mcl version is 12-135
2017/04/03 15:07:42 Looking for 'parallel' - found /usr/bin/parallel
2017/04/03 15:07:42 Determined parallel version is 20130922
2017/04/03 15:07:42 Looking for 'prank' - found /usr/bin/prank
2017/04/03 15:07:42 Looking for 'sed' - found /bin/sed
2017/04/03 15:07:42 Looking for 'cdhit' - found /usr/bin/cdhit
2017/04/03 15:07:42 Determined cdhit version is 4.6
2017/04/03 15:07:42 Looking for 'fasttree' - found /usr/bin/fasttree
2017/04/03 15:07:42 Determined fasttree version is 2.1
2017/04/03 15:07:42 Roary version 3.8.0
2017/04/03 15:07:42 Error: You need to provide at least 2 files to build a pan genome
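One detail visible in the log above is that all three GFF files sit in different directories but share the same file name, PROKKA_04032017.gff. Whether this has anything to do with the BLAST failure is not established here. As a minimal sketch for spotting such name collisions before a run, the Python below groups input paths by file name. The helper name find_duplicate_basenames is hypothetical and not part of Roary or Prokka; the paths are copied from the log.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_basenames(paths):
    """Group paths by file name and return only the names used more than once.

    Hypothetical helper for illustration; not part of Roary or Prokka.
    """
    by_name = defaultdict(list)
    for path in paths:
        by_name[PurePosixPath(path).name].append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}


# The three input files reported in the "Extracting proteins from ..." lines.
gff_paths = [
    "/home/aaron/prokka_out/PROKKA_04032017.gff",
    "/home/aaron/prokka_PA14_out/PROKKA_04032017.gff",
    "/home/aaron/prokka_PAO_out/PROKKA_04032017.gff",
]

duplicates = find_duplicate_basenames(gff_paths)
for name, group in duplicates.items():
    print(f"{name} is used by {len(group)} input files:")
    for path in group:
        print(f"  {path}")
```

If the check reports collisions, giving each annotation a distinct file name (for example one per strain) and rerunning would show whether the behaviour changes.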