nikolay12 closed this issue 8 years ago
Hi, everything looks fine. Could you try rerunning it with -e -n as parameters? That should produce the core alignment.
On 13 Oct 2016 17:56, "nikolay12" notifications@github.com wrote:
I just ran roary on a set of assemblies annotated by prokka. Got no error messages but the core_gene_alignment.aln was missing from the output.
Here is the output of roary -a:
$ roary -a
Please cite Roary if you use any of the results it produces:
Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill, "Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics, 2015 Nov 15;31(22):3691-3693
doi: http://doi.org/10.1093/bioinformatics/btv421
Pubmed: 26198102
2016/10/13 12:45:43 Looking for 'Rscript' - found /usr/bin/Rscript
2016/10/13 12:45:43 Determined Rscript version is 3.2
2016/10/13 12:45:43 Looking for 'awk' - found /bin/awk
2016/10/13 12:45:43 Looking for 'bedtools' - found /app/Roary/build/bedtools2/bin/bedtools
2016/10/13 12:45:43 Determined bedtools version is 2.24
2016/10/13 12:45:43 Looking for 'blastp' - found /app/Roary/build/ncbi-blast-2.2.30+/bin/blastp
2016/10/13 12:45:43 Determined blastp version is 2.2.30
2016/10/13 12:45:43 Looking for 'grep' - found /bin/grep
2016/10/13 12:45:43 Optional tool 'kraken' not found in your $PATH
2016/10/13 12:45:43 Optional tool 'kraken-report' not found in your $PATH
2016/10/13 12:45:43 Looking for 'mafft' - found /app/Roary/build/mafft-7.271-without-extensions/build/bin/mafft
2016/10/13 12:45:43 Determined mafft version is 7.271
2016/10/13 12:45:43 Looking for 'makeblastdb' - found /app/Roary/build/ncbi-blast-2.2.30+/bin/makeblastdb
2016/10/13 12:45:43 Determined makeblastdb version is 2.2.30
2016/10/13 12:45:43 Looking for 'mcl' - found /app/Roary/build/mcl-14-137/src/shmcl/mcl
2016/10/13 12:45:43 Determined mcl version is 14-137
2016/10/13 12:45:43 Looking for 'parallel' - found /app/Roary/build/parallel-20150522/src/parallel
2016/10/13 12:45:44 Determined parallel version is 20150522
2016/10/13 12:45:44 Looking for 'prank' - found /app/Roary/build/prank-msa-master/src/prank
2016/10/13 12:45:44 Looking for 'sed' - found /bin/sed
2016/10/13 12:45:44 Looking for 'cd-hit' - found /projects/jpc/london/software/cd-hit/v4.6.5/cd-hit
2016/10/13 12:45:44 Determined cd-hit version is 4.6
2016/10/13 12:45:44 Looking for 'FastTree' - found /app/Roary/build/fasttree/FastTree
2016/10/13 12:45:44 Determined FastTree version is 2.1
2016/10/13 12:45:44 Roary version 1.006924
2016/10/13 12:45:44 Error: You need to provide at least 2 files to build a pan genome
Usage: roary [options] *.gff
Options:
-p INT number of threads [1]
-o STR clusters output filename [clustered_proteins]
-f STR output directory [.]
-e create a multiFASTA alignment of core genes using PRANK
-n fast core gene alignment with MAFFT, use with -e
-i minimum percentage identity for blastp [95]
-cd FLOAT percentage of isolates a gene must be in to be core [99]
-qc generate QC report with Kraken
-k STR path to Kraken database for QC, use with -qc
-a check dependancies and print versions
-b STR blastp executable [blastp]
-c STR mcl executable [mcl]
-d STR mcxdeblast executable [mcxdeblast]
-g INT maximum number of clusters [50000]
-m STR makeblastdb executable [makeblastdb]
-r create R plots, requires R and ggplot2
-s dont split paralogs
-t INT translation table [11]
-z dont delete intermediate files
-v verbose output to STDOUT
-w print version and exit
-y add gene inference information to spreadsheet, doesnt work with -e
-h this help message
Example: Quickly generate a core gene alignment using 8 threads
roary -e --mafft -p 8 *.gff
For further info see: http://sanger-pathogens.github.io/Roary/
Here is the output of the actual roary run:
drwxr-xr-x 4 nnikolo1 zusers 4096 Oct 13 10:42 ..
-rw-rw-r-- 1 nnikolo1 zusers 53093 Oct 13 11:13 accessory_binary_genes.fa
-rw-rw-r-- 1 nnikolo1 zusers 453 Oct 13 11:13 accessory_binary_genes.fa.newick
-rw-rw-r-- 1 nnikolo1 zusers 545768 Oct 13 11:14 accessory_graph.dot
-rw-rw-r-- 1 nnikolo1 zusers 1181741 Oct 13 11:14 accessory.header.embl
-rw-rw-r-- 1 nnikolo1 zusers 1381819 Oct 13 11:14 accessory.tab
-rw-rw-r-- 1 nnikolo1 zusers 48 Oct 13 11:13 blast_identity_frequency.Rtab
-rw-rw-r-- 1 nnikolo1 zusers 409236 Oct 13 11:13 clustered_proteins
-rw-rw-r-- 1 nnikolo1 zusers 664203 Oct 13 11:14 core_accessory_graph.dot
-rw-rw-r-- 1 nnikolo1 zusers 1253693 Oct 13 11:14 core_accessory.header.embl
-rw-rw-r-- 1 nnikolo1 zusers 1554420 Oct 13 11:14 core_accessory.tab
-rw-rw-r-- 1 nnikolo1 zusers 1537923 Oct 13 11:14 gene_presence_absence.csv
-rw-rw-r-- 1 nnikolo1 zusers 331951 Oct 13 11:14 gene_presence_absence.Rtab
-rw-rw-r-- 1 nnikolo1 zusers 534 Oct 13 11:14 number_of_conserved_genes.Rtab
-rw-rw-r-- 1 nnikolo1 zusers 650 Oct 13 11:14 number_of_genes_in_pan_genome.Rtab
-rw-rw-r-- 1 nnikolo1 zusers 522 Oct 13 11:14 number_of_new_genes.Rtab
-rw-rw-r-- 1 nnikolo1 zusers 650 Oct 13 11:14 number_of_unique_genes.Rtab
-rw-rw-r-- 1 nnikolo1 zusers 201 Oct 13 11:14 summary_statistics.txt
Here is the log:
2016/10/13 11:13:55 Creating accessory binary gene presence and absence tree
2016/10/13 11:13:55 Running command: /app/Roary/build/fasttree/FastTree -fastest -nt accessory_binary_genes.fa > accessory_binary_genes.fa.newick
FastTree Version 2.1.8 SSE3
Alignment: accessory_binary_genes.fa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Fastest+2nd +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.50
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 0.03 seconds
Refining topology: 15 rounds ME-NNIs, 2 rounds ME-SPRs, 7 rounds ML-NNIs
0.26 seconds: ME NNI round 6 of 15, 1 of 11 splits
0.47 seconds: ME NNI round 11 of 15, 1 of 11 splits
Total branch-length 1.997 after 0.50 sec
0.60 seconds: ML NNI round 1 of 7, 1 of 11 splits
ML-NNI round 1: LogLk = -29742.679 NNIs 2 max delta 38.57 Time 0.89
0.89 seconds: Site likelihoods with rate category 1 of 20
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 0.751 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
0.99 seconds: ML NNI round 2 of 7, 1 of 11 splits
ML-NNI round 2: LogLk = -29155.713 NNIs 0 max delta 0.00 Time 1.10
Turning off heuristics for final round of ML NNIs (converged)
1.10 seconds: ML NNI round 3 of 7, 1 of 11 splits
ML-NNI round 3: LogLk = -29151.158 NNIs 0 max delta 0.00 Time 1.39 (final)
1.38 seconds: ML Lengths 1 of 11 splits
Optimize all lengths: LogLk = -29151.146 Time 1.48
Total time: 2.32 seconds Unique: 13/13 Bad splits: 0/10
What to do now? I would like to produce an output that I can use with raxml.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/sanger-pathogens/Roary/issues/284, or mute the thread https://github.com/notifications/unsubscribe-auth/AABeVzsD4KUtlssl3Pgpss1e8ehiZ6Pmks5qzmK1gaJpZM4KWH5R .
Thanks. That was it - I ran roary without the "-e" and "-n" flags. When I re-ran it with these flags the aln file was generated.
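For anyone scripting this, here is a minimal sketch of the fix described above: build the roary command with the -e and -n flags, then check afterwards that core_gene_alignment.aln was written. The flags -e, -n, -p and -f are taken from the help text quoted in this thread; the helper names `build_roary_command` and `has_core_alignment` are hypothetical, and the sketch assumes the alignment lands directly in the output directory.

```python
from pathlib import Path


def build_roary_command(gff_files, threads=8, out_dir=None):
    """Assemble a roary command line that requests a core gene alignment.

    Hypothetical helper: it only builds the argument list, it does not run roary.
    """
    if len(gff_files) < 2:
        # Mirrors the error in the quoted log: at least 2 files are needed.
        raise ValueError("need at least 2 GFF files to build a pan genome")
    # -e requests the core gene alignment, -n switches it to MAFFT.
    cmd = ["roary", "-e", "-n", "-p", str(threads)]
    if out_dir is not None:
        # -f sets the output directory (default is the current directory).
        cmd += ["-f", str(out_dir)]
    return cmd + [str(f) for f in gff_files]


def has_core_alignment(out_dir):
    """Return True if core_gene_alignment.aln exists in out_dir (assumed location)."""
    return (Path(out_dir) / "core_gene_alignment.aln").is_file()
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, and `has_core_alignment(".")` gives a quick check once the run finishes.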
Hi, I have run roary with the -e -n flags but still didn't get the core_gene_alignment.aln file. I have about 3,500 input genomes; could that be the reason the file is missing? The files in the output directory are as follows:
accessory_binary_genes.fa
accessory_binary_genes.fa.newick
_accessory_clusters
_accessory_clusters.clstr
accessory_graph.dot
accessory.header.embl
accessory.tab
Ah_2bXxwwK
b33Q5wBpOQ
bBsogkRUyU
bDWuGOIHN0
blast_identity_frequency.Rtab
_blast_results
_clustered
_clustered.clstr
clustered_proteins
_combined_files
_combined_files.groups
core_accessory_graph.dot
core_accessory.header.embl
core_accessory.tab
gene_presence_absence.csv
gene_presence_absence.Rtab
_inflated_mcl_groups
_inflated_unsplit_mcl_groups
JidzILfCGx
_labeled_mcl_groups
number_of_conserved_genes.Rtab
number_of_genes_in_pan_genome.Rtab
number_of_new_genes.Rtab
number_of_unique_genes.Rtab
pan_genome_reference.fa
pan_genome_sequences
QtL893WRo2
sP7T9F1EOg
summary_statistics.txt
_uninflated_mcl_groups
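One way to narrow this down is to compare the listing above with the listing from the earlier run in this thread and see which names differ. The sketch below does only that comparison and draws no conclusion about the cause; `compare_listings` and `listing_from_dir` are hypothetical helper names, not part of Roary.

```python
from pathlib import Path


def compare_listings(reference, observed):
    """Return (missing, extra) file names, each sorted.

    missing: names in the reference listing that are absent from observed.
    extra:   names in observed that the reference listing does not have.
    """
    reference, observed = set(reference), set(observed)
    return sorted(reference - observed), sorted(observed - reference)


def listing_from_dir(out_dir):
    """Collect the entry names directly inside an output directory."""
    return [p.name for p in Path(out_dir).iterdir()]
```

For example, `compare_listings(earlier_names, listing_from_dir("."))` would show names such as `_blast_results` under `extra`, which may be worth mentioning when reporting the problem.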