Closed dutchscientist closed 8 years ago
After 3.6.4 I specifically installed 3.6.5 (instead of going the standard route), and the result again drops to no core genes. Hence the problem is specific to 3.6.5.
If you want specific logging, let me know :-)
I've just uploaded a fix for this, thanks for reporting it. Version 3.6.6 should be in CPAN in a few hours.
Unfortunately the update to 3.6.6 has not resolved the problem; I still get the 0 core, 0 soft core output. All dependencies are up to date.
Happy to make a ZIP with the files for you?
This is the summary.txt:
Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 1678
Cloud genes (0% <= strains < 15%) 1138
Total genes (0% <= strains <= 100%) 2816
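For readers unfamiliar with the categories in the summary, here is a minimal Python sketch of how a gene could be binned from the share of strains that carry it, using only the percentage ranges printed above. The function name `classify_gene` is hypothetical, and this is an illustration of the thresholds rather than Roary's own code (Roary may round or count differently). The 62 strains in the example match the number of GFF files in this run.

```python
def classify_gene(n_present: int, n_strains: int) -> str:
    """Bin a gene by the fraction of strains carrying it.

    Thresholds are taken from the summary ranges:
    core >= 99%, soft core 95-99%, shell 15-95%, cloud < 15%.
    """
    fraction = n_present / n_strains
    if fraction >= 0.99:
        return "core"
    if fraction >= 0.95:
        return "soft core"
    if fraction >= 0.15:
        return "shell"
    return "cloud"


if __name__ == "__main__":
    # With 62 strains, a gene missing from a single strain already
    # falls out of the core bin (61/62 is about 98.4%).
    for present in (62, 61, 58, 9):
        print(present, classify_gene(present, 62))
```

With only 62 genomes, a gene has to be found in every single one to count as core, so anything that hides genes from even one strain shrinks the core quickly.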
This is the verbose output:
arnoud@T130[roary] roary -v -i 80 -s *.gff [ 1:21AM]
Please cite Roary if you use any of the results it produces: Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill, "Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics, 2015 Nov 15;31(22):3691-3693 doi: http://doi.org/10.1093/bioinformatics/btv421 Pubmed: 26198102
2016/07/26 01:21:53 Fixing input GFF files
2016/07/26 01:22:10 Extracting proteins from GFF files
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12895.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12896.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12897.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12903.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12904.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12905.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12910.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12912.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12913.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12918.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12921.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12926.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12927.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12928.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12929.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12934.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12935.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12936.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12937.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12942.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12943.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12944.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12945.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12950.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12951.gff
Extracting proteins from /home/arnoud/data/roary/CL1_AGR_LDI12952.gff
Extracting proteins from /home/arnoud/data/roary/CL2_HUM_LDI9893.gff
Extracting proteins from /home/arnoud/data/roary/CL2_HUM_LDI9898.gff
Extracting proteins from /home/arnoud/data/roary/CL2_WBI_LDI12965.gff
Extracting proteins from /home/arnoud/data/roary/CL2_WBI_LDI4911.gff
Extracting proteins from /home/arnoud/data/roary/CL2_WBI_LDI6751.gff
Extracting proteins from /home/arnoud/data/roary/CL2_WBI_LDI6782.gff
Extracting proteins from /home/arnoud/data/roary/CL2_WBI_LDI6791.gff
Extracting proteins from /home/arnoud/data/roary/CL2_WBI_LDI9152.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9149.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9163.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9195.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9196.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9198.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9203.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9205.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9868.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9871.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9876.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9879.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9882.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9888.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9901.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9909.gff
Extracting proteins from /home/arnoud/data/roary/CL3_ENV_LDI9921.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI12894.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI12979.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI6735.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI6743.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI6745.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI6759.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI6781.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI6783.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI9153.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI9160.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI9161.gff
Extracting proteins from /home/arnoud/data/roary/CL4_WBI_LDI9169.gff
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
Cluster with MCL
2016/07/26 01:28:17 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presence_absence.csv -c _clustered.clstr -i /home/arnoud/data/roary/ev2984x4_I//_gff_files -f /home/arnoud/data/roary/ev2984x4_I//_fasta_files -t 11 --dont_create_rplots --dont_split_groups -v -j Local --processors 1 --group_limit 50000 -cd 99
2016/07/26 01:28:17 Reinflate clusters
2016/07/26 01:28:17 Split groups with paralogs
2016/07/26 01:28:17 Labelling the groups
2016/07/26 01:28:17 Transfering the annotation to the groups
2016/07/26 01:28:36 Creating accessory binary gene presence and absence fasta
2016/07/26 01:28:37 Creating accessory binary gene presence and absence tree
2016/07/26 01:28:37 Running command: /usr/bin/fasttree -fastest -nt accessory_binary_genes.fa > accessory_binary_genes.fa.newick
FastTree Version 2.1.7 SSE3
Alignment: accessory_binary_genes.fa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Fastest+2nd +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.50
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 0.04 seconds
Refining topology: 24 rounds ME-NNIs, 2 rounds ME-SPRs, 12 rounds ML-NNIs
Total branch-length 9.218 after 0.76 sec, 101 of 122 nodes
ML-NNI round 1: LogLk = -73659.596 NNIs 13 max delta 190.05 Time 1.12
Switched to using 20 rate categories (CAT approximation)1 of 20
Rate categories were divided by 0.947 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
ML-NNI round 2: LogLk = -71584.459 NNIs 3 max delta 16.21 Time 1.36
ML-NNI round 3: LogLk = -71582.788 NNIs 0 max delta 0.00 Time 1.42
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 4: LogLk = -71543.812 NNIs 1 max delta 26.06 Time 1.69 (final)
Optimize all lengths: LogLk = -71519.255 Time 1.77
Total time: 2.32 seconds Unique: 62/62 Bad splits: 1/59 Worst delta-LogLk 0.11
This is the output with 3.6.4:
Core genes (99% <= strains <= 100%) 1350
Soft core genes (95% <= strains < 99%) 20
Shell genes (15% <= strains < 95%) 466
Cloud genes (0% <= strains < 15%) 1353
Total genes (0% <= strains <= 100%) 3189
arnoud@T130[roary] roary -v -i 80 -s *.gff [ 1:41AM]
Please cite Roary if you use any of the results it produces: Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill, "Roary: Rapid large-scale prokaryote pan genome analysis", Bioinformatics, 2015 Nov 15;31(22):3691-3693 doi: http://doi.org/10.1093/bioinformatics/btv421 Pubmed: 26198102
2016/07/26 01:46:08 Fixing input GFF files
2016/07/26 01:46:25 Extracting proteins from GFF files
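To compare the two summaries side by side, a small Python sketch like the one below could parse the summary lines and check that the categories add up to the reported total. The helper name `parse_summary` and the regular expression are assumptions made for illustration; the numbers are copied from the two summaries posted in this thread.

```python
import re


def parse_summary(text: str) -> dict:
    """Parse lines such as 'Core genes (99% <= strains <= 100%) 1350'
    into a {category: count} dictionary."""
    rows = {}
    for line in text.strip().splitlines():
        match = re.match(r"^(.*?genes)\s*\(.*\)\s*(\d+)\s*$", line.strip())
        if match:
            rows[match.group(1)] = int(match.group(2))
    return rows


# Summaries as posted in this thread.
SUMMARY_366 = """
Core genes (99% <= strains <= 100%) 0
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 1678
Cloud genes (0% <= strains < 15%) 1138
Total genes (0% <= strains <= 100%) 2816
"""

SUMMARY_364 = """
Core genes (99% <= strains <= 100%) 1350
Soft core genes (95% <= strains < 99%) 20
Shell genes (15% <= strains < 95%) 466
Cloud genes (0% <= strains < 15%) 1353
Total genes (0% <= strains <= 100%) 3189
"""

if __name__ == "__main__":
    new, old = parse_summary(SUMMARY_366), parse_summary(SUMMARY_364)
    for name in old:
        # Print old count, new count, and the difference per category.
        print(f"{name:16s} {old[name]:5d} -> {new[name]:5d} ({new[name] - old[name]:+d})")
```

Both summaries are internally consistent (the four categories sum to the total), so the difference lies in how genes are distributed across categories and in the total number of gene groups found, not in the summary arithmetic.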
Sorry about that, yes a zip would be really useful. Andrew
On 26 July 2016 at 02:03, dutchscientist notifications@github.com wrote:
Combine proteins into a single file
Iteratively run cd-hit
Parallel all against all blast
Cluster with MCL
2016/07/26 01:56:56 Running command: pan_genome_post_analysis -o clustered_proteins -p pan_genome.fa -s gene_presence_absence.csv -c _clustered.clstr -i /home/arnoud/data/roary/yxviRwP7O8//_gff_files -f /home/arnoud/data/roary/yxviRwP7O8//_fasta_files -t 11 --dont_create_rplots --dont_split_groups -v -j Local --processors 1 --group_limit 50000 -cd 99
2016/07/26 01:56:56 Reinflate clusters
2016/07/26 01:56:57 Split groups with paralogs
2016/07/26 01:56:57 Labelling the groups
2016/07/26 01:56:57 Transfering the annotation to the groups
2016/07/26 01:57:15 Creating accessory binary gene presence and absence fasta
2016/07/26 01:57:16 Creating accessory binary gene presence and absence tree
2016/07/26 01:57:16 Running command: /usr/bin/fasttree -fastest -nt accessory_binary_genes.fa > accessory_binary_genes.fa.newick
FastTree Version 2.1.7 SSE3
Alignment: accessory_binary_genes.fa
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Fastest+2nd +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.50
ML Model: Jukes-Cantor, CAT approximation with 20 rate categories
Initial topology in 0.02 seconds
Refining topology: 23 rounds ME-NNIs, 2 rounds ME-SPRs, 12 rounds ML-NNIs
Total branch-length 3.557 after 0.25 sec, 101 of 112 nodes
ML-NNI round 1: LogLk = -13812.193 NNIs 15 max delta 20.13 Time 0.37
Switched to using 20 rate categories (CAT approximation)1 of 20
Rate categories were divided by 0.853 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
Use -gamma for approximate but comparable Gamma(20) log-likelihoods
ML-NNI round 2: LogLk = -13295.617 NNIs 7 max delta 16.76 Time 0.46
ML-NNI round 3: LogLk = -13294.223 NNIs 1 max delta 0.00 Time 0.51
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 4: LogLk = -13287.413 NNIs 2 max delta 2.06 Time 0.60 (final)
Optimize all lengths: LogLk = -13287.028 Time 0.62
Total time: 0.81 seconds Unique: 57/62 Bad splits: 0/54
The GFF files (from Prokka): https://drive.google.com/open?id=0B6RiTqKBNQg6Zm5UdE93OEdrOVk
The Roary 3.6.4 output: https://drive.google.com/open?id=0B6RiTqKBNQg6ODNtdUEzc2hBVlk
The Roary 3.6.6 output: https://drive.google.com/open?id=0B6RiTqKBNQg6ZG5KV0U0SG8yZVE
Runs on Biolinux8, all dependencies up to date.
Thanks a million
Thanks for the files, they allowed me to track down the underlying issue. I've added tests which replicate the bug, fixed it and deployed a new version (v3.6.7).
Great, will give it a go!
Cool, 3.6.7 gives the "correct" outcome. Will test it with some other datasets soon.
Out of curiosity, what was wrong?
Excellent, I owe you a pint for putting up with my bugs! It was only reading genes from every second contig because I incorrectly used sed. Andrew
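To see why skipping contigs would wipe out the core genome rather than just shrink it slightly, here is a toy Python sketch. It does not reproduce the actual sed command or any Roary code (neither is shown in this thread); `keep_every_second` and `core_genes` are hypothetical helpers, and the three-strain data set is invented purely for illustration.

```python
def core_genes(strains: dict, threshold: float = 0.99) -> set:
    """Return genes present in at least `threshold` of the strains.

    `strains` maps a strain name to a list of contigs, where each
    contig is a list of gene names.
    """
    counts = {}
    for contigs in strains.values():
        genes = {gene for contig in contigs for gene in contig}
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    n_strains = len(strains)
    return {g for g, c in counts.items() if c / n_strains >= threshold}


def keep_every_second(contigs: list) -> list:
    """Mimic the faulty behaviour: keep contigs 1, 3, 5, ... and drop the rest."""
    return contigs[::2]


# Toy data: all three strains carry genes A-D, but the assemblies
# place them on different contigs.
STRAINS = {
    "s1": [["A", "B"], ["C", "D"]],
    "s2": [["A"], ["B", "C"], ["D"]],
    "s3": [["C", "D"], ["A", "B"]],
}

if __name__ == "__main__":
    full = core_genes(STRAINS)
    broken = core_genes({name: keep_every_second(c) for name, c in STRAINS.items()})
    print("all contigs read:   ", sorted(full))
    print("every second contig:", sorted(broken))
```

Because draft assemblies break the genome into contigs at different places in each strain, each strain loses a different subset of genes, so hardly any gene is still seen in 99% of strains. That is consistent with the 0 core / 0 soft core summaries reported above, with the genes landing in shell and cloud instead.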
Glad I could help. Roary has helped my bioinformatics skills no end! So the pints are on me ;-)
Subject: Re: [sanger-pathogens/Roary] Roary 3.6.5 giving different (erroneous) results compared to 3.5.7 and 3.6.1/3.6.3/3.6.4 (#263)
I have just installed a new Biolinux8 workstation and included Roary (of course). It installed v3.6.5, and when I used it with a known Campylobacter coli test set, it claimed that there were 0 core and 0 soft core genes, and dumped everything into shell and cloud, even with the -i 70 and -s switches.
I previously used v3.5.7, which gave me >1,000 core genes with the same set. When I downgraded to earlier versions on CPAN (3.6.3 and 3.6.4) and to an older one I had on a virtual machine (3.6.1), they also gave me the earlier result of 1,350 core genes. Has something changed in 3.6.5 that could cause this difference/error?