Closed djbradshaw2 closed 1 month ago
Thanks! I would recommend using --no-cleanup
to keep the alignment files at each iteration, and inserting a print statement at this line to output the tree_building_command
variable for each iteration - this should allow you to reproduce each step.
Hi Dr. Croucher,
Thanks for helping me with this as well. Thankfully I am running gubbins on slurm and keeping the stdout, so I have the command (I think) for the fifth iteration which is the only one I would like to reconstruct as a core SNP version. From that I have the following commands (edited to remove paths before file name for brevity).
Iteration 5
Constructing the phylogenetic tree with raxmlHPC-PTHREADS-AVX2...
raxmlHPC-PTHREADS-AVX2 -T 32 -p 3695 -safe -m GTRGAMMA -f d -p 1 -s SX519_Chromosomal_Ref_clean.core.full.iteration_4.tre.phylip -n SX519_Chromosomal_Ref_clean.core.full.iteration_5 -t SX519_Chromosomal_Ref_clean.core.full.iteration_4.tre
...done. Run time: 865947.10 s
Reconstructing ancestral sequences with pyjar...
Fitting substitution model to tree...
raxmlHPC-PTHREADS-AVX2 -T 32 -p 9260 -safe -m GTRGAMMA -s SX519_Chromosomal_Ref_clean.core.full.aln.snp_sites.aln -n SX519_Chromosomal_Ref_clean.core.full.iteration_5_reconstruction -t /tmp38ovm7mx/SX519_Chromosomal_Ref_clean.core.full.iteration_5.tre.rooted -f e -w /tmp38ovm7mx
Running joint ancestral reconstruction with pyjar Reading tree file: /tmp38ovm7mx/RAxML_result.SX519_Chromosomal_Ref_clean.core.full.iteration_5_reconstruction Reading info file: /tmp38ovm7mx/RAxML_info.SX519_Chromosomal_Ref_clean.core.full.iteration_5_reconstruction Frequencies: 0.187771, 0.314138, 0.309745, 0.188346 Rates: 1.101601, 4.490749, 1.306992, 0.26046, 4.211125, 1.0 Reconstructing sites on tree Printing alignment with internal node sequences: /tmp38ovm7mx/SX519_Chromosomal_Ref_clean.core.full.iteration_5.internal.joint.aln Printing tree with internal nodes labelled: /tmp38ovm7mx/SX519_Chromosomal_Ref_clean.core.full.iteration_5.internal.joint.tre Done Time for JAR preparation: 44.34410039099748 Time for JAR calculation: 23831.071987628995
Transferring pyjar results onto original recombination-corrected tree
Done transfer ...done. Run time: 907038.52 s
Running Gubbins to detect recombinations...
gubbins -r -v SX519_Chromosomal_Ref_clean.core.full.aln.gaps.vcf -a 100 -b 10000 -f /tmp38ovm7mx/SX519_Chromosomal_Ref_clean.core.full.aln -t SX519_Chromosomal_Ref_clean.core.full.iteration_5.tre -m 3 -p 0.05 -i 1.0 /tmp38ovm7mx/SX519_Chromosomal_Ref_clean.core.full.iteration_5.internal.joint.aln
...done. Run time: 909932.03 s
Checking for convergence... ...done. Run time: 909935.27 s Maximum number of iterations (5) reached.
Exiting the main loop.
I am assuming that the file whose name is changed to .final_tree.tre is created from the first command? Does that mean that the .filtered_polymorphic_sites.phylip is the SX519_Chromosomal_Ref_clean.core.full.iteration_4.tre.phylip file used in that command? Also, am I correct in that the pyjar command after that seems to be creating the file that eventually becomes the *.node_labelled.final_tree.tre final output?
Sorry if I am grasping at straws here, I would honestly like to try to recreate the final tree as a core SNP version without having to rerun command with --no-cleanup but understand if that is the only way to do it.
Thanks for your time and help.
Sincerely,
David
Yes, this is the command that generates the final tree (using the recombination-filtered alignment from the previous iteration):
raxmlHPC-PTHREADS-AVX2 -T 32 -p 3695 -safe -m GTRGAMMA -f d -p 1 -s SX519_Chromosomal_Ref_clean.core.full.iteration_4.tre.phylip -n SX519_Chromosomal_Ref_clean.core.full.iteration_5 -t SX519_Chromosomal_Ref_clean.core.full.iteration_4.tre
The pyjar
reconstruction is just for inferring recombinations, not for tree construction.
Hi Dr. Croucher,
Thank you very much, for the clarification/confirmation!! That makes sense.
I realized I had not replied to this, I am very sorry! I just ended up rerunning with --no-cleanup without the isolates that gubbins removed in the first run of this process to make things less complicated. Then I am using IQTree to recreate a core tree with the snpsites -C output from input snippy alignment to define the invariant sites. So, I am closing this issue (although if I am doing this wrong, please let me know).
snippy-clean_full_aln snippy.full.aln > snippy_clean.full.aln
run_gubbins.py --first-tree-builder rapidnj --first-model JC -p gubbins -c 32 -v snippy_clean.full.aln --no-cleanup
snp-sites -c gubbins.filtered_polymorphic_sites.fasta > gubbins_clean.core.aln
FastTree -gtr -nt gubbins_clean.core.aln > fastree_gubbins_clean.core.tree
source /project/fsepru113/dbradshaw/miniconda3/bin/activate snippy snp-sites -C snippy_clean.full.aln > snippy_invariant_sites.txt
constsites=$(cat snippy_invariant_sites.txt)
iqtree -s gubbins_clean.core.aln --prefix gubbins_snippy_core_snp_iqtree_tree -fconst $constsites -m MFP -bb 1000 -nt AUTO
Thank you very much for your time and help.
Sincerely,
David
Dear Gubbins Creators,
Thanks again for a great tool! Sorry to open up so many questions. This is a multiple tools question, so please let me know if anything is out of gubbins scope.
It is my understanding that the .final_tree.tre is made using a recombination masked version of the .filtered_polymorphic_sites.phylip file per #292 and #380. Gubbins Question: Is that correct?
I wanted to create a core SNP version of the phylogenetic tree using RaxML and IQtree to compare with the full non-recombinant SNPs *.fina_tree.tre which was created with RaxML (From gubbins.log - Tree constructor (later iterations),raxmlHPC-PTHREADS-AVX2,8.2.12,GTRGAMMA).
Multiple Tools Question: To do this should I do the following? I was ultimately unsure where to get the constant site counts from.
Extract the core SNPs alignment (per Snippy GitHub)
snp-sites -c gubbins.filtered_polymorphic_sites.fasta > SX519_Chromosomal_Ref_clean.core.aln
Recreate the masked version of the alignment used to make *.final_tree.tre Gubbins Question: Gubbins removed two isolates from original snippy input due to missing data percentages > 25% increasing the number of SNPs, will using this gff overcome that despite using the original snippy input alignment?
mask_gubbins_aln.py --aln SX519_Chromosomal_Ref_clean.core.full.aln --gff gubbins.recombination_predictions.gff --out gubbins.masked.aln
Get the constant site counts to be used to make the new core trees
snp-sites -C gubbins.masked.aln
1123980,1204859,1205431,1129769Make the core SNP trees with output core alignment from step 1 and constant sites numbers from step 3.
iqtree -fconst 1123980,1204859,1205431,1129769 -s SX519_Chromosomal_Ref_clean.core.aln
raxml - script to be determined but using https://github.com/tseemann/snippy/issues/362 as guidanceThanks for your time and help. Please let me know if you need any other information or have any questions.
Sincerely,
David
gubbins version: 3.3.0 raxml version: 8.2.12 iqtree version: 2.2.3 snippy version: 4.6.0 snp-sites version: 2.5.1
Gubbins Output Creation Scripts:
snippy-clean_full_aln core.full.aln > SX519_Chromosomal_Ref_clean.core.full.aln
run_gubbins.py --first-tree-builder rapidnj --first-model JC -p gubbins -c 32 -v SX519_Chromosomal_Ref_clean.core.full.aln