Open lskatz opened 3 years ago
I also opened the tree in FigTree, which worked, saved it as Nexus, and converted it back to Newick. The resulting Newick file did not work in seq-gen either.
I tried a different tree file to confirm that the issue is with the tree, and I think it is:
[gzu2@monolith3 nextstrain-2020-01-04]$ echo '(A:0.1,B:0.1,C:0.1);' > tmp.nwk
[gzu2@monolith3 nextstrain-2020-01-04]$ seq-gen -l768000 -n1 -mGTR -a5.0 -r0.25,0.82,0.15,0.27,2.99,1.00 -f0.299236590102,0.183687135874,0.196176253934,0.32090002009 -or < tmp.nwk | goalign stats --auto-detect
Sequence Generator - seq-gen
Version 1.3.4
(c) Copyright, 1996-2017 Andrew Rambaut and Nick Grassly
Institute of Evolutionary Biology, University of Edinburgh
Originally developed at:
Department of Zoology, University of Oxford
Random number generator seed: -1600840904040947534
Simulations of 3 taxa, 768000 nucleotides
for 1 tree(s) with 1 dataset(s) per tree
Branch lengths assumed to be number of substitutions per site
Continuous gamma rate heterogeneity:
shape = 5.000000
Model = GTR: General time reversible (nucleotides)
Rate of transitions and transversions equal:
rate matrix = gamma1: 0.2500 alpha1: 0.8200 beta1: 0.1500
beta2: 0.2700 alpha2: 2.9900
gamma2: 1.0000
with nucleotide frequencies specified as:
A=0.299237 C=0.183687 G=0.196176 T=0.3209
Time taken: 0.35 seconds
length 768000
nseqs 3
avgalleles 1.2478
variable sites 181432
char nb freq
A 688699 0.298914
C 423453 0.183790
G 452789 0.196523
T 739059 0.320772
alphabet nucleotide
seq-gen does not like some variations in Newick! The following fixed my tree. I think seq-gen needed some combination of these fixes:
$tree->force_binary
$tree->contract_linear_paths
$node->branch_length || $node->branch_length(rand(1e-7))
I am still not clear on how much of an issue missing branch lengths were, because in some example trees that I tried, I could successfully use a tree with no branch lengths in the file at all.
No ancestor node names. Fixed with this block: if(!$node->is_Leaf){ $node->id(""); }
perl -MBio::TreeIO -e '
  $tree = Bio::TreeIO->new(-file=>"nextstrain_ncov_global_tree.resolved.nwk")->next_tree;
  $tree->force_binary;
  $tree->contract_linear_paths;
  for my $node ($tree->get_nodes) {
    # Give missing or zero-length branches a tiny random length
    $node->branch_length || $node->branch_length(rand(1e-7));
    # Blank out the names of internal (ancestor) nodes
    if (!$node->is_Leaf) { $node->id(""); }
  }
  print $tree->as_text("newick")."\n";
' > anonymized.nwk
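For anyone hitting the same error, a quick structural check of the Newick file before handing it to seq-gen might help narrow things down. This is only a rough sketch: `check_newick` is a hypothetical helper name (not part of seq-gen or TreeToReads), it uses only `tr`, `grep`, and `wc`, and it assumes unquoted labels. It counts parentheses and counts closing parentheses that are immediately followed by a label, which is how internal node names appear in Newick.

```shell
# Hypothetical helper, not part of seq-gen or TreeToReads.
# Prints simple structural counts for a Newick file.
check_newick() {
  file=$1
  # Count opening and closing parentheses; they should match.
  open=$(tr -cd '(' < "$file" | wc -c | tr -d ' ')
  close=$(tr -cd ')' < "$file" | wc -c | tr -d ' ')
  # Count closing parentheses directly followed by a label, i.e. by
  # anything other than ':', ',', ')', ';' or a space.
  labels=$(grep -o ')[^:,); ]' "$file" | wc -l | tr -d ' ')
  echo "open=$open close=$close internal_labels=$labels"
}
```

Running `check_newick anonymized.nwk` on the cleaned tree should then report matching parenthesis counts and no internal labels. It does not detect multifurcations or missing branch lengths, so it complements the Perl cleanup rather than replacing it.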
Similar to #9, but I can't solve it with a regex. I downloaded the Nextstrain tree (Jan 4, 2021) for nCoV and wanted to run TreeToReads.py with it (Newick attached below). However, the seq-gen step gives the closing bracket error. I have tried a variety of things, including renaming the taxa and resolving multifurcations,
and breaking apart long lines.
This is my seq-gen command (changing the stdin parameter accordingly).
But nothing seems to help so far. Any ideas?
nextstrain_ncov_global_tree.zip