Closed pekarj closed 2 years ago
Hey! I tried opening it, and I think it seems to load properly? The sequences of the internal nodes seem to be a part of the edge_params
attribute. For example, the following code snippet:
from treeswift import read_tree_nexus
nex = read_tree_nexus("test/ancestralStateReconstruction.nexus")
print(nex['STATE_1600'].root.edge_params)
prints the following:
&oct21update_167vir_wgal_2_reg01.subsampled="ATTTAGTTATTACTTATACTGCGTGCGTGCACGAAGCATGCAGGCGAGTGTTAGCCACACTGATTTTAAAGTTCGTTTAGAGAACAGATCAACAAGAGATCGAGAGTTGATATTAGGTTTTTACAAATACCTTCCCAGGTAATAAAACCAACCAACTCTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCGCTTGGCTGTATGCCTAGTGCACCCACCCAGTATAATTAATAATAAACTTTACTGTCGTTGACAGGAAACGAGTAACTCGTCTATCTTCTGCAGACTGCTTACGGTTTCGTCCGTGTCGCAGTCGATCATCAGCATATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTGCCTGGTATCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTCTTCAGGTTCGAGACGTGCTAGTACGTGGCTTTGGAGACTCTGTGGAAGAGGCCCTATCGGAGGCACGTCAACATCTTCAAGATGGCACTTGTGGTTTAGTAGAGCTTGAAAAAGGTGTATTGCCCCAACTTGAACAGCCCTATGTGTTCATTAAACGTTCTGATGCTCTAAACGCACATCACGGCCATGTAATGGTTGAATTGGTAGCAGAACTAGATGGCATTCAGTACGGTCGTAGTGGTGAGACTCTTGGAGTTCTTGTACCCCATGTGGGTGAAAACCCTATTGCGTACCGCAAAGTTCTTCTTCGTAAGAATGGTAATAAGGGAGCTGGTGGCCATAGCTATGGCACCGATCTAAAGTCTTATGACTTAGGTGACGAGCTTGGCACTGATCCCATTGAAGATTTTCAAGAAAACTGGAACACTAAACATGGCAGTGGTGCTGGTCGTGAGCTCATCCGTGAGCTTAATGGAGGCGTAGACACTCGCTATGTCGATAACAACTTCTGTGGCCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCTCGTGCTGGTAAGGCTTT",region="Yunnan"
So given a node
, I can extract the sequence as follows:
node.seq = node.edge_params.split('=')[1].split(',')[0].strip('"')
So for a given tree
, to label all nodes accordingly:
for node in nex['STATE_1600'].traverse_preorder():
node.seq = node.edge_params.split('=')[1].split(',')[0].strip('"')
Can you let me know if that works for you?
Ah, I was mistaken on the reading in part -- I misinterpreted the output. However, I only seem to get the sequence for the root node. If I do what you wrote above for STATE_1600
, and then print each seq:
for node in nex['STATE_1600'].traverse_preorder():
print(node.seq)
I get the following:
ATTTAGTTATTACTTATACTGCGTGCGTGCACGAAGCATGCAGGCGAGTGTTAGCCACACTGATTTTAAAGTTCGTTTAGAGAACAGATCAACAAGAGATCGAGAGTTGATATTAGGTTTTTACAAATACCTTCCCAGGTAATAAAACCAACCAACTCTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCGCTTGGCTGTATGCCTAGTGCACCCACCCAGTATAATTAATAATAAACTTTACTGTCGTTGACAGGAAACGAGTAACTCGTCTATCTTCTGCAGACTGCTTACGGTTTCGTCCGTGTCGCAGTCGATCATCAGCATATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTGCCTGGTATCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTCTTCAGGTTCGAGACGTGCTAGTACGTGGCTTTGGAGACTCTGTGGAAGAGGCCCTATCGGAGGCACGTCAACATCTTCAAGATGGCACTTGTGGTTTAGTAGAGCTTGAAAAAGGTGTATTGCCCCAACTTGAACAGCCCTATGTGTTCATTAAACGTTCTGATGCTCTAAACGCACATCACGGCCATGTAATGGTTGAATTGGTAGCAGAACTAGATGGCATTCAGTACGGTCGTAGTGGTGAGACTCTTGGAGTTCTTGTACCCCATGTGGGTGAAAACCCTATTGCGTACCGCAAAGTTCTTCTTCGTAAGAATGGTAATAAGGGAGCTGGTGGCCATAGCTATGGCACCGATCTAAAGTCTTATGACTTAGGTGACGAGCTTGGCACTGATCCCATTGAAGATTTTCAAGAAAACTGGAACACTAAACATGGCAGTGGTGCTGGTCGTGAGCTCATCCGTGAGCTTAATGGAGGCGTAGACACTCGCTATGTCGATAACAACTTCTGTGGCCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCTCGTGCTGGTAAGGCTTT
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
0.002531541819940809
Also, if I then try to save a given tree to a newick file, I get the following:
[&R] (EPI_ISL_852604:1.6357507323841585[&default.rate=0.0014477,region.rate=1.0],(((((('MG772933.1':1.1728750371368921[&default.rate=0.0014477,region.rate=1.0],(('MZ937000.1':0.668197515923221[&default.rate=0.0014477,region.rate=1.0],('OK017803.1':0.11244129651179363[&default.rate=0.0014477,region.rate=1.0],'OK017805.1':0.18915362527893748[&default.rate=0.0014477,region.rate=1.0]):1.132491004949265[&default.rate=0.0014477,region.rate=1.0]):1.3790022185450421[&default.rate=0.0014477,region.rate=1.0],'MT121216.1':0.7773591775403883[&default.rate=0.0014477,region.rate=1.0]):2.5489405171307915[&default.rate=0.0014477,region.rate=1.0]):0.8084992611541111[&default.rate=0.0014477,region.rate=1.0],'MT040334.1':1.8964427914417872[&default.rate=0.0014477,region.rate=1.0]):0.021448171535575433[&default.rate=0.0014477,region.rate=1.0],((('MW251308.1':0.752610331468559[&default.rate=0.0014477,region.rate=1.0],'MZ081381.1':0.6843043205397228[&default.rate=0.0014477,region.rate=1.0]):0.8620467540602199[&default.rate=0.0014477,region.rate=1.0],'MN908947.3':1.1337378550475021[&default.rate=0.0014477,region.rate=1.0]):0.30419404079068846[&default.rate=0.0014477,region.rate=1.0],('OK017806.1':1.565754161119145[&default.rate=0.0014477,region.rate=1.0],'MZ937003.1':0.8410367421505784[&default.rate=0.0014477,region.rate=1.0]):1.1351914333492206[&default.rate=0.0014477,region.rate=1.0]):3.4635207109748003[&default.rate=0.0014477,region.rate=1.0]):0.9690864609635517[&default.rate=0.0014477,region.rate=1.0],'MG772934.1':1.3828678348998782[&default.rate=0.0014477,region.rate=1.0]):1.7353311055214977[&default.rate=0.0014477,region.rate=1.0],('MZ937002.1':3.718732522061596[&default.rate=0.0014477,region.rate=1.0],(('MZ937001.1':0.852162077262772[&default.rate=0.0014477,region.rate=1.0],EPI_ISL_406801:0.3494298368256459[&default.rate=0.0014477,region.rate=1.0]):2.058149883507049[&default.rate=0.0014477,region.rate=1.0],(EPI_ISL_412977:1.0899278494417612[&default.rate=0.0014477,region.rate=1.0],'MW703458.1':0.191297712455349[&default.rate=0.0014477,region.rate=1.0]):0.7861749639367939[&default.rate=0.0014477,region.rate=1.0]):0.8084205612917748[&default.rate=0.0014477,region.rate=1.0]):4.417237209586674[&default.rate=0.0014477,region.rate=1.0]):2.471512232031648[&default.rate=0.0014477,region.rate=1.0],('KF294457.1':0.18554927299601687[&default.rate=0.0014477,region.rate=1.0],'MN996532.2':1.7444533825851103[&default.rate=0.0014477,region.rate=1.0]):1.908271488498043[&default.rate=0.0014477,region.rate=1.0]):0.6131628476025277[&default.rate=0.0014477,region.rate=1.0])[&oct21update_167vir_wgal_2_reg01.subsampled="AATTAGTTATTAATTATACTGCGTGATTGCACTAAGCATGCAGCCGAGTGACAGCCACACAGATTTTAAAGTTCGTTTAGAGAACAGATCTACAAGAGATCGAGAGTTGATATTAGGTTTTTACAAATACCTTCCCAGGTAACAAAACCAACCAACTCTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTTGGCTGTATGCTTAGTGCACTCACGCAGTATAATTAATAATTAACTTTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTCTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGCTTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGGCTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATTAAACGTTCTGATGCTCGAACTGCACCTCATGGCCATGTTATGGTTGAACTGGTAGCAGAACTCGATGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGTTCTTGTTCCTCATGTGGGCGAAACACCTGTGGCTTACCGCAAAGTTCTTCTTCGTAAGAACGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCTTTTGACTTAGGTGACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAATTGGAACACTAAACATGGCAGTGGTGTTACCCGTGAACTCATGCGTGAGCTTAACGGAGGAGCATACACTCGCTATGTCGATAACAACTTCTGTGGCCCTGATGGCTATCCTCTTGAGTGCATTAAAGATCTTCTAGCCCGTGCTGGTAAGGCTTC",region="Yunnan"];%
It looks like only the root sequence was correctly parsed and then saved?
Ah, woops, I only looked at the root and assumed all other (internal) nodes would have the same structure. I'll take a look later to see if I can figure out what's going on!
Sounds good! This might be helpful (with a regex at the bottom of the linked page):
Finally had a bit of time to look at this, and I see what the problem is! The problem is when a node has both edge params and node params as per the BEAST NEXUS metacomments (thanks for the link!). Basically, the node params get overwritten by the edge params. Should be a simple fix
Okay, added to TreeSwift v1.1.27! Here's an example script:
from treeswift import read_tree_nexus
for node in read_tree_nexus('ancestralStateReconstruction.nexus')['STATE_1600'].traverse_preorder():
print('LABEL: %s' % node.label)
if hasattr(node, 'node_params'):
print('NODE PARAMS: %s' % node.node_params)
if hasattr(node, 'edge_params'):
print('EDGE PARAMS: %s' % node.edge_params)
print()
I also updated the Newick output functionality, which should hopefully properly save the tree with the node and edge params as well. Can you test things out and see if they look okay on your end? In the tree you provided, sequences are now in the node_params
attribute of Node
objects
Woops, actually, there's a bug when outputting the Tree
object as a Newick string. I'll fix right now
Okay, I just pushed TreeSwift v1.1.28 that fixes the Newick output functionality as well! Test things out on your end, and once you give me the green light that it seems to be working for you, let me know and I'll close this GitHub Issue
Haven't heard anything horrendous (yet), so I'll assume it works and close this ticket. If something is still not working, feel free to reopen or create a new ticket 😄
Hey Niema,
This is a bit of a followup on the last few updates for TreeSwift. I did ancestral state reconstruction using BEAST, creating a nexus file with trees that include a sequence at every node. It doesn't seem like TreeSwift is able to read in the resulting file though. Any chance TreeSwift can be updated to read such a file in (and also include the ASR as a node attribute)?
ancestralStateReconstruction.nexus.zip