palaeoware / trevosim

TREvoSim - The [Tr]ee [Evo]lutionary [Sim]ulator program
GNU General Public License v3.0

Unrooted tree in output file #37

ms609 closed 5 months ago

ms609 commented 6 months ago

I was surprised to see that trees in the "Output NEX tree file" are flagged as unrooted ([&U]); I don't understand why the position of the root is unknown in the simulated tree.

RussellGarwood commented 6 months ago

This was originally because of our choice of distance metrics for some initial studies with TREvoSim, I recall.

However, the other reason is that the trees here don't always fit our typical thinking about empirical trees quite so easily, and doing it this way ameliorated that issue a little (I have just remembered thinking about this when I started writing the software, but had forgotten since). I would welcome your thoughts. If we consider the root to be the last common ancestor of everything in the tree (a relatively common definition, often applied to a species), then in our case this will be an early individual within species zero, but not species zero itself. Due to the nature of this algorithm, under many settings species zero will kick around for a long time, diversify, and give birth to other species. So if we consider the root to be the oldest point in the tree, this will be the node splitting species zero (and typically its associated clade) from species one. Species zero may nest uptree somewhere: as a species it has the oldest origin, but typically an extinction later than a number of other taxa in the tree, and it is a direct ancestor of a number of these (the genome is written at extinction by default, and so its characters best represent its later state).

Rooting on the earliest node, rather than species zero, is currently how the tree printing algorithm renders a tree, i.e. if I write the tree with [&R], then we will root between species 1 (assuming this species does not evolve into its own clade, which is often the case) and the other taxa. Is that appropriate? I think it probably is, but given the lack of granularity in our typical empirical datasets, we don't tend to think of the individual that is the root as distinct in any way from the species to which it belonged. Unrooting forces the user, for those approaches that require a root, to address that distinction and think about how to appropriately root the tree. I would very much welcome your thoughts on whether changing to rooted outputs, in this light, would be sensible.
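
For readers less familiar with the [&U] and [&R] annotations discussed here, the following is a minimal Python sketch (not part of TREvoSim, which does not use this code) of how a script might inspect or switch the rooting flag on a NEXUS TREE statement. The function names, the tree name and the taxon labels are all hypothetical, chosen only for illustration.

```python
import re
from typing import Optional

# Matches the NEXUS rooting comment: [&R] (rooted) or [&U] (unrooted).
ROOTING_FLAG = re.compile(r"\[&([RU])\]", re.IGNORECASE)


def rooting_flag(tree_statement: str) -> Optional[str]:
    """Return 'R', 'U', or None if the statement carries no rooting flag."""
    match = ROOTING_FLAG.search(tree_statement)
    return match.group(1).upper() if match else None


def set_rooting_flag(tree_statement: str, flag: str) -> str:
    """Replace the first rooting flag in the statement with the given one."""
    if flag not in ("R", "U"):
        raise ValueError("flag must be 'R' or 'U'")
    return ROOTING_FLAG.sub(f"[&{flag}]", tree_statement, count=1)


# Hypothetical TREE statement; the labels are made up for illustration.
example = "tree tree1 = [&U] (Species_1,(Species_0,Species_2));"
print(rooting_flag(example))
print(set_rooting_flag(example, "R"))
```

Switching the flag does not alter the parenthetical string itself; it only changes whether the outermost split is to be read as the root. That is why the question of where the tree printing algorithm places that outermost split matters before the flag is changed.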

RussellGarwood commented 6 months ago

On my walk home (somehow as soon as I stop thinking about these things, my brain seems to figure stuff out in a different way) it struck me that this distinction (between the individual and the species) really matters most where we care about the characters, rather than the tree topology per se. So I suggest the following might be useful: I put some words to the effect of the above comment in the documentation somewhere, root the tree on the oldest node, and as part of the matrix output, I actually write the genome the simulation was seeded with as "root" or similar. Then if people need to care about this they will have what we know to be the plesiomorphic state for every character.

ms609 commented 6 months ago

It took a long run for me to mull this over – it's so easy to interpret a cladogram as an evolutionary tree, and I don't get much practice thinking about the distinction.

What you say makes sense, and I think this discussion would be a valuable addition to the documentation. The root will not necessarily be close to the first species to evolve – but TREvoSim still knows where it is. The tricky thing about writing it to the matrix output is that the matrix will then contain a row that doesn't correspond to a leaf of the tree. Perhaps a separate ||Root|| parameter could contain just that character sequence, so a user could write ||Matrix||\n||Root||; if they needed that information, without creating extra work for users who don't need it and would otherwise have to remove it from the matrix?
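
To make the proposal concrete, here is a small Python sketch of opt-in placeholder substitution. It is an illustration under stated assumptions, not TREvoSim's implementation: the `render` function, the taxon labels and the character strings are hypothetical, and only the `||Matrix||` and `||Root||` keyword names come from this thread.

```python
import re

# Placeholders are written as ||Name|| in the output template.
PLACEHOLDER = re.compile(r"\|\|(\w+)\|\|")


def render(template: str, values: dict) -> str:
    """Expand each ||Name|| placeholder; unknown names are left untouched."""
    def substitute(match):
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical values: two leaf rows and the genome the run was seeded with.
values = {
    "Matrix": "Species_0 0110\nSpecies_1 0111",
    "Root": "0000",
}

# Users who do not need the root simply omit the placeholder.
leaves_only = render("||Matrix||", values)

# Users who want the known plesiomorphic states ask for them explicitly.
with_root = render("||Matrix||\n||Root||", values)
print(with_root)
```

Keeping the root sequence behind its own keyword means every row of the default matrix still corresponds to a leaf of the tree, while the seed genome remains available on request.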

RussellGarwood commented 6 months ago

I'm glad it's not just me that finds thinking about this a bit of a challenge at times. Thanks for the thoughts, I think the ||Root|| suggestion is an excellent one, and will implement this. I'll update this issue when that is in place.

RussellGarwood commented 6 months ago

Thanks again for the discussion Martin. As an outcome of this issue:

I think that is everything for this issue!