palaeoware / trevosim

TREvoSim - The [Tr]ee [Evo]lutionary [Sim]ulator program
GNU General Public License v3.0

New features: Default simulation parameters #6

Closed ms609 closed 2 months ago

ms609 commented 4 months ago
RussellGarwood commented 2 months ago

Thanks for all these points @ms609 - I've spent a train journey playing with the software and script to improve this. Quick question for you, given your depth of knowledge is greater than mine (and my train has limited internet, precluding procuring papers easily) - all TREvoSim trees are fully resolved. Given this, does total cophenetic index still provide advantages over Colless?
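
For readers following the question, here is a minimal Python sketch of the two balance indices on fully resolved trees. It is an illustration only, not TREvoSim or TreeTools code: trees are written as nested 2-tuples with string leaves, and the names `balance_indices` and `_walk` are hypothetical. It assumes the usual definitions: Colless sums the absolute difference in leaf counts between the two children of each internal node, and the total cophenetic index sums C(k, 2) over every non-root internal node with k descendant leaves.

```python
from math import comb


def _walk(node, is_root, totals):
    """Return the number of leaves below `node`, updating both indices in `totals`."""
    if isinstance(node, str):  # a leaf
        return 1
    left, right = node  # fully resolved: exactly two children
    n_left = _walk(left, False, totals)
    n_right = _walk(right, False, totals)
    # Colless: imbalance between the two daughter clades of this node.
    totals["colless"] += abs(n_left - n_right)
    # Total cophenetic index: pairs of leaves sharing this non-root ancestor.
    if not is_root:
        totals["tci"] += comb(n_left + n_right, 2)
    return n_left + n_right


def balance_indices(tree):
    """Return (Colless index, total cophenetic index) for a nested-tuple tree."""
    totals = {"colless": 0, "tci": 0}
    _walk(tree, True, totals)
    return totals["colless"], totals["tci"]


caterpillar = ((("a", "b"), "c"), "d")  # maximally unbalanced, four leaves
balanced = (("a", "b"), ("c", "d"))     # perfectly balanced, four leaves

print(balance_indices(caterpillar))
print(balance_indices(balanced))
```

Both indices are defined on bifurcating trees, so either can be computed for TREvoSim output; the sketch only shows how the two are calculated, not which is preferable.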

RussellGarwood commented 2 months ago

Thanks for fixing the above TreeTools issue so quickly! I have now completed a bunch of changes to address these issues - I accept that benchmarking was perhaps not a great term (comparison to a standard != comparison to data where we don't actually have the true tree).

To clarify the situation I have done the following:

I think this addresses most of your points - a few of your questions/comments related to my poor explanation, so to that end, I note:

-- : I would expect the first order control on number of steps to be the number of leaves / characters in a dataset. And is it legitimate to compare the reconstructed trees

These are per-character averages, not per tree, which I think addresses this.

-- (I guess these are inferred under a likelihood model, because the nexus files contain non-integer edge lengths – are they ML trees? MCC?) under these datasets with the “true” simulated tree in TREvoSim?

I have now clarified this in the text, but these are against total evidence topologies. Not ideal, but also they are what we have.

I hope this clears up all the above points! The only thing I haven't done is change the violin plot, and that is because I now describe what it shows and highlight the simulated data in the text.

ms609 commented 2 months ago

Looks good, thanks, Russell. I've left some comments on the comparison script at #53.

I would expect the first order control on number of steps to be the number of leaves / characters in a dataset

OK, I missed that this was per character. The number of leaves is still, I think, a potential factor here; trivially, at most one step can be observed on a two-leaf tree; and if the number of leaves in a tree is related to its length, then more leaves → more time → more opportunities for extra steps.
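
The two-leaf case can be made concrete with a small Fitch parsimony sketch in Python. This is an illustration under stated assumptions, not the comparison script discussed in this thread: trees are nested 2-tuples with string leaves, the character is unordered, and `fitch_steps` is a hypothetical helper name.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one unordered character (Fitch)."""
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):  # a leaf: its observed state
            return {states[node]}
        left, right = (state_set(child) for child in node)
        shared = left & right
        if shared:
            return shared          # children agree: no change needed here
        steps += 1                 # children disagree: one change on this node
        return left | right

    state_set(tree)
    return steps


two_leaf = ("a", "b")
four_leaf = (("a", "b"), ("c", "d"))

# A binary character needs at least one step wherever both states occur.
print(fitch_steps(two_leaf, {"a": 0, "b": 1}))
print(fitch_steps(four_leaf, {"a": 0, "b": 1, "c": 0, "d": 1}))
```

On the two-leaf tree the character can never take more than one step, so no steps beyond the minimum are possible; on the four-leaf tree the same binary character takes two steps, one more than the minimum.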

RussellGarwood commented 2 months ago

I have modified the graphing, and actioned all aspects of PR #53. Thanks!

The only thing remaining is extra steps and leaf counts. I agree that this is likely to be a factor, but so many other things are - taxon and coding choice, the history of the group in question (I suspect taxon count has as much to do with genomic availability for TE evidence as it does with the age of the group), the question the original study was asking - that merely normalising by leaf count will do more harm than good, and I can't, right now, think of an obvious solution (indeed, mapping time to iterations in TREvoSim also involves a range of assumptions). I'll think on it, but at the moment will stick with the current formulation.