New features: Default simulation parameters

ms609 commented 4 months ago

[x] It’s not clear what “benchmarking” means in this context; the results presented don’t match the term as I am familiar with it.
It would be helpful to summarise in the article text what this benchmarking exercise entailed, and to briefly describe the results. Whilst graphics are provided in the benchmarking folder, the meaning and interpretation of the graphs is not obvious, and is not straightforward to infer from the associated R script – and indeed the R script doesn’t quite correspond to the materials provided (e.g. the file names differ). I felt like I could eventually just about guess what the images depicted, but I’m not sure that I can reach an intelligent conclusion about whether the differences between simulated and empirical data matter, or how they may influence analyses. I would suggest adding a file to the benchmarking folder that details the measures used (as advertised in §1.5 of the docs).
[~] The Defaults_homoplasy.pdf plot would be easier to interpret without a commentary if the “Simulated” column that corresponds to the TREvoSim output was set as a separate bar, distinct from the empirical data, and perhaps labelled something like “TREvoSim simulations”. I missed this column on first view. It’s also unclear what the expectation is here: I would expect the first order control on number of steps to be the number of leaves / characters in a dataset. And is it legitimate to compare the reconstructed trees (I guess these are inferred under a likelihood model, because the nexus files contain non-integer edge lengths – are they ML trees? MCC?) under these datasets with the “true” simulated tree in TREvoSim?
[x] Is there an argument for preferring Colless’s index to Mir et al.’s (2013) total cophenetic index, which has greater resolving power? Mir A, Rosselló F, Rotger LA (2013). “A new balance index for phylogenetic trees.” Mathematical Biosciences, 241(1), 125–136. doi:10.1016/j.mbs.2012.10.005

RussellGarwood commented 2 months ago

Thanks for all these points @ms609 - I've spent a train journey playing with the software and script to improve this. Quick question for you, given your depth of knowledge is greater than mine (and my train has limited internet, precluding procuring papers easily) - all TREvoSim trees are fully resolved. Given this, does total cophenetic index still provide advantages over Colless?
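
For context on the question above, here is a minimal sketch (not taken from the TREvoSim comparison script) that computes both indices on fully resolved trees. It assumes trees are written as nested 2-tuples, takes Colless's index as the sum over internal nodes of the absolute difference in leaf counts between the two subtrees, and takes the total cophenetic index as the sum, over all internal nodes other than the root, of the number of leaf pairs below that node. The function names and example trees are hypothetical.

```python
from math import comb


def n_leaves(tree):
    """Count leaves of a tree written as nested 2-tuples; any non-tuple is a leaf."""
    if not isinstance(tree, tuple):
        return 1
    return sum(n_leaves(child) for child in tree)


def colless(tree):
    """Colless's index: sum over internal nodes of |leaves(left) - leaves(right)|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    imbalance = abs(n_leaves(left) - n_leaves(right))
    return imbalance + colless(left) + colless(right)


def total_cophenetic(tree, is_root=True):
    """Total cophenetic index: sum over non-root internal nodes of C(leaves below, 2)."""
    if not isinstance(tree, tuple):
        return 0
    own = 0 if is_root else comb(n_leaves(tree), 2)
    return own + sum(total_cophenetic(child, False) for child in tree)


# Two fully resolved six-leaf trees with different shapes (hypothetical examples).
tree_a = ((("a", "b"), ("c", "d")), ("e", "f"))
tree_b = (("a", ("b", "c")), ("d", ("e", "f")))

for name, tree in (("tree_a", tree_a), ("tree_b", tree_b)):
    print(name, "Colless:", colless(tree), "TCI:", total_cophenetic(tree))
```

In this example the two trees receive the same Colless value but different total cophenetic values, so the two indices can disagree about whether two shapes are distinguishable even when every tree is fully resolved.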

RussellGarwood commented 2 months ago

Thanks for fixing the above TreeTools issue so quickly! I have now completed a bunch of changes to address these issues - I accept that benchmarking was perhaps not a great term (comparison to a standard != comparison to data where we don't actually have the true tree).

To clarify the situation I have done the following:

Renamed benchmarking "comparison to empirical" or words to that effect, which is more descriptive
Included 100 replicates of the default outputs into the associated folder
Ensured that the comparison script runs within the folder structure of the repo, as long as you either 1) update the working directory in R to your absolute path or 2) call the R script from a bash script I included - it now writes outputs to their own folder here, so the entire thing should be reproducible, changes to R packages etc. aside
Included a section in the documentation on the defaults, and the comparison to empirical, explaining what they show and highlighting the results (see section Algorithm and concepts)
Changed the tree asymmetry measure to total cophenetic index

I think this addresses most of your points - a few of your questions/comments related to my poor explanation, so to that end, I note:

-- : I would expect the first order control on number of steps to be the number of leaves / characters in a dataset. And is it legitimate to compare the reconstructed trees

These are per character averages, not per tree, which I think addresses this.

-- (I guess these are inferred under a likelihood model, because the nexus files contain non-integer edge lengths – are they ML trees? MCC?) under these datasets with the “true” simulated tree in TREvoSim?

I have now clarified this in the text, but these are against total evidence topologies. Not ideal, but also they are what we have.

I hope this clears up all the above points! The only thing I haven't done is change the violin plot, and that is because I now describe what it shows and highlight the simulated data in the text.

ms609 commented 2 months ago

Looks good, thanks, Russell. I've left some comments on the comparison script at #53.

I would expect the first order control on number of steps to be the number of leaves ~~/ characters~~ in a dataset

OK, I missed that this was per character. The number of leaves is still, I think, a potential factor here; trivially, at most one step can be observed on a two-leaf tree; and if the number of leaves in a tree is related to its length, then more leaves → more time → more opportunities for extra steps.
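
The leaf-count point can be illustrated with a small sketch. It assumes that "extra steps" for a character means its Fitch parsimony length on the tree minus the minimum possible length (number of observed states minus one), averaged over characters; the comparison script's own calculation may differ. Trees are nested 2-tuples, each character is a dict from leaf name to state, and all names here are hypothetical.

```python
def fitch(tree, states):
    """Return (candidate state set, step count) for one character via Fitch's algorithm."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    (left_set, left_steps), (right_set, right_steps) = (fitch(c, states) for c in tree)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no new step at this node.
        return shared, left_steps + right_steps
    # Children disagree: take the union and count one step.
    return left_set | right_set, left_steps + right_steps + 1


def extra_steps(tree, states):
    """Steps beyond the minimum (observed states - 1) for one character."""
    _, steps = fitch(tree, states)
    minimum = len(set(states.values())) - 1
    return steps - minimum


def mean_extra_steps(tree, matrix):
    """Per-character average of extra steps over a list of characters."""
    return sum(extra_steps(tree, character) for character in matrix) / len(matrix)


# Two leaves: a binary character takes at most one step, so there is no room for extra steps.
two_leaf = ("a", "b")
print(extra_steps(two_leaf, {"a": 0, "b": 1}))

# Four leaves: the same state arising in both cherries needs an extra step.
four_leaf = (("a", "b"), ("c", "d"))
homoplastic = {"a": 0, "b": 1, "c": 0, "d": 1}
clean = {"a": 0, "b": 0, "c": 1, "d": 1}
print(mean_extra_steps(four_leaf, [homoplastic, clean]))
```

Under these assumptions the per-character average is independent of the number of characters, but the ceiling on extra steps for any one character still grows with the number of leaves.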

Violin plot: how about setting the fill colour for the simulated bar to white, or bolding its legend entry, or something else to make it stand out visually too?

RussellGarwood commented 2 months ago

I have modified the graphing, and actioned all aspects of PR #53. Thanks!

The only thing remaining is extra steps and leaf counts. I agree that this is likely to be a factor, but so many other things are - taxon and coding choice, the history of the group in question (I suspect taxon count is as much to do with genomic availability for total evidence (TE) data as it is the age of the group), the question the original study was asking - that merely normalising by leaf count will do more harm than good, and I can't, right now, think of an obvious solution (indeed, mapping time to iterations in TREvoSim also involves a range of assumptions). I'll think on it, but at the moment will stick with the current formulation.

palaeoware / trevosim

New features: Default simulation parameters #6