lonelyjoeparker / befi-bats-gui

Java software to analyse phylogeny-trait correlations in discrete traits, accounting for phylogenetic uncertainty via posterior sampling.
http://science.lonelyjoeparker.com
GNU General Public License v3.0

Interpreting BaTS results #6

Open alexjlee21 opened 8 years ago

alexjlee21 commented 8 years ago

I have run BaTS on a protein tree with 52 taxa and I have a few questions looking at the results:

  1. Is there a reason why some values are decimals and some are integers? I don’t see a reason why the confidence intervals based on the observed distribution would be integers and the intervals based on the null distribution are decimals
  2. The “observed mean” value for the “MC(severe)” is outside the confidence interval. Do you know why this is happening? I found that if I limit the number of sample trees to use in BaTS this issue goes away.
  3. Do the p-values look “too perfect” (e.g. 0.00999999) because of the way Java handles rounding?
  4. Does BaTS tell you where on the tree there is significant clustering?

Statistic  observed mean  lower 95% CI  upper 95% CI  null mean  lower 95% CI  upper 95% CI  significance
AI  2.736281872  2.004738569  3.422225475  3.005850554  2.248389959  3.728923798  0.278999984
PS  15.76347351  14  17  16.95476532  14.56786442  19.17065811  0.249000013
MC (severe)  3.052894115  3  3  2.678668022  2.054890156  3.624750614  0.328999996
MC (mild)  2.978043795  2  5  3.705248356  2.657684565  5.893213749  0.834999979

lonelyjoeparker commented 8 years ago

Hi Alex, thanks for raising this issue.

I've formatted your output as fixed-width text:

Statistic   observed mean   lower 95% CI    upper 95% CI    null mean       lower 95% CI    upper 95% CI    significance
AI          2.736281872     2.004738569     3.422225475     3.005850554     2.248389959     3.728923798     0.278999984
PS          15.76347351     14              17              16.95476532     14.56786442     19.17065811     0.249000013
MC (severe) 3.052894115     3               3               2.678668022     2.054890156     3.624750614     0.328999996
MC (mild)   2.978043795     2               5               3.705248356     2.657684565     5.893213749     0.834999979

In response to your questions:

  1. Decimal vs integer-valued statistics: All statistics (means) are stored as java.lang.Double variables (decimal), but the observed values of MC (maximum clade size) are integer-valued. They are converted to Double (decimal) values when the mean observed MC size is calculated from the tree samples, so the observed mean is a decimal while the observed upper/lower values stay integers. The null (expected) upper/lower values are themselves means, so they are decimals in any case.
  2. Observed mean outside 95% CIs: please see #5 (already answered).
  3. Perfect p-values / Java rounding: In a word, yes (again, see #5).
  4. Determining which nodes are the 'significantly clustered' ones: BaTS doesn't do this directly, because the original test / development phylogenies were fairly small (<40 tips) and we assumed, implicitly I think, that users would be able to see from a consensus or MCC tree which nodes were clustered. It would be possible to identify the 'clustered' nodes and report them (initially, probably by simply printing out every internal node's AI value), but this might make the output (even more) confusing. What do you think?
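
As a minimal sketch of points 1 to 3 (not taken from the BaTS source; the class, the method names, and the sample values are all hypothetical), the Java below shows how an integer-valued statistic such as MC yields a decimal mean but integer percentile bounds, and how that mean can land outside the percentile interval. It also shows how single-precision arithmetic can produce values close to, but not exactly, 0.01. Whether BaTS uses nearest-rank percentiles or `float` arithmetic internally is an assumption here; issue #5 has the maintainer's actual explanation.

```java
import java.util.Arrays;

public class IntegerStatSketch {

    /** Mean of integer-valued observations; the result is a double (decimal). */
    static double mean(int[] values) {
        double sum = 0.0;
        for (int v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Nearest-rank percentile: always returns one of the observed integers. */
    static int percentile(int[] values, double p) {
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** Hypothetical sample of 100 trees: 98 with MC = 3 and 2 with MC = 5. */
    static int[] sampleMc() {
        int[] mc = new int[100];
        Arrays.fill(mc, 3);
        mc[0] = 5;
        mc[1] = 5;
        return mc;
    }

    /** Single-precision subtraction, e.g. 1 minus a proportion held as a float. */
    static float oneMinus(float proportion) {
        return 1.0f - proportion;
    }

    public static void main(String[] args) {
        int[] mc = sampleMc();
        // The mean is pulled above 3 by the two large values,
        // while both percentile bounds remain at the integer 3.
        System.out.println("mean  = " + mean(mc));
        System.out.println("lower = " + percentile(mc, 0.025));
        System.out.println("upper = " + percentile(mc, 0.975));
        // 0.99f is not exactly 0.99, so the difference is not exactly 0.01.
        System.out.println("1 - 0.99f = " + oneMinus(0.99f));
    }
}
```

In this sketch the interval bounds are order statistics, so they can only take values that actually occur in the sample, whereas the mean is free to fall between (or beyond) them when a few trees have a much larger clade size.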

hth, cheers joe

ajlee21 commented 8 years ago

Thank you for addressing all my questions.

For (4), I think that, as a first step, annotating internal nodes with the average AI score for the observed tree would be helpful.

lonelyjoeparker commented 8 years ago

OK - I'll add this to the revisions list for the next update. We would be calculating AI-per-node on a summary tree (MCC or similar) since only the root is congruent across the whole set of PST trees.
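
A rough sketch of what per-node AI reporting on a single summary tree might look like, assuming the association index definition of Wang et al. (2001), in which each internal node contributes (1 - f) / 2^(m - 1), where m is the number of tips below the node and f is the frequency of the most common trait among them. The `Node` class, the method names, and the four-tip example tree are hypothetical and are not part of BaTS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NodeAiSketch {

    /** Minimal rooted-tree node: tips carry a trait label, internal nodes have children. */
    static final class Node {
        final String trait;                       // null for internal nodes
        final List<Node> children = new ArrayList<>();

        Node(String trait) {
            this.trait = trait;
        }

        static Node tip(String trait) {
            return new Node(trait);
        }

        static Node internal(Node... kids) {
            Node n = new Node(null);
            for (Node k : kids) {
                n.children.add(k);
            }
            return n;
        }

        boolean isTip() {
            return children.isEmpty();
        }
    }

    /** Count how often each trait state occurs among the tips below a node. */
    static void countTraits(Node n, Map<String, Integer> counts) {
        if (n.isTip()) {
            counts.merge(n.trait, 1, Integer::sum);
            return;
        }
        for (Node c : n.children) {
            countTraits(c, counts);
        }
    }

    /** Contribution of one node to the AI: (1 - f) / 2^(m - 1). */
    static double nodeAi(Node n) {
        Map<String, Integer> counts = new HashMap<>();
        countTraits(n, counts);
        int m = 0;
        int max = 0;
        for (int c : counts.values()) {
            m += c;
            max = Math.max(max, c);
        }
        double f = (double) max / m;
        return (1.0 - f) / Math.pow(2.0, m - 1);
    }

    /** Collect the contribution of every internal node, in pre-order. */
    static void collect(Node n, List<Double> out) {
        if (n.isTip()) {
            return;
        }
        out.add(nodeAi(n));
        for (Node c : n.children) {
            collect(c, out);
        }
    }

    /** Hypothetical tree: ((severe, severe), (severe, mild)). */
    static Node exampleTree() {
        return Node.internal(
                Node.internal(Node.tip("severe"), Node.tip("severe")),
                Node.internal(Node.tip("severe"), Node.tip("mild")));
    }

    public static void main(String[] args) {
        List<Double> terms = new ArrayList<>();
        collect(exampleTree(), terms);
        double total = 0.0;
        for (double t : terms) {
            total += t;
        }
        System.out.println("per-node terms: " + terms + ", tree AI: " + total);
    }
}
```

A node whose tips all share one trait contributes 0, so low per-node values would mark the tightly clustered parts of the summary tree. Mapping values from individual posterior trees onto that summary tree is a separate problem that this sketch does not attempt.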