macroevolution / bamm

A program for multimodel inference on speciation and trait evolution
GNU General Public License v2.0
33 stars 10 forks source link

BAMM crashing with ultrametric check #58

Closed drabosky closed 10 years ago

drabosky commented 10 years ago

BAMM's new ultrametric check has a problem. Won't run on the fish example dataset if I try to run a diversification analysis (the example dataset is in "traits"), claims dataset is not ultrametric (it is).

drabosky commented 10 years ago

It's a tolerance error, I think. But maybe worth having an explicit error message that indicates something to this effect. I might drop the tolerance to 10^-2

drabosky commented 10 years ago

I think there is still a bug in this even though I got it to work after increasing the tolerance. The tolerance in ape's is.ultrametric function is 1.490116e-08 which is much lower than BAMM at present.

drabosky commented 10 years ago

The point being that BAMM appears to have problems with trees that pass ultrametric check in ape, the fish tree is an example. Check the big fish tree that is used as an example in the "traits" section.

redcurry commented 10 years ago

Ok, I can take a look at that this evening.

On Monday, January 27, 2014, Dan Rabosky notifications@github.com wrote:

The point being that BAMM appears to have problems with trees that pass ultrametric check in ape, the fish tree is an example. Check the big fish tree that is used as an example in the "traits" section.

— Reply to this email directly or view it on GitHubhttps://github.com/macroevolution/bamm/issues/58#issuecomment-33357248 .

redcurry commented 10 years ago

I was able to look at this now and it is a floating point issue. When I print out all root-to-tip lengths, almost all are 345. But some are 344.999 and 345.001. I'll need to think how to solve this.

On Monday, January 27, 2014, Carlos Anderson carlosjanderson@gmail.com wrote:

Ok, I can take a look at that this evening.

On Monday, January 27, 2014, Dan Rabosky <notifications@github.com<javascript:_e({}, 'cvml', 'notifications@github.com');>> wrote:

The point being that BAMM appears to have problems with trees that pass ultrametric check in ape, the fish tree is an example. Check the big fish tree that is used as an example in the "traits" section.

— Reply to this email directly or view it on GitHubhttps://github.com/macroevolution/bamm/issues/58#issuecomment-33357248 .

drabosky commented 10 years ago

I wonder why ape's ultrametric check doesn't catch those. Also, I would expect the revised tolerance settings (10e-2) to be OK with those, but maybe not...

On Jan 27, 2014, at 9:20 AM, Carlos Anderson wrote:

I was able to look at this now and it is a floating point issue. When I print out all root-to-tip lengths, almost all are 345. But some are 344.999 and 345.001. I'll need to think how to solve this.

On Monday, January 27, 2014, Carlos Anderson carlosjanderson@gmail.com wrote:

Ok, I can take a look at that this evening.

On Monday, January 27, 2014, Dan Rabosky <notifications@github.com<javascript:_e({}, 'cvml', 'notifications@github.com');>> wrote:

The point being that BAMM appears to have problems with trees that pass ultrametric check in ape, the fish tree is an example. Check the big fish tree that is used as an example in the "traits" section.

— Reply to this email directly or view it on GitHubhttps://github.com/macroevolution/bamm/issues/58#issuecomment-33357248 .

— Reply to this email directly or view it on GitHub.

redcurry commented 10 years ago

I'm curious as to how R handles floating-point errors, but we're accumulating a lot by summing all those root-to-tip lengths.

redcurry commented 10 years ago

I looked more into this, and now I don't think it's a floating-point error. There isn't enough precision in the branch lengths, nor enough sums to have to worry about floating-point errors. However, I found something odd in the is.ultrametric() function in R. If you look at the last line of the function, what is actually being tested is whether the variance of the root-to-tip lengths is equal to 0. It is NOT testing whether all the root-to-tip lengths are equal to each other, as I had thought, and as is implemented in BAMM. This variance is equal to 6.637095e-09 (for the fish tree), and so it is well below the tolerance level. I think if we implemented our ultrametric test this way, we would obtain the same result. The question is whether this is the proper way to test for ultrametricity.

drabosky commented 10 years ago

Interesting. Maybe it's best to do it the ape way just for consistency, since many users will use ape. if the variance in root-to-tip lengths is zero within some tolerance, then I imagine that this is a reasonable estimate of ultrametricity.

On Jan 28, 2014, at 11:06 PM, Carlos Anderson wrote:

I looked more into this, and now I don't think it's a floating-point error. There isn't enough precision in the branch lengths, nor enough sums to have to worry about floating-point errors. However, I found something odd in the is.ultrametric() function in R. If you look at the last line of the function, what is actually being tested is whether the variance of the root-to-tip lengths is equal to 0. It is NOT testing whether all the root-to-tip lengths are equal to each other, as I had thought, and as is implemented in BAMM. This variance is equal to 6.637095e-09 (for the fish tree), and so it is well below the tolerance level. I think if we implemented our ultrametric test this way, we would obtain the same result. The question is whether this is the proper way to test for ultrametricity.

— Reply to this email directly or view it on GitHub.

redcurry commented 10 years ago

The only issue with looking at the variance is that, typically, larger numbers will have a larger variance, so the varience is somehow tied to the magnitude of the values themselves. Perhaps this isn't a big problem.

drabosky commented 10 years ago

I agree that this is potentially true for very large trees (the fish tree than just passed cleanly has ~8000 tips), but i'm not sure it's an issue in practice.

On Jan 28, 2014, at 11:21 PM, Carlos Anderson wrote:

The only issue with looking at the variance is that, typically, larger numbers will have a larger variance, so the varience is somehow tied to the magnitude of the values themselves. Perhaps this isn't a big problem.

— Reply to this email directly or view it on GitHub.

redcurry commented 10 years ago

Ok, changed the algorithm. Please test.