stability of absrel results differs between versions

casparbein commented 1 year ago

Hi,

We recently switched from HyPhy version 2.5.8 to version 2.5.51 for several absrel analyses. To be sure that results from earlier runs would be mostly unaffected by this switch, we reran absrel on one dataset twice, once using version 2.1 (old) and once using version 2.3 (new). The dataset contains several thousand genes. While most reported branch-gene combinations showed the same p-value in both versions, among branch-gene combinations with significant p-values there were around 2,000 cases that only occurred with the old absrel version.

As an example, we then looked at one gene where we got 9 branches with significant p-values in the old version and none in the new version, to figure out what was causing these discrepancies. For this, we ran absrel 10 times for each version and compared uncorrected p-values: [image]

As you can see here, p-values are consistent across all 10 runs with the new absrel version. For the old version, p-values remain significant after correcting for multiple testing in 5 of the 10 runs, and in the other 5 runs all p-values are non-significant after correction. So there seems to be a consistency issue in the old version that turns either all branches significant or none.
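A minimal sketch of how a modest shift in uncorrected p-values could flip every branch at once under a step-down correction. It assumes the correction is Holm-Bonferroni, which is not stated in this thread. The function name `holm_bonferroni` and all p-values below are hypothetical and are not taken from the runs described above.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True if rejected under Holm's step-down procedure."""
    m = len(p_values)
    # Visit tests from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared to alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            # Step-down: once one test fails, all larger p-values fail too.
            break
    return rejected


# Hypothetical gene with 40 tested branches, 9 of them with small p-values.
run_a = [0.0005] * 9 + [0.5] * 31   # slightly better fit
run_b = [0.002] * 9 + [0.5] * 31    # slightly worse fit

print(sum(holm_bonferroni(run_a)), "branches significant in run A")
print(sum(holm_bonferroni(run_b)), "branches significant in run B")
```

Because the procedure stops at the first failing test, the smallest p-value acts as a gate: if it misses its threshold, no branch is reported, which may be one reason the outcome looks all-or-none.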

Comparing the parameters fitted by absrel, such as branch length and max. dN/dS, shows that they are very similar between both versions, but Log Likelihood and AICc are not. However, both are internally consistent in the old version, which surprisingly still leads to these divergent results: [image]

Here, I just wanted to ask whether you are aware of this issue and its potential implications for analyses that were already done. Also, is the new version always stable?

Looking forward to your comments, Cheers, Bernhard

spond commented 1 year ago

Dear @casparbein,

Thanks for this detailed report! The short answer is that while in most cases, aBSREL should be stable, there are several sources of variability.

1. Core HyPhy and aBSREL updates. Between 2.5.8 and 2.5.51, there have been a very large number of changes to the codebase, including analysis tweaks, optimization engine updates, and core algorithmic changes that modify the numerical performance of the methods. This could definitely lead to a disagreement on some analyses.
2. Intrinsic complexity of mixture analyses. aBSREL fits gnarly models to the data, so local optima are always possible. Newer methods may be better able to escape them; it is encouraging that for every example, the new version returns lower (better) c-AIC scores.
3. Stochasticity of the analysis itself. While aBSREL does not use random starting points, multi-processing environments can create run-to-run variability in numerical algorithms (see https://github.com/veg/hyphy/issues/1601).

Whenever in doubt, I would prefer the results which are associated with better c-AIC scores -- better model fit to the data.
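As a rough illustration of this rule of thumb, the sketch below computes the small-sample corrected AIC from a log likelihood using the standard formula and picks the run with the lowest score. The helper names `aicc` and `best_fit`, the dictionary keys, and all numbers are hypothetical; they are not HyPhy output fields or values from this issue.

```python
def aicc(log_likelihood, n_params, n_samples):
    """Small-sample corrected AIC: -2 logL + 2k + 2k(k + 1) / (n - k - 1)."""
    k, n = n_params, n_samples
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def best_fit(runs):
    """Return the run with the lowest (best) AICc."""
    return min(runs, key=lambda run: run["aicc"])


# Hypothetical fits of the same alignment under two software versions.
runs = [
    {"label": "old", "aicc": aicc(-5100.0, 60, 900)},
    {"label": "new", "aicc": aicc(-5085.0, 62, 900)},
]

print("preferred run:", best_fit(runs)["label"])
```

Comparing scores this way is only meaningful when both runs were fitted to the same alignment and tree.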

Best, Sergei

casparbein commented 1 year ago

Thanks for your reply! We are still benchmarking aBSREL in various other scenarios and will definitely have more questions in the future. (Let me know if you feel that discussing these things on GitHub is somehow inconvenient.)

spond commented 1 year ago

Dear @casparbein,

I prefer to have such discussions on GitHub. Leaves a record for posterity and helps other users.

Best, Sergei

veg / hyphy

stability of absrel results differs between versions #1643