veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

new hyphy aBSREL (v_2.5.51) version changes default --kill-zero-lengths yes #1663

Closed by ariadnamorales 6 months ago

ariadnamorales commented 8 months ago

Hello, I noticed that the new hyphy aBSREL version (v_2.5.51, perhaps even the one before) defaults to "--kill-zero-lengths Yes", whereas previous versions defaulted to "--kill-zero-lengths No". While the Log(L) doesn't change, the AIC-c does, because the correction is based on a different number of parameters included in the model. As a result, running the same command line "hyphy absrel --alignment aln.fasta --tree tree.tre --output out.json" in different versions does not count the same number of parameters in the model. I have worked around this by running "hyphy absrel --kill-zero-lengths No --alignment aln.fasta --tree tree.tre --output out.json" with the new version. But it would be great to change the default or add a warning informing users about the change. Thanks!

spond commented 8 months ago

Dear @ariadnamorales,

I can confirm this behavior. To expand:

  1. Most selection analyses in HyPhy use a nucleotide model as pass 1 in model fitting, used to obtain the first approximations to branch lengths. By default, all internal branch lengths which are estimated to be 0 are "deleted", i.e. internal nodes are collapsed to polytomies. This does not affect the likelihood function but speeds up calculations.
  2. This procedure can be viewed as setting some of the branch length parameters of the model to 0, which still counts as "estimation" for counting degrees of freedom.
  3. Currently, HyPhy does not keep track of these "deleted" branch lengths; this does not affect parameter estimation or likelihood ratio testing, but it does affect AIC calculations, which include the number of estimated parameters.
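
To make point 3 concrete, here is a minimal sketch of how the parameter count enters a small-sample corrected AIC. It assumes the textbook AIC-c formula, -2 Log(L) + 2k + 2k(k+1)/(n-k-1); the function name `aic_c` and the sample size `n` are illustrative, and the thread does not state how HyPhy defines `n`, so the sketch only decomposes the gap between the two AIC-c values reported in the example in this comment.

```python
def aic_c(log_l, k, n):
    """Textbook small-sample corrected AIC (assumed form, not taken from HyPhy source).

    log_l: maximized log-likelihood
    k:     number of estimated parameters
    n:     sample size (how HyPhy defines it is not stated in this thread)
    """
    if n - k - 1 <= 0:
        raise ValueError("AIC-c is undefined unless n > k + 1")
    return -2.0 * log_l + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


# Values reported in the example runs in this comment.
K_DELETED = 176   # --kill-zero-lengths Yes
K_KEPT = 228      # --kill-zero-lengths No

# Log(L) is identical in both runs, so the -2 Log(L) term cancels.
reported_gap = round(5409.18 - 5302.62, 2)
penalty_gap = 2 * (K_KEPT - K_DELETED)                 # from the 2k term alone
correction_gap = round(reported_gap - penalty_gap, 2)  # left for the 2k(k+1)/(n-k-1) term

print(reported_gap, penalty_gap, correction_gap)
```

Under this assumed formula, almost all of the difference between the two reported AIC-c values comes from the 2k penalty on the extra counted parameters; the small remainder is attributable to the sample-size correction term.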

The behavior can be overridden by setting --kill-zero-lengths Constrain (or No) on the command line.
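
For scripted runs, one way to keep results comparable across HyPhy versions is to always pass the option explicitly rather than relying on the default. The sketch below only builds the command line and does not run HyPhy; the helper name `absrel_command` is hypothetical, the file names are the placeholders used earlier in this thread, and the accepted values are limited to the three mentioned here (Yes, No, Constrain).

```python
import shlex

# Values mentioned in this thread; other values are not covered by this sketch.
ALLOWED_VALUES = ("Yes", "No", "Constrain")


def absrel_command(alignment, tree, output, kill_zero_lengths="No"):
    """Build an aBSREL command line with --kill-zero-lengths set explicitly."""
    if kill_zero_lengths not in ALLOWED_VALUES:
        raise ValueError(f"unexpected --kill-zero-lengths value: {kill_zero_lengths!r}")
    return [
        "hyphy", "absrel",
        "--kill-zero-lengths", kill_zero_lengths,
        "--alignment", alignment,
        "--tree", tree,
        "--output", output,
    ]


cmd = absrel_command("aln.fasta", "tree.tre", "out.json")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` as-is; pinning the value in one place makes it easier to spot when two sets of results were produced under different settings.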

I'll implement a fix in one of the upcoming releases.

Best, Sergei

P.S. An example (note the difference in the number of estimated model parameters reported).

hyphy absrel --alignment xxx

>kill-zero-lengths –> Yes

### Deleted 26 zero-length internal branches: `NODE10, NODE109, NODE115, NODE13, NODE133, NODE134, NODE184, NODE186, NODE189, NODE216, NODE217, NODE218, NODE219, NODE225, NODE228, NODE26, NODE262, NODE30, NODE37, NODE4, NODE47, NODE50, NODE61, NODE7, NODE9, NODE99`

...

### Fitting the baseline model with a single dN/dS class per branch, and no site-to-site variation. 
* Log(L) = -2473.43, AIC-c =  5302.62 (176 estimated parameters)
* Branch-level non-synonymous/synonymous rate ratio distribution has median  0.65, and 95% of the weight in  0.00 - 10000000000.00

hyphy absrel --alignment xxx --kill-zero-lengths No

### Obtaining branch lengths and nucleotide substitution biases under the nucleotide GTR model

>kill-zero-lengths –> No

....

### Fitting the baseline model with a single dN/dS class per branch, and no site-to-site variation. 
* Log(L) = -2473.43, AIC-c =  5409.18 (228 estimated parameters)
* Branch-level non-synonymous/synonymous rate ratio distribution has median  0.65, and 95% of the weight in  0.00 - 10000000000.00
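
When comparing runs from different versions, the fit summary lines can be checked mechanically. The following is a rough sketch that assumes the summary line has exactly the shape shown in the two runs above; the regular expression and the helper name `parse_fit` are illustrative and not part of HyPhy.

```python
import re

# Matches lines shaped like:
# "* Log(L) = -2473.43, AIC-c =  5302.62 (176 estimated parameters)"
FIT_LINE = re.compile(
    r"Log\(L\)\s*=\s*(-?\d+(?:\.\d+)?),\s*"
    r"AIC-c\s*=\s*(-?\d+(?:\.\d+)?)\s*"
    r"\((\d+) estimated parameters\)"
)


def parse_fit(line):
    """Return (log_l, aic_c, n_params) from a fit summary line, or None if it does not match."""
    match = FIT_LINE.search(line)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2)), int(match.group(3))


yes_run = parse_fit("* Log(L) = -2473.43, AIC-c =  5302.62 (176 estimated parameters)")
no_run = parse_fit("* Log(L) = -2473.43, AIC-c =  5409.18 (228 estimated parameters)")

same_likelihood = yes_run[0] == no_run[0]
param_gap = no_run[2] - yes_run[2]

print(same_likelihood, param_gap)
```

An identical Log(L) with a different parameter count, as in these two runs, is the signature of the counting difference described in this comment rather than of a different model fit.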

github-actions[bot] commented 6 months ago

Stale issue message