hmorlon / PANDA

Phylogenetic ANalyses of DiversificAtion

Function fit_t_pl: same codes, very different results on 32 or 64-bit R #29

SArtuso closed this issue 4 years ago

SArtuso commented 4 years ago

Hi, thank you very much for this great package.

I have an issue with the function fit_t_pl, which gives very different results on 32-bit and 64-bit R given the same code and the same data. Here is an example from my data (I am using 3D landmark data):

on 32-bit:

###read tree
tree <- read.nexus("tree.nex")
###read data
xx<-as.matrix(read.csv("dataset.csv",row.names=1))

xxBM <-fit_t_pl(xx, tree, model="BM", method="RidgeAlt", SE=TRUE)

xxBM

-- Summary results for the BM model --

Penalization: RidgeAlt

LOOCV (negative):        -14929.86 

Model parameter: 
______________________ 
0 

Regularization parameter (gamma): 
______________________ 
6.271081e-37 

Evolutionary Covariance of size: 30 by 30 
for 40 species 
______________________ 

on 64-bit:

###read tree
tree <- read.nexus("tree.nex")
###read data
xx<-as.matrix(read.csv("dataset.csv",row.names=1))

xxBM <-fit_t_pl(xx, tree, model="BM", method="RidgeAlt", SE=TRUE)

xxBM

-- Summary results for the BM model --

Penalization: RidgeAlt

LOOCV (negative):        -13593.94 

Model parameter: 
______________________ 
0 

Regularization parameter (gamma): 
______________________ 
1.087406e-33 

Evolutionary Covariance of size: 30 by 30 
for 40 species 
______________________ 

As you can see, the LOOCV value and the regularization parameter of the BM model differ considerably between the two platforms. Also, when I compare the GIC of different models with the package mvMORPH, I obtain this:

on the 32-bit

GIC(xxBM); GIC(xxEB); GIC(xxOU);

-- Generalized Information Criterion --

GIC: -30971.89 | Log-likelihood 15428.84 

-- Generalized Information Criterion --

GIC: -Inf | Log-likelihood Inf 

-- Generalized Information Criterion --

GIC: -28133.11 | Log-likelihood 14172.98 

on the 64-bit

GIC(xxBM); GIC(xxEB); GIC(xxOU);

-- Generalized Information Criterion --

GIC: -27390.41 | Log-likelihood 13783.3 

-- Generalized Information Criterion --

GIC: -28605.16 | Log-likelihood 14399.63 

-- Generalized Information Criterion --

GIC: -31424.25 | Log-likelihood 15676.97

The results are completely different. I have also tested other datasets: the GIC values always differ and sometimes, as in the case above, this changes which model is selected as the best fit. I would really appreciate your help on this issue. Do you have any idea which result I should consider correct?

I am using R 3.6.2 on Windows 10 and RPANDA 1.6. Thank you very much for any help!
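To make the disagreement concrete, the GIC values reported above can be ranked per platform. This is a minimal base-R sketch: `best_by_gic` is a hypothetical helper introduced here (it is not part of RPANDA or mvMORPH), and it assumes that a lower GIC is preferred and that a non-finite GIC signals a failed fit rather than a perfect one.

```r
# GIC values copied from the output above (lower GIC = preferred model).
gic_32 <- c(BM = -30971.89, EB = -Inf,      OU = -28133.11)
gic_64 <- c(BM = -27390.41, EB = -28605.16, OU = -31424.25)

# Hypothetical helper: warn about non-finite values, drop them, then
# return the name of the model with the smallest remaining GIC.
best_by_gic <- function(gic) {
  ok <- is.finite(gic)
  if (!all(ok)) {
    warning("non-finite GIC for: ", paste(names(gic)[!ok], collapse = ", "))
  }
  names(which.min(gic[ok]))
}

best_32 <- suppressWarnings(best_by_gic(gic_32))  # EB is skipped (-Inf)
best_64 <- best_by_gic(gic_64)
```

Under these assumptions the 32-bit run selects BM and the 64-bit run selects OU, so the platform difference changes the model choice and not just the decimals.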

Regards, Silvia

JClavel commented 4 years ago

Hi Silvia,

It's hard to track down the problem without a reproducible example. Small differences are expected between the 32-bit and 64-bit versions, since they do not use the same math libraries and rounding. When I run the example code from ?GIC.fit_pl.rpanda on a Windows machine with both the 32-bit and 64-bit versions of R 3.6.3 installed, I do find very small differences in the parameter search, but the results are almost the same:

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
// with the example code from ?GIC.fit_pl.rpanda //
GIC(fit1); GIC(fit2)

-- Generalized Information Criterion --

GIC: 7190.805 | Log-likelihood -3432.328

-- Generalized Information Criterion --

GIC: 7192.806 | Log-likelihood -3432.329

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 18362)
GIC(fit1); GIC(fit2)

-- Generalized Information Criterion --

GIC: 7190.805 | Log-likelihood -3432.328

-- Generalized Information Criterion --

GIC: 7192.77 | Log-likelihood -3432.303
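One way to decide whether a cross-platform difference is ordinary rounding noise is to compare the values with a relative tolerance. The following is a minimal base-R sketch: `same_within` is a hypothetical helper, and the tolerance of 1e-4 is an arbitrary assumption chosen for illustration, not a threshold recommended by RPANDA.

```r
# Which build is running? 8-byte pointers mean 64-bit, 4-byte mean 32-bit.
bits <- 8 * .Machine$sizeof.pointer

# Hypothetical helper: TRUE when x and y agree within a relative tolerance.
same_within <- function(x, y, tol = 1e-4) {
  isTRUE(all.equal(x, y, tolerance = tol))
}

# GIC of fit2 from the example code, 64-bit vs 32-bit build.
example_ok <- same_within(7192.806, 7192.77)

# GIC of the BM fit from the original report, 32-bit vs 64-bit build.
report_ok <- same_within(-30971.89, -27390.41)
```

With this tolerance the example-code values count as equal, while the values from the original report do not, which suggests something beyond rounding is happening in that dataset.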

In your situation, it seems that there is almost no need for regularization (the “gamma” parameter is very low). It is possible that this led to numerical underflow affecting the program. If this is the cause, one solution might be to set the “tol” parameter to a small value (e.g. tol=1e-8; see ?fit_t_pl). You can also switch to ML or try another penalty. Sometimes working on a scaled tree can also help.
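The suggestions above could be tried along the following lines. This is only a sketch under stated assumptions: the tree and trait matrix are simulated stand-ins (40 species, 30 traits) for the real files, `method = "RidgeArch"` is one alternative penalty as I understand the options listed in ?fit_t_pl, and rescaling the tree to unit height by hand is just one way of "working on a scaled tree". The ML option is left out because the thread does not say which function or argument is meant. Please check every argument against ?fit_t_pl for your installed RPANDA version.

```r
library(ape)     # rtree(), node.depth.edgelength()
library(RPANDA)  # fit_t_pl(), GIC()

# Simulated stand-ins for the real tree and data (assumption).
set.seed(1)
tree <- rtree(40)
xx <- matrix(rnorm(40 * 30), nrow = 40,
             dimnames = list(tree$tip.label, NULL))

# 1. Bound the regularization parameter away from zero with 'tol'.
fit_tol <- fit_t_pl(xx, tree, model = "BM", method = "RidgeAlt", tol = 1e-8)

# 2. Try another penalty (assumed option name: "RidgeArch").
fit_arch <- fit_t_pl(xx, tree, model = "BM", method = "RidgeArch")

# 3. Rescale the tree so that its total height is 1, then refit.
tree_scaled <- tree
tree_scaled$edge.length <- tree$edge.length / max(node.depth.edgelength(tree))
fit_scaled <- fit_t_pl(xx, tree_scaled, model = "BM", method = "RidgeAlt")

# Compare the fits; run the same script on both builds of R.
GIC(fit_tol); GIC(fit_arch); GIC(fit_scaled)
```

Running the same script under the 32-bit and 64-bit builds and comparing the printed GIC values shows whether any of these settings removes the discrepancy.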

HTH,

Regards

Julien

SArtuso commented 4 years ago

Dear Julien,

Thank you very much for your response! I followed your suggestion of changing the value for the regularization parameter, and it solves the inconsistency between the two versions of R, which now give the same results. However, it also strongly affects the estimates of the model parameters and the model comparison with the GIC. So, after some testing, I think I will stick with the 64-bit version, whose results look more reliable than those of the 32-bit version.

Thank you very much for your help!

Regards, Silvia

JClavel commented 4 years ago

Dear Silvia,

Thank you for your feedback. If you want, you can send me a private email with example files (you can change the tip names to keep them confidential) so that I can check what is happening. This may help track down the problem and add supplementary checks and warnings on the default parameters in the package. Some issues are hard to replicate...

Best wishes,

Julien
