Closed exalate-issue-sync[bot] closed 1 year ago
Wendy Wong commented: Our model selection backward mode was able to generate the same coefficients and p-values of eliminated predictors as our competitor's software. However, the same cannot be said for the binomial family. Hence, the job here is to check and make sure that, for the binomial family:
Instead of verifying 1 and 2, we just need to verify 2 since the coefficients have to be the same to generate the same p-values.
Wendy Wong commented: Added a description of what is going on with H2O GLM Gaussian and Binomial family implementations.
[^H2OGLMExplain (c6f1eb0b-d672-4d35-b099-1a354c46cbc3).pdf]
Wendy Wong commented: There are two goals here in the PR:
Regarding 1: I checked the covariance matrix calculation and found that we have implemented the same calculation in our p-value calculation. To prove that, I added a Java test that manually calculates the covariance matrix, the standard error, the z-value, and then the p-value. I then compared this calculation with the one in the GLM code, and they match well. I even used a different method to perform the equivalent of finding the inverse of the covariance matrix.
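The manual check described above can be sketched roughly as follows. This is not the Java test from the PR and not H2O's GLM code; it is a minimal pure-Python illustration of the textbook calculation for a binomial (logit) model, assuming the covariance matrix is the inverse of X^T W X with W = diag(mu * (1 - mu)) and dispersion fixed at 1. The function names `solve_inverse` and `logistic_p_values` are hypothetical.

```python
import math


def solve_inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity: [A | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def logistic_p_values(x_rows, beta):
    """Return (standard errors, z-values, two-sided p-values) for given coefficients.

    Each row in x_rows must already contain a leading 1.0 for the intercept.
    """
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]  # X^T W X
    for row in x_rows:
        eta = sum(b * v for b, v in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for i in range(k):
            for j in range(k):
                info[i][j] += w * row[i] * row[j]
    cov = solve_inverse(info)
    std_err = [math.sqrt(cov[i][i]) for i in range(k)]
    z_values = [b / s for b, s in zip(beta, std_err)]
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_values = [math.erfc(abs(z) / math.sqrt(2.0)) for z in z_values]
    return std_err, z_values, p_values
```

Passing the fitted coefficients from a model together with its design matrix should, under these assumptions, give numbers that can be compared directly against the standard errors, z-values, and p-values the model reports.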
To do for 1:
Regarding 2: I have completed the manual implementation of deriving the IRLSM coefficients using the formulae derived in ESL, for both standardize = false and standardize = true. The H2O implementation matches the ones in the book well.
I went back and looked at the run process and discovered that line search was not enabled at any point. I gather that the only reason the models differ between H2O and the other tool must be the number of iterations run. If we can control the iterations, we may be able to get matching coefficients and p-values of eliminated predictors.
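To illustrate why the iteration count matters, here is a minimal pure-Python sketch of the standard IRLS update for logistic regression, beta_new = (X^T W X)^{-1} X^T W z with z = X beta + W^{-1}(y - mu). It is not H2O's IRLSM code: there is no line search, no regularization, and no standardization. The names `solve`, `irls_logistic`, and `score`, the `max_iterations` argument of the sketch, and the toy data are all made up for illustration.

```python
import math


def solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def irls_logistic(x_rows, y, max_iterations):
    """Run exactly max_iterations IRLS steps starting from beta = 0."""
    k = len(x_rows[0])
    beta = [0.0] * k
    for _ in range(max_iterations):
        xtwx = [[0.0] * k for _ in range(k)]
        xtwz = [0.0] * k
        for row, yi in zip(x_rows, y):
            eta = sum(b * v for b, v in zip(beta, row))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            z = eta + (yi - mu) / w  # working response
            for i in range(k):
                xtwz[i] += w * row[i] * z
                for j in range(k):
                    xtwx[i][j] += w * row[i] * row[j]
        beta = solve(xtwx, xtwz)
    return beta


def score(x_rows, y, beta):
    """Gradient of the log-likelihood, X^T (y - mu); zero at the MLE."""
    grad = [0.0] * len(beta)
    for row, yi in zip(x_rows, y):
        eta = sum(b * v for b, v in zip(beta, row))
        mu = 1.0 / (1.0 + math.exp(-eta))
        for i, v in enumerate(row):
            grad[i] += v * (yi - mu)
    return grad


# Toy, non-separable data (illustrative only): intercept column plus one predictor.
X = [[1.0, float(v)] for v in range(8)]
Y = [0, 0, 1, 0, 1, 0, 1, 1]

beta_one_step = irls_logistic(X, Y, 1)
beta_many_steps = irls_logistic(X, Y, 25)
```

On this toy data the coefficients after a single step differ visibly from those after many steps, so two tools that stop at different iteration counts would also report different standard errors and p-values, even if their update formulae are identical.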
Wendy Wong commented: This is actually a great test to use! I wish I had come up with it.
JIRA Issue Details
Jira Issue: PUBDEV-8585 Assignee: Wendy Wong Reporter: Wendy Wong State: Resolved Fix Version: 3.36.1.1 Attachments: Available (Count: 1) Development PRs: Available
Attachments From Jira
Attachment Name: H2OGLMExplain (c6f1eb0b-d672-4d35-b099-1a354c46cbc3).pdf Attached By: Wendy Wong File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-8585/H2OGLMExplain (c6f1eb0b-d672-4d35-b099-1a354c46cbc3).pdf
Linked PRs from JIRA
Wendy Wong commented: https://stats.stackexchange.com/questions/89484/how-to-compute-the-standard-errors-of-a-logistic-regressions-coefficients