josef-pkt commented 7 years ago

continuing discussion and investigation in #3578 and some parts of llnull PR #3921 basic implementation for one component count models in #3928

currently NegativeBinomial, GeneralizedPoisson and NegativeBinomial use Poisson as preliminary estimator to get better start_params.

ZeroInflated models use a separate model method _get_start_params for a preliminary estimator ZIP uses Poisson (but should use TruncatedPoisson when available), ZIGP uses ZIP ZINBP also uses the main model NBP

The extra parameters compared to the simpler model are set to constant, e.g. 0.1 or zeros.

suggestions:

add optimizer options for preliminary estimator, see 3578, preferably optim_kwds_prelim in kwargs
- [x] implemented for model in discrete_model #3928
- [ ] models in count_model. zero-inflated, and others ?
start with llnull: I don't know how well it works (I might have checked some negbin cases) 3921 added one step moment estimators for null params, the null MLE model is in the results class and not in the model class, so currently not available for model.fit
- [x] implemented for Poisson in #3928, not for any other models
use better preliminary start_params for extra parameters. The moment estimator in 3921 can be generalized to just use resid and mu and so could be used in any transitions e.g. from Poisson to GeneralizedPoisson or Negative Binomial. (I don't have an example or code for zero inflation start params yet, e.g. mixing probability could be roughly estimated from excess zeros, as in the zero-inflation/modification score or chisquare tests)
- [x] implemented for NBP and GPP in #3928, not for any other models
- [ ] zero-inflation needs also better start_params for inflation/mixing parameters

two objectives: get better defaults and provide more options to users.

3887 SUMM issue for count model follow-up

3924 boundary, numerical problems that also affect optimization

some test cases are in various bug issues, and in current unit tests that show warnings, or that explicitly work around convergence problems.

another issue: maxiter=35 is too small as a default, e.g. it is not changed if method=bfgs

josef-pkt commented 7 years ago

background

Poisson is QMLE for exp mean function, and is consistent for all other model in discrete_model. Beyond that we are mostly in full MLE setting.

Zero-Truncated Poisson is consistent estimator for main model in ZIP (conditional model for y>0).

josef-pkt commented 7 years ago

current inconsistency preliminary optimizer options:

NegativeBinomialP used method of final model ZeroInflated; ZIP and ZINB use method='nm', ZIGPP uses default (only disp included in start params fit)

josef-pkt commented 7 years ago

several stages - propagate prelim options:

currently the only more than 2-stage: ZIGP uses ZIP which uses Poisson should ZIGP when fitting ZIP use optim_kwds_prelim as fit arg to propagate to Poisson? or maybe even more: Do we want to use the optim_kwds_prelim only in the first estimator in multi stage and not in intermediate stages?

not much of an issue at the moment because ZIGP should switch to truncated GPP as soon as available, I guess.

josef-pkt commented 7 years ago

get or pop kwargs?

in my current WIP version, I use get optim_kwds_prelim.update(kwargs.get('optim_kwds_prelim', {})) which means the keyword is not removed from the kwargs and transmitted in the kwargs in the final model super(..).fit call. Unknown kwargs are still ignored in fit, AFAICS, and the optim_kwds_prelim are attached to mle_settings in super. If we pop and not transmit the keyword, then we need to explicitly add or attach the optim_kwds_prelim info somewhere, so user can check and we can directly assert it in unit tests.

In [29]:
res_nbp.mle_settings

Out[29]:
{'callback': <function statsmodels.discrete.discrete_model.NegativeBinomialP.fit.<locals>.<lambda>>,
 'disp': 1,
 'epsilon': 1.4901161193847656e-08,
 'extra_fit_funcs': {},
 'fargs': (),
 'full_output': 1,
 'gtol': 1e-05,
 'maxiter': 35,
 'norm': inf,
 'optim_kwds_prelim': {'disp': 1, 'method': 'bfgs'},
 'optimizer': 'bfgs',
 'retall': False,
 'start_params': array([ 6.96829054,  0.44731157,  0.1       ])}

I'm leaving get for now, but we might have to switch to pop.

josef-pkt commented 7 years ago

aside: I changed dataset from #3533 to #3337 (3337 includes exposure variable)

another idea:

Given that Poisson is at the core of several of the estimators, improving convergence robustness by default, would help also for many other models. So it would be worth to "try_harder" or "fit_difficult" by default, even if it makes it more complicated

In the following the start_params for NBP come from a failing Poisson prelim estimate. Because of the awful start_params coming from Poisson, NBP convergence also fails. There should be enough indications, information provided, to decide to "try again" with non-default optimization options. Poisson still uses newton as default.

model_nbp = sm.NegativeBinomialP(y, X_sm, p=2)
res_nbp = model_nbp.fit(maxiter=100) # method='bfgs' # os default
.... <many warnings>

In [75]:
res_nbp.mle_settings
Out[75]:
{'callback': <function statsmodels.discrete.discrete_model.NegativeBinomialP.fit.<locals>.<lambda>>,
 'disp': 1,
 'epsilon': 1.4901161193847656e-08,
 'extra_fit_funcs': {},
 'fargs': (),
 'full_output': 1,
 'gtol': 1e-05,
 'maxiter': 100,
 'norm': inf,
 'optimizer': 'bfgs',
 'retall': False,
 'start_params': array([  3.23163642e+02,   9.22826094e+01,   1.00000000e-01])}

In [73]:
res_nbp.mle_retvals
Out[73]:
{'Hinv': array([[  8.59848603,   0.14150376,  13.17276359],
        [  0.14150376,   5.52879134,   0.97399881],
        [ 13.17276359,   0.97399881,  23.49239009]]),
 'converged': False,
 'fcalls': 260,
 'fopt': nan,
 'gcalls': 260,
 'gopt': array([        nan,         nan,  0.32909848]),
 'warnflag': 2}

josef-pkt commented 7 years ago

and one more: make default optimizer choice method depend on scipy version?

bfgs works well for scipy >= 0.18, but unit tests fail for older. With older scipy versions we would need 'nm' as prelim even if slower. recent scipy have dogleg and similar which could replace our newton

Even if we don't change the models, we might have to condition on scipy version in unit tests.

josef-pkt commented 7 years ago

a bit strange, but maybe not

If I include offset in the #3337 example, then NegativeBinomialP converges without problem, even though if fails without offset. The reason is that Poisson prelim finds good start_params for NBP.

So using offset is not useful as the test case to check the improvements here. Something else must have changed already to handle the less misspecified case with offset. (I'm using currently scipy 0.18.1)

offset=np.log(df['population'].values + 1)

In [38]:
model_nbp = sm.NegativeBinomialP(y, X_sm, p=2, offset=offset)
res_nbp = model_nbp.fit(maxiter=100) # method='bfgs' # is default
Optimization terminated successfully.
         Current function value: 6.748171
         Iterations: 13
         Function evaluations: 15
         Gradient evaluations: 15

I guess I need to switch back to the other dataset which has worse convergence behavior.

josef-pkt commented 7 years ago

more on misspecified models: using dataset sm3533.csv that will be included for unit tests

model = GeneralizedPoisson(y, X_sm, offset=offset, p=2) does not converge at all, or better it says it converged but predict().mean() is much too large (huge) and several overflow warnings. Trying different explicit start_params doesn't help either

in contrast to this, p=1.5 or p=1 converge with defaults (a bit larger maxiter for p=1, needs 70 iterations) with predict().mean() close to y.mean()
prelim Poisson provides good start_params However this is because GPP uses method=method in call to prelim, Poisson fit ( I forgot to change this)

josef-pkt commented 3 years ago

ZINBP hessian inversion problem in model with very small zero inflation, "newton" works, "bfgs" does not

context writing unit test for #7870 docvis data, with ZINBP default p=2, bfgsb converges but issues hessian inversion warning and bse are nan. The probability for the main model (1 - prob of inflation) which = "prob-main" is 1 form many observations (at display precision).

Using "newton" optimizer it converges, inflation params are less negative and pro-main stays away from 1 (values like 0.99998639, 0.99998241 ...). Furthermore newton converges with default params, but it's a bit slow. Results params for main model are close to Stata, inflation params differ in second or third decimal ( -6.5842) but that's all close to zero. aside: AFAICS, Stata has only NB2 in the zi version.

When I estimate, ZINBP(p=1), then there are a few more excess zeros, and there are no convergence problems.

ZINB is not the best choice of model for the data, but I need unit tests.

josef-pkt commented 1 year ago

bump for 0.15: use hurdle or truncated models as start_params for zero-inflated models

statsmodels / statsmodels

SUMM: improve start_params and preliminary estimators #3925

3887 SUMM issue for count model follow-up

3924 boundary, numerical problems that also affect optimization