background:
Poisson is QMLE for an exponential mean function, and so is consistent for all the other models in discrete_model.
Beyond that, we are mostly in a full MLE setting.
Zero-truncated Poisson is a consistent estimator for the main model in ZIP (the conditional model for y>0).
current inconsistency in preliminary optimizer options:

- NegativeBinomialP uses the `method` of the final model
- ZeroInflated: ZIP and ZINB use `method='nm'`; ZIGP uses the default (only `disp` is included in the preliminary start_params fit)
several stages - propagate prelim options:
Currently the only case with more than two stages: ZIGP uses ZIP, which uses Poisson.
Should ZIGP, when fitting ZIP, pass `optim_kwds_prelim` as a fit argument so that it propagates to Poisson? Or maybe even more: do we want to use `optim_kwds_prelim` only for the first estimator in a multi-stage fit and not in the intermediate stages? (See the sketch below.)
This is not much of an issue at the moment, because ZIGP should switch to truncated GPP as soon as that is available, I guess.
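As a concrete illustration of the first option, a minimal sketch assuming the WIP `optim_kwds_prelim` keyword described below; the helper name is hypothetical, not the actual statsmodels internals:

```python
from statsmodels.discrete.count_model import ZeroInflatedPoisson

def zigp_prelim_start_params(y, X, optim_kwds_prelim=None):
    # hypothetical helper: forward the same prelim options to the
    # intermediate ZIP stage, which would pass them on to its own
    # Poisson prelim fit; the alternative is to drop the keyword here
    # and use it only for the innermost (first-stage) estimator
    kwds = optim_kwds_prelim if optim_kwds_prelim is not None else {}
    res_zip = ZeroInflatedPoisson(y, X).fit(optim_kwds_prelim=kwds, disp=0)
    return res_zip.params
```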
`get` or `pop` kwargs?
In my current WIP version, I use `get`:
optim_kwds_prelim.update(kwargs.get('optim_kwds_prelim', {}))
which means the keyword is not removed from the kwargs and is transmitted in the kwargs to the final model's super(...).fit call. Unknown kwargs are still ignored in fit, AFAICS, and the `optim_kwds_prelim` are attached to `mle_settings` in super.
If we pop and do not transmit the keyword, then we need to explicitly add or attach the `optim_kwds_prelim` info somewhere, so the user can check it and we can assert it directly in unit tests. (A sketch of both variants follows the output below.)
In [29]:
res_nbp.mle_settings
Out[29]:
{'callback': <function statsmodels.discrete.discrete_model.NegativeBinomialP.fit.<locals>.<lambda>>,
'disp': 1,
'epsilon': 1.4901161193847656e-08,
'extra_fit_funcs': {},
'fargs': (),
'full_output': 1,
'gtol': 1e-05,
'maxiter': 35,
'norm': inf,
'optim_kwds_prelim': {'disp': 1, 'method': 'bfgs'},
'optimizer': 'bfgs',
'retall': False,
'start_params': array([ 6.96829054, 0.44731157, 0.1 ])}
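For comparison, a minimal sketch of the two variants discussed above (the function names are illustrative, not the actual fit implementations):

```python
def fit_with_get(self, start_params=None, method='bfgs', **kwargs):
    optim_kwds_prelim = {'method': 'nm', 'disp': 0}
    # 'get' leaves the keyword in kwargs, so it travels on to the final
    # super().fit call and ends up attached to mle_settings
    optim_kwds_prelim.update(kwargs.get('optim_kwds_prelim', {}))
    ...

def fit_with_pop(self, start_params=None, method='bfgs', **kwargs):
    optim_kwds_prelim = {'method': 'nm', 'disp': 0}
    # 'pop' removes the keyword from kwargs; it never reaches super().fit,
    # so the info would have to be attached to the results explicitly
    optim_kwds_prelim.update(kwargs.pop('optim_kwds_prelim', {}))
    ...
```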
I'm leaving `get` for now, but we might have to switch to `pop`.
aside: I changed the dataset from #3533 to #3337 (#3337 includes an exposure variable)
another idea:
Given that Poisson is at the core of several of the estimators, improving its convergence robustness by default would also help many other models. So it would be worth having a "try_harder" or "fit_difficult" mode by default, even if it makes the code more complicated. (A sketch of such a retry follows the failing example below.)
In the following, the start_params for NBP come from a failing preliminary Poisson estimate. Because of the awful start_params coming from Poisson, NBP convergence also fails. There should be enough indications and information available to decide to "try again" with non-default optimization options. Poisson still uses newton as the default.
model_nbp = sm.NegativeBinomialP(y, X_sm, p=2)
res_nbp = model_nbp.fit(maxiter=100)  # method='bfgs' is the default
.... <many warnings>
In [75]:
res_nbp.mle_settings
Out[75]:
{'callback': <function statsmodels.discrete.discrete_model.NegativeBinomialP.fit.<locals>.<lambda>>,
'disp': 1,
'epsilon': 1.4901161193847656e-08,
'extra_fit_funcs': {},
'fargs': (),
'full_output': 1,
'gtol': 1e-05,
'maxiter': 100,
'norm': inf,
'optimizer': 'bfgs',
'retall': False,
'start_params': array([ 3.23163642e+02, 9.22826094e+01, 1.00000000e-01])}
In [73]:
res_nbp.mle_retvals
Out[73]:
{'Hinv': array([[ 8.59848603, 0.14150376, 13.17276359],
[ 0.14150376, 5.52879134, 0.97399881],
[ 13.17276359, 0.97399881, 23.49239009]]),
'converged': False,
'fcalls': 260,
'fopt': nan,
'gcalls': 260,
'gopt': array([ nan, nan, 0.32909848]),
'warnflag': 2}
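A sketch of what a "try_harder" retry could look like, using only public fit options (the helper name is hypothetical, not an existing statsmodels API):

```python
def fit_try_harder(model, maxiter=100):
    # first attempt with the model's defaults
    res = model.fit(maxiter=maxiter, disp=0)
    if not res.mle_retvals.get('converged', True):
        # retry: get rough params from derivative-free nelder-mead,
        # then polish them with the default gradient-based optimizer
        res_nm = model.fit(method='nm', maxiter=2000, disp=0)
        res = model.fit(start_params=res_nm.params, maxiter=maxiter, disp=0)
    return res
```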
and one more: make the default optimizer choice (`method`) depend on the scipy version?
bfgs works well for scipy >= 0.18, but unit tests fail for older versions. With older scipy versions we would need 'nm' as the prelim method, even if it is slower.
Recent scipy versions have dogleg and similar trust-region methods that could replace our newton.
Even if we don't change the models, we might have to condition on the scipy version in unit tests.
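A sketch of what the version switch could look like (purely illustrative; whether this belongs in the models or only in the unit tests is the open question):

```python
from distutils.version import LooseVersion
import scipy

# bfgs is reliable as the prelim optimizer only on scipy >= 0.18;
# fall back to nelder-mead on older versions, even though it is slower
if LooseVersion(scipy.__version__) >= LooseVersion('0.18'):
    method_prelim = 'bfgs'
else:
    method_prelim = 'nm'
```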
a bit strange, but maybe not
If I include an offset in the #3337 example, then NegativeBinomialP converges without problems, even though it fails without the offset. The reason is that the preliminary Poisson finds good start_params for NBP.
So using the offset is not useful as the test case to check the improvements here. Something else must have changed already to handle the less misspecified case with the offset. (I'm currently using scipy 0.18.1.)
offset=np.log(df['population'].values + 1)
In [38]:
model_nbp = sm.NegativeBinomialP(y, X_sm, p=2, offset=offset)
res_nbp = model_nbp.fit(maxiter=100)  # method='bfgs' is the default
Optimization terminated successfully.
Current function value: 6.748171
Iterations: 13
Function evaluations: 15
Gradient evaluations: 15
I guess I need to switch back to the other dataset which has worse convergence behavior.
more on misspecified models: using the dataset sm3533.csv that will be included for unit tests
model = GeneralizedPoisson(y, X_sm, offset=offset, p=2)
does not converge at all; or rather, it reports convergence, but predict().mean() is much too large (huge) and there are several overflow warnings. Trying different explicit start_params doesn't help either.
In contrast to this, p=1.5 and p=1 converge with the defaults (a somewhat larger maxiter for p=1, which needs 70 iterations), with predict().mean() close to y.mean().
The preliminary Poisson provides good start_params. However, this is because GPP uses method=method in the call to the preliminary Poisson fit (I forgot to change this).
ZINBP hessian inversion problem in a model with very small zero inflation: "newton" works, "bfgs" does not
context: writing unit tests for #7870
docvis data, with ZINBP default p=2: bfgs converges but issues a hessian inversion warning and the bse are nan.
The probability for the main model (1 - probability of inflation), i.e. "prob-main", is 1 for many observations (at display precision).
Using the "newton" optimizer it converges, the inflation params are less negative, and prob-main stays away from 1 (values like 0.99998639, 0.99998241, ...). Furthermore, newton converges with default params, but it's a bit slow. The resulting params for the main model are close to Stata; the inflation params differ in the second or third decimal (-6.5842), but those are all close to zero. aside: AFAICS, Stata has only NB2 in the zero-inflated version.
When I estimate ZINBP(p=1), then there are a few more excess zeros, and there are no convergence problems.
ZINB is not the best choice of model for the data, but I need unit tests.
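For reference, the workaround reduces to forcing the newton optimizer; a minimal example, assuming the docvis data is already loaded as y, X (maxiter chosen generously, not tuned):

```python
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

model = ZeroInflatedNegativeBinomialP(y, X, p=2)
# bfgs converges here but leaves nan bse; newton stays usable
res = model.fit(method='newton', maxiter=300)
print(res.mle_retvals['converged'], res.bse)
```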
bump for 0.15: use hurdle or truncated models as start_params for zero-inflated models
continuing discussion and investigation in #3578, with some parts in the llnull PR #3921; a basic implementation for one-component count models is in #3928.
currently NegativeBinomial, GeneralizedPoisson and NegativeBinomialP use Poisson as the preliminary estimator to get better start_params.
ZeroInflated models use a separate model method `_get_start_params` for the preliminary estimator: ZIP uses Poisson (but should use TruncatedPoisson when available), ZIGP uses ZIP, and ZINBP uses the main model NBP. The extra parameters compared to the simpler model are set to a constant, e.g. 0.1, or to zeros. (A rough sketch of this pattern is below.)
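A rough sketch of that padding pattern for NBP (not the exact statsmodels code):

```python
import numpy as np
from statsmodels.discrete.discrete_model import Poisson

def nbp_start_params(y, X, k_extra=1, const=0.1):
    # fit the simpler preliminary model, then append the extra
    # dispersion parameter(s) as a small constant (cf. the 0.1 in the
    # start_params arrays shown above)
    res_poi = Poisson(y, X).fit(disp=0)
    return np.append(res_poi.params, const * np.ones(k_extra))
```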
suggestions:
two objectives: get better defaults and provide more options to users.
see also:
- #3887 SUMM issue for count model follow-up
- #3924 boundary and numerical problems that also affect optimization
some test cases are in various bug issues, and in current unit tests that show warnings, or that explicitly work around convergence problems.
another issue:
`maxiter=35` is too small as a default, e.g. it is not changed if `method='bfgs'`. (A sketch of a method-dependent default is below.)
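A possible fix, sketched with illustrative values:

```python
def default_maxiter(method):
    # newton usually converges in a few iterations, but scipy's gradient
    # optimizers such as bfgs routinely need far more than 35
    return 35 if method == 'newton' else 200
```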