ENH: helper function for random numbers from multinomial, right truncated count regression

statsmodels / statsmodels

ENH: helper function for random numbers from multinomial, right truncated count regression #7162

Open josef-pkt opened 3 years ago

josef-pkt commented 3 years ago

Just an internal helper function. I have coded it a few times, but only in experimental code.

It should support simulating regression models for count data where we don't have a good rvs method, e.g. generalized Poisson, or truncated and hurdle versions.

It should be vectorized over nobs, with parameters that differ by observation, i.e. each observation has its own multinomial probabilities.

We can either draw a single cell number per observation, or cumulative counts.
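A minimal sketch of the single-cell variant, assuming the probabilities are available as a `(nobs, k)` array with one row per observation. The function name `rvs_from_probs` and its arguments are hypothetical, not an existing statsmodels function; it uses plain inverse-cdf sampling with NumPy.

```python
import numpy as np


def rvs_from_probs(probs, n_repl=1, rng=None):
    """Draw one cell index per observation by inverse-cdf sampling.

    probs : array_like, shape (nobs, k); each row is a probability vector.
    Returns an integer array of shape (n_repl, nobs).
    (Hypothetical helper, for illustration only.)
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    cdf = probs.cumsum(axis=1)
    # Force the last cdf value to 1 so that rounding or truncation error
    # cannot push a draw past the last cell.
    cdf[:, -1] = 1.0
    u = rng.random((n_repl, probs.shape[0], 1))
    # The number of cdf values strictly below u is the index of the first
    # cell whose cdf is >= u.
    return (cdf[None, :, :] < u).sum(axis=-1)


# Two observations with different multinomial probabilities.
example_probs = np.array([[0.2, 0.8, 0.0],
                          [0.0, 0.0, 1.0]])
draws = rvs_from_probs(example_probs, n_repl=1000, rng=np.random.default_rng(0))
```

The comparison materializes an `(n_repl, nobs, k)` boolean array, so for large problems the replications would need to be drawn in chunks.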

josef-pkt commented 3 years ago

Aside: a reference on drawing random numbers for generalized Poisson:

Hakan Demirtas (2014): On accurate and precise generation of generalized Poisson variates, Communications in Statistics - Simulation and Computation, DOI: 10.1080/03610918.2014.968725

josef-pkt commented 3 years ago

One possibility for adding a tail is to use an approximating tail distribution, similar to Pareto tails for continuous variables.
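As a rough illustration of the idea, the sketch below handles draws that fall beyond the last tabulated cell by sampling the excess from a geometric distribution. The geometric tail, the way its decay ratio is taken from the last two cells, and the name `rvs_with_geometric_tail` are all assumptions made for this example; the comment above does not specify which tail distribution to use.

```python
import numpy as np


def rvs_with_geometric_tail(probs, rng=None):
    """One draw per row of `probs`, with an approximate geometric tail.

    probs : array_like, shape (nobs, k) with k >= 2; rows may sum to less
    than 1, the missing mass is treated as the tail above cell k - 1.
    (Hypothetical helper; the geometric tail is an assumption.)
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=float)
    nobs, k = probs.shape
    cdf = probs.cumsum(axis=1)
    u = rng.random((nobs, 1))
    idx = (cdf < u).sum(axis=1)      # equals k when u lands in the tail
    in_tail = idx == k
    # Assumed decay ratio of the tail: ratio of the last two cell probabilities.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = probs[:, -1] / probs[:, -2]
    ratio = np.clip(np.nan_to_num(ratio, nan=0.0, posinf=0.999), 0.0, 0.999)
    # Generator.geometric has support 1, 2, ...; shift it to start at 0.
    excess = rng.geometric(1.0 - ratio[in_tail]) - 1
    idx[in_tail] = k + excess
    return idx


# Two tabulated cells with mass 0.75, so 0.25 is left for the tail.
tail_probs = np.tile([0.5, 0.25], (20000, 1))
tail_draws = rvs_with_geometric_tail(tail_probs, rng=np.random.default_rng(1))
```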

josef-pkt commented 2 years ago

Where is it? I thought I had written the function a while ago.

It would also be nice to simulate data based on hurdle models, at least for examples and unit tests.

josef-pkt commented 2 years ago

What I'm using right now for the hurdle model:

A constant-only model, replicated often enough to verify that we get the correct empirical frequencies.

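import numpy as np

rng = np.random.default_rng()

# res_hnb is the results instance of a previously fitted hurdle model (not shown).
# Predicted probabilities of counts 0, ..., 29 for the first 10 observations.
probs = res_hnb.predict(res_hnb.model.exog[:10], which="prob", y_values=np.arange(30))
probs.shape, probs.sum(1)
# ((10, 30), array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]))

cdf = probs.cumsum(1)
n = cdf.shape[0]
# Append a column of ones so that the search below always finds a cell.
cdf = np.column_stack((cdf, np.ones(n)))

n_repl = 10000
rvs = []
for i in range(n_repl):
    u = rng.random((n, 1))
    # Index of the first cell whose cdf is >= u, one draw per observation.
    rvs.append(np.argmin(cdf < u, axis=1))

rvs = np.concatenate(rvs)
rvs.shape
# (100000,)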
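The frequency check itself is not shown above. A self-contained sketch of how it could look follows, assuming a made-up constant-only hurdle Poisson in place of `res_hnb`; the values of `p_zero` and `mu` are hypothetical. It also replaces the Python loop with a single broadcast draw.

```python
import math

import numpy as np

rng = np.random.default_rng(12345)

# Hypothetical constant-only hurdle Poisson: P(y = 0) = p_zero, and positive
# counts follow a zero-truncated Poisson(mu). Both values are made up.
p_zero, mu = 0.3, 2.0
y = np.arange(30)
log_pmf = y * np.log(mu) - mu - np.array([math.lgamma(v + 1.0) for v in y])
pmf = np.exp(log_pmf)
probs_row = np.where(y == 0, p_zero, (1 - p_zero) * pmf / (1 - pmf[0]))
probs_row = probs_row / probs_row.sum()  # absorb the negligible mass above 29
probs = np.tile(probs_row, (10, 1))      # same probabilities in all 10 rows

# Same construction as above: cdf plus a final column of ones.
cdf = np.column_stack((probs.cumsum(1), np.ones(probs.shape[0])))

n_repl = 10000
u = rng.random((n_repl, probs.shape[0], 1))
# First cell with cdf >= u, for all replications and observations at once.
rvs = np.argmin(cdf[None, :, :] < u, axis=-1).ravel()

# Empirical frequencies of the counts 0, ..., 29 against the probabilities.
n_cells = probs.shape[1]
freq = np.bincount(rvs, minlength=n_cells)[:n_cells] / rvs.size
max_abs_diff = np.abs(freq - probs_row).max()
print(max_abs_diff)
```

Because the model is constant-only, all rows share the same probabilities, so the pooled draws can be compared against a single row.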