Open josef-pkt opened 3 years ago
aside: drawing random numbers for generalized Poisson
Hakan Demirtas (2014): On accurate and precise generation of generalized Poisson variates, Communications in Statistics - Simulation and Computation, DOI: 10.1080/03610918.2014.968725
One way to add a tail is to use an approximating tail distribution, similar to Pareto tails for continuous variables.
where is it? I thought I had written the function a while ago.
It would also be nice to simulate data based on hurdle models, at least for examples and unit tests.
What I'm using right now for the hurdle model:
constant-only model, replicated often enough to verify that we get the correct empirical frequencies
rng = np.random.default_rng()
probs = res_hnb.predict(res_hnb.model.exog[:10], which="prob", y_values=np.arange(30))
probs.shape, probs.sum(1)
((10, 30), array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]))
cdf = probs.cumsum(1)
n = cdf.shape[0]
cdf = np.column_stack((cdf, np.ones(n)))
n_repl = 10000
rvs = []
for i in range(n_repl):
    u = rng.random((n, 1))
    rvs.append(np.argmin(cdf < u, axis=1))
rvs = np.concatenate(rvs)
rvs.shape
(100000,)
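For reference, the same inverse-CDF draw can be done without the Python loop and then checked against the target probabilities. This is only a sketch: the `probs` array below is a made-up stand-in (truncated geometric-like rows), not the `res_hnb` predictions, and the seed, sizes and names `draws`, `freq`, `max_abs_err` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Hypothetical probabilities, one row per observation (not from a fitted model).
q = np.array([0.3, 0.5, 0.7])[:, None]
k = np.arange(30)
probs = q ** k
probs /= probs.sum(1, keepdims=True)

cdf = probs.cumsum(1)
cdf[:, -1] = 1.0  # guard against rounding so the last cell always catches u

n_repl = 20000
# one uniform per replication and observation, broadcast against the cdf rows
u = rng.random((n_repl, cdf.shape[0], 1))
# index of the first cell with cdf >= u, i.e. the first False in (cdf < u)
draws = np.argmin(cdf < u, axis=-1)  # shape (n_repl, nobs)

# empirical frequencies per observation, to compare with probs
freq = np.stack([
    np.bincount(draws[:, i], minlength=probs.shape[1])
    for i in range(probs.shape[0])
]) / n_repl
max_abs_err = np.abs(freq - probs).max()
print(draws.shape, max_abs_err)
```

The broadcasted comparison builds a boolean array of shape `(n_repl, nobs, ncells)`, so for large `nobs` it may be better to process replications in chunks.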
just an internal helper function. I coded it a few times but only in experimental code
It should support simulating a regression model for count data where we don't have a good rvs, e.g. generalized Poisson, or truncated and hurdle versions.
It should be vectorized for nobs arrays with different parameters per observation, i.e. different multinomial probabilities for each row.
we can either draw a single cell number per obs, or cumulative counts.
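A rough sketch of what such a helper could look like, assuming the probabilities come as a `(nobs, ncells)` array. The names `draw_categories` and `draw_counts` are hypothetical, not existing statsmodels functions. Reading "counts" as multinomial cell counts over several trials is also an assumption; cumulative counts would then be one `cumsum` away.

```python
import numpy as np


def draw_categories(probs, size=1, rng=None):
    """Hypothetical helper: draw one cell index per observation.

    probs : (nobs, ncells) array of row-wise probabilities.
    Returns an integer array of shape (size, nobs).
    """
    rng = np.random.default_rng(rng)
    probs = np.asarray(probs, dtype=float)
    cdf = probs.cumsum(axis=1)
    # Assumption: any missing tail mass is lumped into the last cell.
    cdf[:, -1] = 1.0
    u = rng.random((size, probs.shape[0], 1))
    # first cell with cdf >= u for each replication and observation
    return np.argmin(cdf < u, axis=-1)


def draw_counts(probs, n_trials, rng=None):
    """Hypothetical helper: multinomial cell counts per observation.

    Returns an integer array of shape (nobs, ncells), rows sum to n_trials.
    """
    rng = np.random.default_rng(rng)
    probs = np.asarray(probs, dtype=float)
    # renormalize rows so rounding error does not trip the pvals check
    probs = probs / probs.sum(axis=1, keepdims=True)
    return np.stack([rng.multinomial(n_trials, p) for p in probs])


probs = np.array([[0.2, 0.8], [0.5, 0.5], [1.0, 0.0]])
idx = draw_categories(probs, size=1000, rng=0)
counts = draw_counts(probs, 50, rng=0)
print(idx.shape, counts.shape)
```

For count models the cell index is the simulated count itself, provided the columns of `probs` correspond to `y_values = 0, 1, 2, ...` up to the truncation point.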