strengejacke / sjstats

Effect size measures and significance tests
https://strengejacke.github.io/sjstats

svyglm.nb() function hangs #112

Open sconti555 opened 2 years ago

sconti555 commented 2 years ago

Dear Daniel,

First and foremost, I'd like to thank you for the time and dedication you have put into making your code and modelling facilities available to the wider community through your very handy 'sjstats' library.

I have been trying to use your svyglm.nb() function to fit a survey-weighted Negative Binomial regression model to some data, and code execution invariably hangs. I first tried to fit a fairly sizeable data set (ca. 750,000 cases, ca. 80 predictors) separately with a variety of weights, to no avail: the code would just hang, with no way of halting it (in RStudio at least) other than killing the RStudio task. Thinking that the issue might lie in an awkward distribution of the weights, I took a much smaller subset of the data (ca. 4,600 cases) and subsetted the weights accordingly (with the subset.survey.design() function), again to no avail and with the same behaviour. By contrast, the standard glm.nb() function from the MASS package fits the unweighted data (either the full sample or the sub-sample) with no problem and within a reasonable run time. Using svyglm.nb() with either a set of unit weights (which should make it equivalent to glm.nb()) or any of the 4 sets of weights I'm exploring ends with me killing RStudio after run times of over 24 hours.

From looking at the code underlying your svyglm.nb() function, I formed the impression that the issue may lie in overflow or underflow when evaluating the digamma functions that feature in the first derivatives of the Negative Binomial likelihood. I am writing to check whether you have experienced similar issues with your svyglm.nb() function and, if so, whether and how you managed to work around them.
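For reference, here is a sketch of where the digamma terms enter. It assumes the usual NB2 parameterisation (mean $\mu_i$, dispersion $\theta$, as in MASS::glm.nb()) and sampling weights $w_i$. The notation is introduced here for illustration and is not taken from the svyglm.nb() source. The weighted log pseudo-likelihood is

$$
\ell(\theta, \mu) = \sum_i w_i \left[ \ln\Gamma(y_i + \theta) - \ln\Gamma(\theta) - \ln(y_i!) + \theta \ln\theta + y_i \ln\mu_i - (\theta + y_i)\ln(\theta + \mu_i) \right]
$$

and its first derivative with respect to $\theta$ is

$$
\frac{\partial \ell}{\partial \theta} = \sum_i w_i \left[ \psi(y_i + \theta) - \psi(\theta) + \ln\theta + 1 - \ln(\theta + \mu_i) - \frac{\theta + y_i}{\theta + \mu_i} \right]
$$

where $\psi$ is the digamma function. The difference $\psi(y_i + \theta) - \psi(\theta)$ is finite for any $\theta > 0$, but $\psi(\theta)$ diverges to $-\infty$ as $\theta \to 0$ and the two terms nearly cancel as $\theta \to \infty$, so an optimiser that drifts towards either extreme could plausibly run into trouble.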

Many thanks in advance for any help you may be able to offer!

paige90 commented 2 years ago

Hello, I have exactly the same issue: my RStudio has been running for hours with no hope of converging. Did you by any chance find a solution to this problem?

sconti555 commented 2 years ago

Hi paige90,

Unfortunately I haven't. Not having heard from the developer on the subject (whose code, incidentally, I don't believe to be necessarily faulty), I'm more inclined to try a bootstrap-based approach, as outlined in Prof Lumley's monograph.

strengejacke commented 2 years ago

I have used the function in the past on a rather large survey dataset (SHARE), and had no problems. I can't say for sure why the function hangs at some point, especially since this also happens for you with a relatively small dataset.

paige90 commented 2 years ago

@sconti555 Hello, thank you for your reply! I tried the same command in Stata and it did not converge. I did, however, succeed in getting results using a zero-inflated Poisson model in R. I think negative binomial models are tricky for complex survey designs.

sconti555 commented 2 years ago

Thank you for your follow-up, @paige90. I reckon that in my case the problem is entirely numerical and stems from the digamma function in the information matrix. The weights aren't likely to be the issue for me, since I run into exactly the same issue when I set up a uniform sampling design (with unit weights).
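One way to probe the numerical hypothesis outside of svyglm.nb() is to evaluate the theta score both with digamma() and with an exact finite-sum identity that holds for integer counts, and see where the two diverge. The sketch below uses base R only. The function names, the toy values of y and mu, and the grid of theta values are all made up for illustration, and the score formula assumes the NB2 parameterisation used by MASS::glm.nb(), not necessarily the exact code path inside svyglm.nb() or svymle().

```r
# Exact identity for non-negative integer counts y:
#   digamma(theta + y) - digamma(theta) = sum_{k = 0}^{y - 1} 1 / (theta + k)
# (hypothetical helper, not part of sjstats, survey or MASS)
digamma_diff_sum <- function(y, theta) {
  vapply(y, function(yi) {
    if (yi == 0) return(0)
    sum(1 / (theta + seq_len(yi) - 1))
  }, numeric(1))
}

# The same quantity computed directly with base R's digamma()
digamma_diff_direct <- function(y, theta) {
  digamma(theta + y) - digamma(theta)
}

# Per-observation contribution to the score for theta (NB2, unit weights);
# diff_fun selects how the digamma difference is evaluated
nb_score_theta <- function(y, mu, theta, diff_fun) {
  diff_fun(y, theta) + log(theta) + 1 - log(theta + mu) -
    (y + theta) / (mu + theta)
}

# Toy data, chosen arbitrarily for illustration
y  <- c(0, 1, 3, 10)
mu <- c(0.5, 1.2, 2.5, 8)

# Compare the two evaluations across very small and very large theta
for (theta in c(0.01, 1, 100, 1e8)) {
  direct <- nb_score_theta(y, mu, theta, digamma_diff_direct)
  summed <- nb_score_theta(y, mu, theta, digamma_diff_sum)
  cat(sprintf("theta = %g, max abs difference = %.3e\n",
              theta, max(abs(direct - summed))))
}
```

If both versions stay finite and agree closely over the range of theta that the optimiser visits, the digamma evaluations are probably not what makes the fit hang, and the cause would have to be sought elsewhere. The finite sum costs one term per unit of y, so it is only practical for moderate counts.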

CharlyMarie commented 1 year ago

Hello everyone

I have exactly the same problem with an emailed survey:

Have any of you, like @sconti555, found the problem and/or a way of solving it?

Many thanks

Charly

sconti555 commented 1 year ago

Hi @CharlyMarie,

Unfortunately I gave up, since I was unable to overcome that same issue. I believe it has to do with the underlying svymle() engine: I tried recoding the Negative Binomial likelihood function to stop it getting into a tailspin around the digamma function evaluations (which are unavoidable, and that is what makes it tricky), to no avail. I e-mailed Thomas Lumley about the problem, but don't recall receiving a response.

Best of luck with advancing this matter, which I'd love to see resolved.