ncn-foreigners / nonprobsvy

An R package for modern methods for non-probability surveys
https://ncn-foreigners.github.io/nonprobsvy/

Large sample size problems #17

Closed BERENZ closed 8 months ago

BERENZ commented 12 months ago

When I generate a large non-probability sample, model estimation fails.

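library(data.table)
library(survey)      # provides svydesign(); loaded explicitly in case it is not already attached
library(nonprobsvy)
set.seed(2023-10-19) # note: evaluated as arithmetic, so the seed is 1994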
n_reps <- 50
N <- 100000
n <- 1000
x1 <- rnorm(N,1,1)
x2 <- rexp(N,1)
alp <- rnorm(N)
epsilon <- rnorm(N)
y11 <- 1 + x1 + x2 + alp + epsilon
y12 <- 0.5*(x1-1.5)^2 + x2^2 + alp + epsilon
y21 <- rbinom(N,1,plogis(1 + x1 + x2 + alp))
y22 <- rbinom(N,1,plogis(0.5*(x1-1.5)^2 + x2^2 + alp))
p1 <- plogis(x2)
p2 <- plogis(-3+(x1-1.5)^2+(x2-2)^2)
pop_data <- data.frame(x1,x2,y11,y12,y21,y22,p1,p2) |> setDT()

sample_prob <- pop_data[sample(1:N, n),]
sample_prob$w <- N/n
sample_prob_svy <- svydesign(ids=~1, weights = ~w, data = sample_prob)
sample_bd1 <- pop_data[rbinom(N,1,pop_data$p1)==1, ]
sample_bd1$w_naive <- N/nrow(sample_bd1)
sample_bd2 <- pop_data[rbinom(N,1,pop_data$p2)==1, ]
sample_bd2$w_naive <- N/nrow(sample_bd2)
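
For context, here is a minimal base-R sketch (not part of the original report) that checks how large the two simulated non-probability samples are relative to the population. It regenerates only the covariates and inclusion probabilities, so the draws will not match the ones above exactly. The names `in_bd1`, `in_bd2` and `size_summary` are introduced here for illustration. Since `x2` is exponential and therefore non-negative, `p1 = plogis(x2)` is never below 0.5, and its expected value is log(2), so `sample_bd1` should cover roughly 69% of the population, compared with a probability sample of 1,000 units. Whether this imbalance is what causes the failures below is not established here.

```r
# Base-R sketch: size of the simulated non-probability samples.
set.seed(1994)  # same numeric seed as set.seed(2023-10-19), but a different draw sequence
N  <- 100000
x1 <- rnorm(N, 1, 1)
x2 <- rexp(N, 1)

# Inclusion probabilities as defined in the setup above
p1 <- plogis(x2)
p2 <- plogis(-3 + (x1 - 1.5)^2 + (x2 - 2)^2)

# Inclusion indicators, drawn the same way as for sample_bd1 / sample_bd2
in_bd1 <- rbinom(N, 1, p1) == 1
in_bd2 <- rbinom(N, 1, p2) == 1

size_summary <- data.frame(
  sample   = c("sample_bd1", "sample_bd2"),
  n        = c(sum(in_bd1), sum(in_bd2)),    # realised sample sizes
  share    = c(mean(in_bd1), mean(in_bd2)),  # fraction of the population selected
  min_prob = c(min(p1), min(p2)),
  max_prob = c(max(p1), max(p2))
)
print(size_summary)
```

Comparing the `n` column with the probability sample size (`n <- 1000`) shows how lopsided the combined data set passed to the selection model is.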

The problem affects the IPW (and thus DR) estimators:


## ipw h=1
test1 <- nonprob(selection = ~ x1 + x2, 
                  target = ~ y11 + y12 + y21 + y22, 
                  svydesign = sample_prob_svy,
                  data = sample_bd1,  
                  control_selection = controlSel(est_method_sel = "gee", h = 1))

> test1 <- nonprob(selection = ~ x1 + x2, 
+                  target = ~ y11 + y12 + y21 + y22, 
+                  svydesign = sample_prob_svy,
+                  data = sample_bd1,  
+                  control_selection = controlSel(est_method_sel = "gee", h = 1))
Error in nleqslv::nleqslv(x = start0, fn = u_theta, method = "Newton",  : 
  initial value of fn function contains non-finite values (starting at index=1)
  Check initial x and/or correctness of function
In addition: Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 

## ipw mle
test1 <- nonprob(selection = ~ x1 + x2, 
                  target = ~ y11 + y12 + y21 + y22, 
                  svydesign = sample_prob_svy,
                  data = sample_bd1)

> test1 <- nonprob(selection = ~ x1 + x2, 
+                  target = ~ y11 + y12 + y21 + y22, 
+                  svydesign = sample_prob_svy,
+                  data = sample_bd1)
Error in solve.default(-hess) : 
  Lapack routine dgesv: system is exactly singular: U[1,1] = 0
In addition: Warning messages:
1: glm.fit: algorithm did not converge 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 

LukaszChrostowski commented 12 months ago

Estimation works when the cloglog or probit link is specified. The error occurs for models based on the logit link function.
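
A sketch of what switching the link might look like, assuming the `method_selection` argument of `nonprob()` accepts `"probit"` and `"cloglog"` in the installed version. The data setup is a trimmed copy of the one in the report (a single target variable, plain data frames, an arbitrary seed), and the `links` and `fits` names are introduced here for illustration. The calls are wrapped in `tryCatch()` so that a failure is recorded as a message rather than stopping the script; no particular result is claimed.

```r
# Sketch only: refit the selection model with non-logit links.
# Assumes nonprobsvy and survey are installed; skipped otherwise.
set.seed(123)  # arbitrary seed, not the one used in the report
N <- 100000
n <- 1000
x1  <- rnorm(N, 1, 1)
x2  <- rexp(N, 1)
y11 <- 1 + x1 + x2 + rnorm(N) + rnorm(N)
pop_data <- data.frame(x1, x2, y11, p1 = plogis(x2))

# Probability sample (simple random sample) with design weights
sample_prob   <- pop_data[sample(seq_len(N), n), ]
sample_prob$w <- N / n

# Large non-probability sample, selected with probability p1
sample_bd1 <- pop_data[rbinom(N, 1, pop_data$p1) == 1, ]

links <- c("probit", "cloglog")
fits  <- list()

if (requireNamespace("nonprobsvy", quietly = TRUE) &&
    requireNamespace("survey", quietly = TRUE)) {
  library(survey)
  library(nonprobsvy)
  sample_prob_svy <- svydesign(ids = ~1, weights = ~w, data = sample_prob)
  for (link in links) {
    # Store either the fitted object or the error message for this link
    fits[[link]] <- tryCatch(
      nonprob(selection = ~ x1 + x2,
              target = ~ y11,
              svydesign = sample_prob_svy,
              data = sample_bd1,
              method_selection = link),  # assumed argument name
      error = function(e) conditionMessage(e)
    )
  }
}
```

Inspecting `fits` afterwards shows, per link, whether the fit succeeded or which error was raised.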

LukaszChrostowski commented 11 months ago

Kertoo commented 8 months ago

Closing as completed