statistikat / simPop

Simulation of Synthetic Populations for Survey Data Considering Auxiliary Information

specify input without hhid #37

Open matthias-da opened 3 months ago


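Data do not always have a cluster structure (such as persons nested in households). Currently, however, `hhid` must be set in `specifyInput()`, so we cannot produce synthetic data for data without a cluster structure. For example, the following data set has no household ID: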

library(lavaan)
data(HolzingerSwineford1939)
library(simPop)
inp <- specifyInput(data = HolzingerSwineford1939,
                    strata = "school", hhid = ???)  # nothing to pass: these data have no cluster ID
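Until such a rewrite exists, one possible interim workaround is to treat every record as its own one-person cluster and to add constant weights. The sketch below only prepares the columns in base R; the helper `add_dummy_cluster()` and the column names `hhid_dummy` and `w_dummy` are hypothetical, and whether simPop's later simulation steps behave sensibly when every "household" has size one is an assumption not verified here.

```r
# Hypothetical helper (not part of simPop): add a dummy cluster ID with one
# cluster per record and, if missing, a constant unit weight, so that data
# without a cluster structure can be passed to functions requiring an hhid.
# ASSUMPTION: downstream simPop steps tolerate all-singleton clusters.
add_dummy_cluster <- function(df, id_col = "hhid_dummy", weight_col = "w_dummy") {
  stopifnot(is.data.frame(df))
  if (id_col %in% names(df)) stop("column '", id_col, "' already exists")
  df[[id_col]] <- seq_len(nrow(df))        # every record is its own cluster
  if (!weight_col %in% names(df)) {
    df[[weight_col]] <- rep(1, nrow(df))   # equal weights for all records
  }
  df
}

# Usage on a tiny cluster-free data set
toy <- data.frame(sex = factor(c("f", "m", "m")), ageyr = c(13L, 12L, 14L))
toy2 <- add_dummy_cluster(toy)
```

The prepared data could then be passed as, for instance, `specifyInput(data = X, hhid = "hhid_dummy", weight = "w_dummy", strata = "school")`, which at least avoids misusing a substantive variable such as `school` as the cluster ID.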

Is it worth rewriting simPop from scratch so that it supports data both with and without a cluster structure? In my opinion, yes, it is well worth doing.

A first task could be to synthesize some data set without running into errors, e.g. a data set with an age and sex structure but without weights and clusters:

library(lavaan)
library(simPop)
data(HolzingerSwineford1939)
X <- HolzingerSwineford1939
X$weight <- 1                # should not be needed
X$grade <- factor(X$grade)   # numeric input should be accepted, but it does not work
# should work without specifying hhid and weight; "school" is only a stand-in hhid here
inp <- specifyInput(data = X, hhid = "school", weight = "weight", strata = "school")
pop <- simStructure(data = inp, method = "direct", basicHHvars = c("sex", "ageyr"))
pop <- simCategorical(pop, additional = "grade", nr_cpus = 1)  # error; also fails when grade is kept numeric and simulated with simContinuous()
pop <- simContinuous(pop, additional = "x1", nr_cpus = 1, method = "multinom")  # errors for all methods
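Without clusters, the structure step for such a data set could in principle reduce to drawing individual records within each stratum. The base R sketch below is a hypothetical baseline, not simPop's `simStructure()` algorithm: the function name `sim_structure_flat()` and its arguments are invented for illustration. It allocates the population size `N` to strata in proportion to their total weight (unit weights if none are given) and resamples the basic variables with probability proportional to the weights.

```r
# Hypothetical baseline (not simPop's implementation): simulate the basic
# variables of a population without any cluster structure by resampling
# individual records within strata, proportional to their weights.
sim_structure_flat <- function(df, basic_vars, strata, weight = NULL, N, seed = 1) {
  set.seed(seed)
  # unit weights when no weight column is given
  w <- if (is.null(weight)) rep(1, nrow(df)) else df[[weight]]
  s_vals <- sort(unique(as.character(df[[strata]])))
  # allocate N to the strata in proportion to their total weight
  tot <- vapply(s_vals, function(s) sum(w[df[[strata]] == s]), numeric(1))
  n_s <- round(N * tot / sum(tot))
  parts <- lapply(s_vals, function(s) {
    idx <- which(df[[strata]] == s)
    # draw records with replacement, probability proportional to weight
    pick <- idx[sample.int(length(idx), n_s[[s]], replace = TRUE, prob = w[idx])]
    df[pick, c(strata, basic_vars), drop = FALSE]
  })
  res <- do.call(rbind, parts)
  rownames(res) <- NULL
  res
}

# Usage on a tiny data set with two strata and no weights or clusters
toy <- data.frame(school = c("A", "A", "B", "B"), sex = c("f", "m", "f", "m"),
                  ageyr = c(12, 13, 12, 14))
pop_flat <- sim_structure_flat(toy, basic_vars = c("sex", "ageyr"),
                               strata = "school", N = 100)
```

Because of the rounding in the allocation, the result can differ from `N` by a few records; a real implementation would use an exact allocation scheme.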