pewresearch / pewmethods

Pew Research Center Methods team R package of miscellaneous functions
MIT License

Calibration failed error #5

Open chsuong opened 3 months ago

chsuong commented 3 months ago

Hello, my code is producing the error "! Calibration failed".

Below is the code snippet that produced the error. pop_long is the target dataset, with the factor variables country, Age (group), and female. all is the dataset I'm trying to rake; it includes the same 3 variables.

targets <- create_raking_targets(pop_long, vars = c("country", "Age", "female"), wt = "frac")

all_raking <- all %>% mutate(rk_country = dk_to_na(country), rk_Age = dk_to_na(Age), rk_female=dk_to_na(female))

all_imputed <- impute_vars(all_raking, seed = 739)

No input to toimpute argument found. Imputing variables with prefix rk by default.

 iter imp variable
  1   1  rk_Age  rk_female
  2   1  rk_Age  rk_female
  3   1  rk_Age  rk_female
  4   1  rk_Age  rk_female
  5   1  rk_Age  rk_female

all_raked <- all %>% dplyr::mutate(weight2 = rake_survey(all_imputed, pop_margins = targets))

Error in dplyr::mutate():
ℹ In argument: weight2 = rake_survey(all_imputed, pop_margins = targets).
Caused by error in calibrate.survey.design2():
! Calibration failed
Backtrace:

  1. all %>% ...
  2. pewmethods::rake_survey(all_imputed, pop_margins = targets)
    1. survey:::calibrate.survey.design2(...)
    2. base::stop("Calibration failed")

Here is my R version:

R.Version()

$platform [1] "aarch64-apple-darwin20"

$arch [1] "aarch64"

$os [1] "darwin20"

$system [1] "aarch64, darwin20"

$status [1] ""

$major [1] "4"

$minor [1] "4.1"

$year [1] "2024"

$month [1] "06"

$day [1] "14"

$`svn rev` [1] "86737"

$language [1] "R"

$version.string [1] "R version 4.4.1 (2024-06-14)"

$nickname [1] "Race for Your Life"

arnoldlcl commented 3 months ago

Hello,

This error appears to originate from the survey package, which rake_survey wraps. My first thought would be to check whether country, Age and female have the same categories in the all dataset and in the targets. If they do, the next thing to check is whether the cells created by those categories are large enough for raking. If, say, the data contains no observations for a particular country, or only a handful, the raking algorithm may not converge, in which case you might need to collapse some countries together to create cells with around 30 or more cases.
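As a rough sketch of those two checks, something like the following base R could be used. The helper names (mismatched_levels, small_cells) and the toy data frames are made up for illustration and are not part of pewmethods; with the data from this thread you would presumably point them at all_imputed and pop_long, using whichever column names actually hold the raking variables.

```r
# Toy stand-ins for the sample and the target data (hypothetical values).
sample_df <- data.frame(
  country = c("A", "A", "B", "B", "B"),
  female  = c("0", "1", "0", "1", "1")
)
target_df <- data.frame(
  country = c("A", "B", "C"),
  female  = c("0", "1", "0")
)

# Hypothetical helper: categories that appear on one side but not the other.
mismatched_levels <- function(sample, target, var) {
  s <- unique(as.character(sample[[var]]))
  t <- unique(as.character(target[[var]]))
  list(only_in_target = setdiff(t, s), only_in_sample = setdiff(s, t))
}

# Hypothetical helper: unweighted counts of categories below a minimum size.
# The default of 30 follows the rule of thumb mentioned above.
small_cells <- function(x, min_n = 30) {
  counts <- table(as.character(x))
  counts[counts < min_n]
}

# Country "C" is in the targets but has no cases in the sample.
mismatched_levels(sample_df, target_df, "country")

# Every category in the toy sample is far below 30 cases.
small_cells(sample_df$country)
```

Any category listed under only_in_target has no cases to weight up, and any category returned by small_cells is a candidate for collapsing before raking.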

If the cells are all of sufficient size, another possibility is that the marginal distributions in the dataset are so far from the targets that the algorithm cannot converge. This might occur if there was a coding error in the raking variables.
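One way to eyeball this is to put the unweighted sample shares next to the target shares for each raking variable. The sketch below uses base R only; compare_margins and to_share are hypothetical helpers, the toy data is invented, and the wt argument assumes the target data carries a weight column such as frac in the original post.

```r
# Toy stand-ins for the sample and the target data (hypothetical values).
sample_df <- data.frame(Age = c("18-29", "18-29", "18-29", "30+"))
target_df <- data.frame(Age = c("18-29", "30+"), frac = c(0.4, 0.6))

# Turn a one-dimensional table or array of totals into a named vector of shares.
to_share <- function(x) {
  v <- setNames(as.numeric(x), names(x))
  v / sum(v)
}

# Hypothetical helper: sample share, target share, and their ratio per category.
compare_margins <- function(sample, target, var, wt) {
  sample_share <- to_share(table(as.character(sample[[var]])))
  target_share <- to_share(
    tapply(target[[wt]], as.character(target[[var]]), sum)
  )
  cats <- union(names(sample_share), names(target_share))
  out <- data.frame(
    category = cats,
    sample   = unname(sample_share[cats]),
    target   = unname(target_share[cats])
  )
  # Ratios far from 1, or NA (category missing on one side), deserve a look.
  out$ratio <- out$target / out$sample
  out
}

compare_margins(sample_df, target_df, var = "Age", wt = "frac")
```

A ratio far from 1 means that category needs a large adjustment on that margin, and an NA means the category is missing from either the sample or the targets, which often points to a recoding mismatch.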