pewresearch / pewmethods

Pew Research Center Methods team R package of miscellaneous functions
MIT License

rake_survey "Error: All elements of `fs` must be factors" #2

Closed okassi closed 4 years ago

okassi commented 4 years ago

rake_survey throws the error mentioned in the title. Simple reproducible example below (this follows https://medium.com/pew-research-center-decoded/weighting-survey-data-with-the-pewmethods-r-package-d040afb0d2c2):

library(pewmethods)

# Population: two roughly equal groups
fullpopulation <- data.frame(runif(1000), as.factor(ifelse(runif(1000) > 0.5, 'A', 'B')))
names(fullpopulation) <- c('x','weighting_variable')

# Skewed sample of 100 rows: group 'A' is rare (about 10%)
my_skewed_sample <- data.frame(runif(100), as.factor(ifelse(runif(100) > 0.9, 'A', 'B')))
names(my_skewed_sample) <- c('x','weighting_variable')

targets <- create_raking_targets(
  fullpopulation,
  vars = c('weighting_variable'),
  wt = 1
)

rake_survey(my_skewed_sample, pop_margins = targets)

I get the error: Error: All elements of `fs` must be factors

I traced the error to the unify_margins function called by rake_factors(). A simple workaround was to comment out the line where unify_margins is called. This works as long as the factor levels in fullpopulation$weighting_variable and my_skewed_sample$weighting_variable match.

For reference, here is my R version:

> R.Version()
$platform
[1] "x86_64-apple-darwin15.6.0"

$arch
[1] "x86_64"

$os
[1] "darwin15.6.0"

$system
[1] "x86_64, darwin15.6.0"

$status
[1] ""

$major
[1] "3"

$minor
[1] "5.1"

$year
[1] "2018"

$month
[1] "07"

$day
[1] "02"

$`svn rev`
[1] "74947"

$language
[1] "R"

$version.string
[1] "R version 3.5.1 (2018-07-02)"

$nickname
[1] "Feather Spray"

And here's my system info:

> Sys.info()
                                                                                            sysname
                                                                                           "Darwin"
                                                                                            release
                                                                                           "18.7.0"
                                                                                            version
"Darwin Kernel Version 18.7.0: Mon Feb 10 21:08:45 PST 2020; root:xnu-4903.278.28~1/RELEASE_X86_64"
                                                                                           nodename
                                                                                "MacBook-Air.local"
                                                                                            machine
                                                                                           "x86_64"

arnoldlcl commented 4 years ago

Hi, thanks for asking! You're getting this error because the variable name in my_skewed_sample doesn't match the variable name in the targets. By default, create_raking_targets prepends an "rk_" prefix to each variable name when it creates the list of targets, so rake_survey expects your variable to be named rk_weighting_variable. You can fix this in one of two ways: set the prefix argument in create_raking_targets to "", or rename the variable in my_skewed_sample to rk_weighting_variable.
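For anyone landing here later, the two remedies might look roughly like the sketch below. It reuses the calls from the original report (including wt = 1, copied unchanged) and the prefix argument described above. The object names renamed_sample, targets_no_prefix, targets_default, weights_1 and weights_2 are made up for illustration, and the pewmethods calls are guarded so the script still runs where the package is not installed.

```r
# Rebuild the reporter's example data (column names taken from the issue).
set.seed(1)
fullpopulation <- data.frame(
  x = runif(1000),
  weighting_variable = factor(ifelse(runif(1000) > 0.5, "A", "B"),
                              levels = c("A", "B"))
)
my_skewed_sample <- data.frame(
  x = runif(100),
  weighting_variable = factor(ifelse(runif(100) > 0.9, "A", "B"),
                              levels = c("A", "B"))
)

# Remedy 2 needs a copy of the sample whose column carries the "rk_" prefix.
# (renamed_sample is an illustrative name, not something from pewmethods.)
renamed_sample <- my_skewed_sample
names(renamed_sample)[names(renamed_sample) == "weighting_variable"] <-
  "rk_weighting_variable"

# The pewmethods calls only run when the package is available.
if (requireNamespace("pewmethods", quietly = TRUE)) {
  library(pewmethods)

  # Remedy 1: drop the prefix so the target names match the sample as-is.
  targets_no_prefix <- create_raking_targets(
    fullpopulation,
    vars = c("weighting_variable"),
    prefix = "",
    wt = 1  # copied from the original report
  )
  weights_1 <- rake_survey(my_skewed_sample, pop_margins = targets_no_prefix)

  # Remedy 2: keep the default "rk_" prefix and use the renamed sample.
  targets_default <- create_raking_targets(
    fullpopulation,
    vars = c("weighting_variable"),
    wt = 1  # copied from the original report
  )
  weights_2 <- rake_survey(renamed_sample, pop_margins = targets_default)
}
```

Either route should leave the sample and the targets agreeing on the variable name. Which one is more convenient probably depends on whether you want to keep the raking variables separate from the original survey columns.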

The error message isn't very informative in that respect, so we'll push a quick update soon to explicitly tell the user when the variable names don't match.
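Until that update lands, a check along these lines can be run before raking. This is only a base R sketch: check_raking_vars is a hypothetical helper, not part of pewmethods, and it assumes you already know the variable names the targets expect and can pass them in as a character vector.

```r
# Hypothetical helper, not part of pewmethods: fail early with a readable
# message when variables expected by the targets are absent from the data.
check_raking_vars <- function(data, target_vars) {
  missing_vars <- setdiff(target_vars, names(data))
  if (length(missing_vars) > 0) {
    stop(
      "Raking variables not found in data: ",
      paste(missing_vars, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

sample_df <- data.frame(
  x = runif(10),
  weighting_variable = factor(rep(c("A", "B"), 5))
)

# Names match: the check passes silently.
check_raking_vars(sample_df, "weighting_variable")

# Names differ (the default "rk_" prefix case): capture the error message.
msg <- tryCatch(
  check_raking_vars(sample_df, "rk_weighting_variable"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message names the missing variable directly, which makes the mismatch much easier to spot than the factor error from the report.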

okassi commented 4 years ago

After a second look at ?create_raking_targets, I see this feature is well explained in the documentation. Cheers, and thanks for the quick reply!