kosukeimai / wru

Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation
132 stars 31 forks source link

Setting `wru_data_wd` option won't work as expected #157

Open federiva opened 1 month ago

federiva commented 1 month ago

Hi! Thank you all for this great package!

I've spotted an error (I think) while using the package

:information_source: Issue: Setting the wru_data_wd option doesn't seem to work as expected

:notebook: Context: I'm expecting that by means of setting up the wru_data_wd option in the following way:

options(wru_data_wd = "my_path_to_a_local_folder")

I'll be downloading the rds files to a local folder, more specifically to "my_path_to_a_local_folder" in the toy example above

:warning: Error:

  1. After setting the option and running the predict_race function I'm getting the following error
    Downloading "wru-data-census_last_c.rds"...
    Error in curl::curl_fetch_disk(url, x$path, handle = handle): Failed to open file NA/wru-data-census_last_c.rds.

    Also I tried running the private function manually

    wru:::wru_data_preflight()

    Which is raising the same error mentioned above

federiva commented 1 month ago

I think that the error is due to the following line

dest <- ifelse(getOption("wru_data_wd", default = FALSE), getwd(), tempdir())

the ifelse above is checking that the getOption above is either TRUE or FALSE so therefore when setting the wru_data_wd option is returning a character instead of FALSE and the ifelse function silently returns an NA instead of TRUE or FALSE (therefore the error above trying to set the path to NA/wru-data...

To be honest I do not fully understand the logic above that could be fixed by forcing the data type on the arg passed to ifelse, for example:

dest <- ifelse(isFALSE(getOption("wru_data_wd", default = FALSE)), getwd(), tempdir())

But that is misleading as it will always return tempdir() when setting a path and if we set it up to TRUE then it will return the output of getwd() which is also confusing unless properly documented.

I suggest the following change to that line (which BTW is in other places in the codebase)

# When option is not set it falls back to tempdir()
dest <- getOption("wru_data_wd", default = tempdir())
# Check if dir exists
if (!dir.exists(dest)) {
  warning(paste("somewarning message for the user"))
  dest <- tempdir()
}
# Rest of the function

Tell me what you think of it and if you agree I can submit a PR with the fix

Thanks again for this package! :slightly_smiling_face:

Edit: :warning: ifelse is always bug-prone there are other inline alternatives like dplyr's if_else or simply an inline if else like if (my_condition) TRUE else FALSE. For the above mentioned line I'd also suggest a refactor (extract to function) as the same line is repeated four times, something like :nerd_face:

get_wru_working_path <- function() {
  dest <- getOption("wru_data_wd", default = tempdir())
  # Check if dir exists
  if (!dir.exists(dest)) {
    warning(paste("somewarning message for the user"))
    dest <- tempdir()
 }
dest
}
1beb commented 1 month ago

The option uses your current working directory. It's less than ideal and we should definitely give people the option to set a directory to lookup the files.

Presently, it uses your current working directory if the option (wru_data_wd, wru data workding directory) is TRUE or it uses a temporary directory if it is FALSE. Notably, temporary directories only survive the session. It defaults to a temporary directory if the option is not set.

Please consider a PR if you are interested in changing the behavior in a way that may benefit others.