spsanderson / healthyR.data

Data sets for the healthyR package.
https://www.spsanderson.com/healthyR.data/

Get `tempdir()` does not exist error #72

Closed: spsanderson closed this issue 1 year ago.

spsanderson commented 1 year ago

Instead of using `tempdir()`, use `utils::choose.dir()`.
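Note that `utils::choose.dir()` is only available on Windows, and it returns `NA` if the dialog is cancelled. A cross-platform version might therefore want a fallback. Here is a minimal sketch, assuming that falling back to the session temporary directory is acceptable. `pick_work_dir()` and its `interactive_choice` argument are hypothetical names and are not part of healthyR.data.

```r
# Hypothetical helper (not part of healthyR.data): choose a working directory,
# asking the user only in interactive Windows sessions.
pick_work_dir <- function(interactive_choice = interactive()) {
  if (interactive_choice && .Platform$OS.type == "windows") {
    # utils::choose.dir() exists only on Windows; it returns NA on Cancel
    chosen <- utils::choose.dir()
    if (!is.na(chosen)) {
      return(chosen)
    }
  }
  # tempdir(check = TRUE) re-creates the session temp directory if it was deleted
  tempdir(check = TRUE)
}

# Non-interactive use always falls back to the session temp directory
pick_work_dir(interactive_choice = FALSE)
```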

New function:

#' Download Current Hospital Data Files.
#'
#' @family Hospital Data
#'
#' @author Steven P. Sanderson II, MPH
#'
#' @seealso \url{https://data.cms.gov/provider-data/topics/hospitals/}
#'
#' @description Download the current Hospital Data Sets.
#'
#' @details This function will download the current and the official hospital
#' data sets from the __CMS.gov__ website.
#'
#' The function makes use of a temporary directory and file to save and unzip
#' the data. This will grab the current Hospital Data Files, unzip them and
#' return a list of tibbles with each tibble named after the data file.
#'
#' The function returns a list object with all of the current hospital data as a
#' tibble. It does not save the data anywhere so if you want to save it you will
#' have to do that manually.
#'
#' This also means that you would have to store the data as a variable in order
#' to access the data later on. It does, however, carry an attribute and a class
#' so that it can be piped into other functions.
#'
#' @examples
#' \dontrun{
#'   current_hosp_data()
#' }
#'
#' @return
#' A named list of tibbles, one per CSV file, with class `current_hosp_data`.
#'
#' @name current_hosp_data
NULL

#' @export
#' @rdname current_hosp_data

current_hosp_data <- function() {

  # URL for file
  url <- "https://data.cms.gov/provider-data/sites/default/files/archive/Hospitals/current/hospitals_current_data.zip"

  # Ask the user for a working directory in which to process the zip file
  # (previously tempdir(); note that utils::choose.dir() exists only on Windows)
  tmp_dir <- utils::choose.dir()
  download_location <- file.path(tmp_dir, "download.zip")
  extract_location <- file.path(tmp_dir, "extract")

  # Download the zip file to the temporary location
  utils::download.file(
    url = url,
    destfile = download_location
  )

  # Unzip the file
  utils::unzip(download_location, exdir = extract_location)

  # List the extracted csv files
  csv_file_list <- list.files(
    path = extract_location,
    pattern = "\\.csv$",
    full.names = TRUE
  )

  # make named list
  csv_names <- stats::setNames(
      object = csv_file_list,
      nm =
        csv_file_list |>
        basename() |>
        gsub(pattern = "\\.csv$", replacement = "") |>
        janitor::make_clean_names()
    )

  # Process CSV Files
  parse_csv_file <- function(file) {
    # Normalize the path to use C:/path/to/file structure
    normalizePath(file, "/") |>
      # read in the csv file and use check.names = FALSE because some of
      # the names are very long
      utils::read.csv(check.names = FALSE) |>
      dplyr::as_tibble() |>
      # clean the field names
      janitor::clean_names()
  }

  list_of_tables <- lapply(csv_names, parse_csv_file)

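  # Remove only what this function created; deleting tmp_dir itself would
  # remove the user's chosen folder (or, with tempdir(), the session temp dir)
  unlink(c(download_location, extract_location), recursive = TRUE)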

  # Add an attribute and a class to the object, then return the tibbles
  attr(list_of_tables, ".list_type") <- "current_hosp_data"
  class(list_of_tables) <- c("current_hosp_data", class(list_of_tables))

  list_of_tables
}
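One likely source of the original error, stated as an assumption based on the commented-out `tempdir()` and the `unlink(tmp_dir, recursive = TRUE)` call: when `tmp_dir` was `tempdir()` itself, the cleanup step deleted R's session temporary directory, so a later call tried to download into a folder that no longer existed. Here is a minimal sketch of an alternative that keeps `tempdir()` but works inside a fresh per-call subdirectory. `with_scratch_dir()` is a hypothetical helper.

```r
# Hypothetical helper: run `fun` in a fresh subdirectory of the session temp
# directory and delete only that subdirectory afterwards.
with_scratch_dir <- function(fun) {
  # tempdir(check = TRUE) re-creates the session temp directory if needed
  work_dir <- tempfile("hosp_data_", tmpdir = tempdir(check = TRUE))
  dir.create(work_dir)
  # Runs when this function exits, even on error; tempdir() itself is untouched
  on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)
  fun(work_dir)
}

# Example: write a small CSV into the scratch directory and read it back
result <- with_scratch_dir(function(dir) {
  path <- file.path(dir, "example.csv")
  writeLines(c("a,b", "1,2"), path)
  list(dir = dir, data = utils::read.csv(path))
})

dir.exists(result$dir)  # FALSE: the scratch directory was removed
dir.exists(tempdir())   # TRUE: the session temp directory is still there
```

In `current_hosp_data()`, the download, `unzip()`, and `read.csv()` steps would go inside `fun`, so each call cleans up after itself without touching the shared temp directory.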

@rjake, would you have the bandwidth to confirm this one?

rjake commented 1 year ago

Yes, I can. I think the error is due to the directory path `extract_location` not existing. Do you think we should make the path an argument? A temporary directory is not a bad place to put this. I'll sketch it out and you can let me know what you think.

I also think we should add `...` to `unzip()` so you can call `unzip(list = TRUE)` or `unzip(files = "ASC_Facility.csv")` and get just the files you need.


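current_hosp_data <- function(path = tempdir(), ...) {
  # ... download the zip into `path` as before ...
  download_location <- file.path(path, "download.zip")
  # extra arguments such as list = TRUE or files = "ASC_Facility.csv" go to unzip()
  utils::unzip(download_location, exdir = file.path(path, "extract"), ...)
}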
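Here is a minimal, testable sketch of forwarding `...` to `unzip()`. `extract_hosp_zip()` and its `extractor` argument are hypothetical. `extractor` exists only so the forwarding can be checked without downloading the archive.

```r
# Hypothetical sketch of the `...` idea: anything the caller passes beyond
# `zipfile` and `exdir` is handed straight to utils::unzip().
# `extractor` is an added assumption so the forwarding can be tested offline.
extract_hosp_zip <- function(zipfile, exdir = tempdir(check = TRUE), ...,
                             extractor = utils::unzip) {
  extractor(zipfile, exdir = exdir, ...)
}

# Intended use once the archive has been downloaded:
# extract_hosp_zip("download.zip", list = TRUE)                 # list members only
# extract_hosp_zip("download.zip", files = "ASC_Facility.csv")  # extract one file

# Offline check: a stand-in extractor that simply returns the arguments it got
args_seen <- extract_hosp_zip(
  "download.zip",
  exdir = "extract",
  files = "ASC_Facility.csv",
  extractor = function(...) list(...)
)
args_seen$files  # "ASC_Facility.csv"
```

Because `extractor` comes after `...`, callers must name it in full, so it cannot accidentally capture an argument meant for `unzip()`.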
spsanderson commented 1 year ago

Thanks for taking a look at this. I think we could add an argument that lets someone pull out a single file. They would just need to know its name, or we would need to add some `grep()` matching, similar to what I do in other functions like `current_asc_data()`, seen here: https://github.com/spsanderson/healthyR.data/blob/HEAD/R/get-cur-asc-data.R

(Sent by email in reply to Jake's comment of Tue, May 9, 2023, 10:14 AM.)
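The `grep()`-style selection mentioned in this reply could look like the following sketch. It assumes the member names come from `unzip(list = TRUE)$Name`. `select_csv_members()` is a hypothetical helper, and apart from `ASC_Facility.csv` (mentioned in the thread) the file names in the offline example are made up.

```r
# Hypothetical helper: choose which CSV members of the archive to extract,
# using a regular expression instead of an exact file name.
select_csv_members <- function(member_names, pattern = NULL) {
  # Keep only the CSV members of the archive
  csv <- member_names[grepl("\\.csv$", member_names, ignore.case = TRUE)]
  if (is.null(pattern)) {
    return(csv)
  }
  # Match the user's pattern against the file name, ignoring case
  csv[grepl(pattern, basename(csv), ignore.case = TRUE)]
}

# With a real archive, the names would come from unzip(list = TRUE):
# members <- utils::unzip("download.zip", list = TRUE)$Name
# utils::unzip("download.zip", files = select_csv_members(members, "^asc"),
#              exdir = tempfile("hosp_data_"))

# Offline example with made-up member names
members <- c("ASC_Facility.csv", "ASC_State.csv",
             "Hospital_General_Information.csv", "Data_Dictionary.pdf")
select_csv_members(members, pattern = "^asc")
# "ASC_Facility.csv" "ASC_State.csv"
```

A pattern spares users from knowing exact file names, at the cost of possibly matching more files than intended.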