ropensci / comtradr

Functions for Interacting with the UN Comtrade API
https://docs.ropensci.org/comtradr

Unexpected behaviour: tidy_cols = FALSE #88

Closed by FATelarico 2 months ago

FATelarico commented 3 months ago

ReprEx:

comtradr::ct_get_data(
    # Technical parameters
    tidy_cols = FALSE,
    # Countries
    reporter = 'ALB', partner = 'USA',
    # Period
    start_date = '2021', end_date = '2022'
)

Expected behaviour

Return a data frame with raw names:

typeCode freqCode refPeriodId refYear refMonth period reporterCode reporterISO reporterDesc flowCode flowDesc partnerCode partnerISO partnerDesc partner2Code partner2ISO partner2Desc classificationCode classificationSearchCode isOriginalClassification cmdCode cmdDesc aggrLevel isLeaf customsCode customsDesc mosCode motCode motDesc qtyUnitCode qtyUnitAbbr qty isQtyEstimated altQtyUnitCode altQtyUnitAbbr altQty isAltQtyEstimated netWgt isNetWgtEstimated grossWgt isGrossWgtEstimated cifvalue fobvalue primaryValue legacyEstimationFlag isReported isAggregate
C A 20210101 2021 52 2021 8 ALB Albania M Import 842 USA USA 0 W00 World H5 HS TRUE TOTAL All Commodities 0 FALSE C00 TOTAL CPC 0 0 TOTAL MOT -1 N/A 0 FALSE -1 N/A 0 FALSE 0 FALSE 0 FALSE 141893262 NA 141893262 0 FALSE TRUE
C A 20210101 2021 52 2021 8 ALB Albania X Export 842 USA USA 0 W00 World H5 HS TRUE TOTAL All Commodities 0 FALSE C00 TOTAL CPC 0 0 TOTAL MOT -1 N/A 0 FALSE -1 N/A 0 FALSE 0 FALSE 0 FALSE NA 47686560 47686560 0 FALSE TRUE
C A 20220101 2022 52 2022 8 ALB Albania M Import 842 USA USA 0 W00 World H6 HS TRUE TOTAL All Commodities 0 FALSE C00 TOTAL CPC 0 0 TOTAL MOT -1 N/A 0 FALSE -1 N/A 0 FALSE 0 FALSE 0 FALSE 78516088 NA 78516088 0 FALSE TRUE
C A 20220101 2022 52 2022 8 ALB Albania X Export 842 USA USA 0 W00 World H6 HS TRUE TOTAL All Commodities 0 FALSE C00 TOTAL CPC 0 0 TOTAL MOT -1 N/A 0 FALSE -1 N/A 0 FALSE 0 FALSE 0 FALSE NA 10102551 10102551 0 FALSE TRUE

Observed behaviour

NULL
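A NULL like this is what R returns when the last expression a function evaluates is an `if` with a false condition and no `else`. Whether that is the actual cause in comtradr is not stated in this thread; the toy function below (all names hypothetical, not package code) only illustrates the mechanism.

```r
# Hypothetical toy function, not comtradr code: the data frame is only
# returned from inside the `if` branch.
process_toy <- function(df, tidy_cols) {
  if (tidy_cols) {
    colnames(df) <- tolower(colnames(df))
    return(df)
  }
  # No explicit return here: the value of the function is the value of
  # the `if` statement, which is invisible NULL when the condition is FALSE.
}

raw <- data.frame(refYear = 2021, flowCode = "M")

tidy_result <- process_toy(raw, tidy_cols = TRUE)   # a data frame
raw_result  <- process_toy(raw, tidy_cols = FALSE)  # NULL

is.null(raw_result)
```

Returning the data frame after the `if` block, rather than inside it, avoids this on both paths.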

Suggested fix

ct_process_response <- function (resp, verbose = FALSE, tidy_cols, bulk){
  if (bulk) {
    cache_dir <- tools::R_user_dir("comtradr_bulk", which = "cache")
    if (!dir.exists(cache_dir)) {
      dir.create(cache_dir, recursive = TRUE)
    }
    # Extract the file name from the Content-Disposition header
    filename <- httr2::resp_header(resp, "Content-Disposition")
    filename <- stringr::str_remove(filename, ".*filename=\"")
    filename <- stringr::str_remove(filename, "\".*")
    filename <- stringr::str_replace_all(filename, "/", "_")
    cache_file <- file.path(cache_dir, filename)
    # Write the response to the cache, read it back as text columns, clean up
    writeBin(httr2::resp_body_raw(resp), con = cache_file)
    processed <- readr::read_delim(cache_file, delim = "\t",
                                   show_col_types = FALSE, progress = FALSE,
                                   guess_max = 99999,
                                   col_types = readr::cols(.default = "c"))
    file.remove(cache_file)
  }
  else {
    result <- httr2::resp_body_json(resp, simplifyVector = TRUE)
    if (length(result$data) > 0) {
      if (nrow(result$data) == 1e+05) {
        cli::cli_warn(c(x = "Your request returns exactly 100k rows. This means that most likely not all the data you queried has been returned, as the upper limit without subscription is 100k. Please partition your API call, e.g. by using only half the period in the first call."))
      }
      else if (nrow(result$data) > 90000) {
        cli::cli_inform(c(i = "Your request has passed 90k rows. If you exceed 100k rows Comtrade will not return all data. You will have to slice your request in smaller parts."))
      }
      processed <- result$data
    }
    else {
      return(data.frame(count = 0))
    }
  }
  new_cols <- comtradr::ct_pretty_cols
  if (tidy_cols == TRUE) {
    stopifnot(is.data.frame(processed))
    curr_cols <- colnames(processed)
    if (!all(curr_cols %in% new_cols$from)) {
      err <- paste(curr_cols[!curr_cols %in% new_cols$from], 
                   collapse = ", ")
      rlang::abort(paste("The following col headers within input df are not found in", 
                         "the pkg data obj 'ct_pretty_cols':", err))
    }
    colnames(processed) <- purrr::map_chr(curr_cols, function(x) {
      new_cols$to[which(new_cols$from == x)]
    })
  }
  attributes(processed)$url <- resp$url
  attributes(processed)$time <- Sys.time()
  return(processed)
}
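
The renaming step in the function above can be tried in isolation. This is a minimal sketch under stated assumptions: `toy_lookup` is a made-up stand-in for `ct_pretty_cols` (its `to` names are invented for illustration), `rename_cols` is a hypothetical helper, and base R `match()` is used in place of `purrr::map_chr()`.

```r
# Toy stand-in for comtradr::ct_pretty_cols; the `to` values are invented.
toy_lookup <- data.frame(
  from = c("refYear", "flowCode", "primaryValue"),
  to   = c("ref_year", "flow_code", "primary_value")
)

# Hypothetical helper: rename columns via the lookup only when asked to.
rename_cols <- function(df, lookup, tidy_cols = TRUE) {
  if (isTRUE(tidy_cols)) {
    unknown <- setdiff(colnames(df), lookup$from)
    if (length(unknown) > 0) {
      stop("Columns not found in lookup: ", paste(unknown, collapse = ", "))
    }
    colnames(df) <- lookup$to[match(colnames(df), lookup$from)]
  }
  # Returned on both paths, so tidy_cols = FALSE yields the raw names.
  df
}

raw <- data.frame(refYear = 2021, flowCode = "M", primaryValue = 141893262)

tidy_names <- colnames(rename_cols(raw, toy_lookup, tidy_cols = TRUE))
raw_names  <- colnames(rename_cols(raw, toy_lookup, tidy_cols = FALSE))
```

With `tidy_cols = FALSE` the input comes back unchanged, which is the behaviour this issue asks for.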

datapumpernickel commented 3 months ago

Hi Fabio!

Thanks for noticing and reporting this bug. I have fixed it, and the fix will be included in the next CRAN release. Would you mind testing with the development version on GitHub in the meantime?

You can install with:

# install.packages("devtools")
devtools::install_github("ropensci/comtradr@main")

and then re-run your command.
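
A quick check after re-running could look like the sketch below. `looks_raw` is a hypothetical helper, not part of comtradr; the column names it checks are taken from the expected output in the report. The real call is left commented out because it needs network access to the Comtrade API.

```r
# Hypothetical helper: TRUE if `x` is a data frame that carries the raw
# (untidied) Comtrade column names listed in `expected`.
looks_raw <- function(x,
                      expected = c("refYear", "reporterISO", "primaryValue")) {
  is.data.frame(x) && all(expected %in% colnames(x))
}

# res <- comtradr::ct_get_data(
#   tidy_cols = FALSE,
#   reporter = "ALB", partner = "USA",
#   start_date = "2021", end_date = "2022"
# )
# looks_raw(res)

# Offline illustration with stand-in values:
fixed_like  <- data.frame(refYear = 2021, reporterISO = "ALB",
                          primaryValue = 141893262)
broken_like <- NULL  # what the reported bug returned

looks_raw(fixed_like)
looks_raw(broken_like)
```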

Thanks!

FATelarico commented 2 months ago

Works like a charm. Thank you! And glad to be of help.