ipeaGIT / aopdata

Download data from the Access to Opportunities Project (AOP)
https://ipeagit.github.io/aopdata/
13 stars 4 forks source link

Internet resources should fail gracefully #34

Closed rafapereirabr closed 2 years ago

rafapereirabr commented 2 years ago

This again:

'Packages which use Internet resources should fail gracefully with an informative message
if the resource is not available or has changed (and not give a check warning nor error).'
rafapereirabr commented 2 years ago

This is the new version of the check_connection() function I'm proposing. This version now returns TRUE if url is working, and FALSE if not.

This version seems much more robust, but it will require small changes in how we call check_connection() inside other functions.


#' Check internet connection with Ipea server
#'
#' @description
#' Checks if there is internet connection with Ipea server to download aop data.
#'
#' @param file_url A string with the file_url address of an aop dataset
#'
#' @return Logical. `TRUE` if url is working, `FALSE` if not.
#'
#' @export
#' @family support functions
#'
check_connection <- function(file_url = 'https://www.ipea.gov.br/geobr/aopdata/metadata/metadata.csv'){

  # file_url <- 'http://google.com/'               # ok
  # file_url <- 'http://www.google.com:81/'   # timeout
  # file_url <- 'http://httpbin.org/status/300' # error

  # check if user has internet connection
  if (!curl::has_internet()) { message("No internet connection.")
    return(FALSE)
  }

  # message
  msg <- "Problem connecting to data server. Please try geobr again in a few minutes."

  # test server connection
  x <- try(silent = TRUE,
           httr::GET(file_url, # timeout(5),
                     config = httr::config(ssl_verifypeer = FALSE)))
  # link offline
  if (class(x)=="try-error") {
    message( msg )
    return(FALSE)
  }

  # link working fine
  else if ( identical(httr::status_code(x), 200L)) {
    return(TRUE)
  }

  # link not working or timeout
  else if (! identical(httr::status_code(x), 200L)) {
    message(msg )
    return(FALSE)

  } else if (httr::http_error(x) == TRUE) {
    message(msg)
    return(FALSE)
  }

}
dhersz commented 2 years ago

Honestly, I feel it's just better to always return errors if the internet is not available and skip all tests and examples on CRAN, so they don't complain about it.

rafapereirabr commented 2 years ago

From a practical point of view, halting the function with a message and throwing an error are the same thing, imho. The only difference is that halting the function with a message requires a much more careful code from our side.

dhersz commented 2 years ago

I see two problems with this approach:

  1. It's error prone, and if you don't "fail gracefully" often enough the package may be archived (e.g. geobr);
  2. It may lead to error further down the code that do not point to the actual source of problem. Let's say you want to use geobr's spatial data to do some analysis, but first you need to transform the CRS. If the downloading function returns a logical, it will result in an error when trying to transform the object anyway, so it's better to raise an error when the function that is supposed to do something doesn't. Example:
an_object <- read_state("RJ") # let's suppose this fails and the function returns FALSE
an_object <- sf::st_transform(an_object, 4326) # will raise an error

# so IMO it's better to raise the error right away
an_object <- read_state("RJ") # should raise an error if the download fails
rafapereirabr commented 2 years ago

Hi Daniel. Sorry, I had just pushed an implementation of the solution and before I saw your latest comment here. The solution I suggested in my last push does exaclty what you're saying, it halts the function in the middle of it. So in your example, the read_state("RJ") would be halted with a message if the download fails.

I think this will solve the issue, but please feel free to have a look at the code and tests. If you have any suggestions, they would be much appreciated !

rafapereirabr commented 2 years ago

Here is a mock example of how I thought about approaching the issue.

url_ok <- 'http://google.com/'
url_timeout <- 'http://www.google.com:81/'
url_error <- 'http://httpbin.org/status/300'

test <- function(x, url){

  check_con <- aopdata::check_connection(url)
  if(is.null(check_con) | isFALSE(check_con)){ return(invisible(NULL)) }

  x <- x+2
  return(x)
}

# this should work and return output.
test(2, url_ok ) 

# these should NOT work and return a message.
test(2, url_timeout )
test(2, url_error )

The problem is that it's working fine in this example, but it is not working in the aopdata package. For a test, I disconnect my computer from the internet and ran the following code. Then I get both a message AND an error. So it seems that the check connections is not halting the parent function properly.

library(aopdata)
d <- read_population(city = 'bel', year = 2010, geometry = F, showProgress = T)

No internet connection. Error in eval(e, x, parent.frame()) : object 'type' not found

dhersz commented 2 years ago

For reference, I think these are the best practices we should aim for: https://books.ropensci.org/http-testing/index.html

rafapereirabr commented 2 years ago

Perhaps in the future we can adopt a more robus approach, as suggested by Daniel. In the meantime, the current fix should address our shor-term problems with CRAN policies. Closing this issue for now.