rOpenGov / eurostat

R tools for Eurostat data
http://ropengov.github.io/eurostat
Other
234 stars 46 forks source link

R CMD check test failure #279

Closed pitkant closed 8 months ago

pitkant commented 11 months ago

In current httr2 branch there seems to be a test failure in test_03_get_eurostat_geospatial:

══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test_03_get_eurostat_geospatial.R:121:3'): get_eurostat_geospatial df ──
Error in `if (!setting_geom) {
    ix = if (is.character(i)) 
        which(i == names(x))
    else i
    if (is.null(value)) 
        agr = agr[-ix]
    else {
        if (length(ix) == 0 || ix > length(names(x))) 
            agr = st_agr(c(as.character(agr), NA_character_))
        else agr[ix] = NA
    }
}`: missing value where TRUE/FALSE needed
Backtrace:
     ▆
  1. ├─testthat::expect_message(...) at test_03_get_eurostat_geospatial.R:121:2
  2. │ └─testthat:::expect_condition_matching(...)
  3. │   └─testthat:::quasi_capture(...)
  4. │     ├─testthat (local) .capture(...)
  5. │     │ └─base::withCallingHandlers(...)
  6. │     └─rlang::eval_bare(quo_get_expr(.quo), quo_get_env(.quo))
  7. └─eurostat::get_eurostat_geospatial(nuts_level = "all", output_class = "df")
  8.   ├─sf::st_drop_geometry(shp)
  9.   └─sf:::st_drop_geometry.sf(shp)
 10.     └─sf::st_set_geometry(x, NULL)
 11.       ├─sf::`st_geometry<-`(`*tmp*`, value = value)
 12.       └─sf:::`st_geometry<-.sf`(`*tmp*`, value = value)
 13.         ├─base::`[[<-`(`*tmp*`, attr(x, "sf_column"), value = `<NULL>`)
 14.         └─sf:::`[[<-.sf`(`*tmp*`, attr(x, "sf_column"), value = `<NULL>`)

Specifically this seems to be related to this part of the code:

https://github.com/rOpenGov/eurostat/blob/327964ab096d9207b8e669fe74773d8fb11987a6/R/get_eurostat_geospatial.R#L231:L242

The sf::st_drop_geometry(shp) was complemented with shp$geometry <- NULL in PR #264 . I'm not sure if the problem has existed until that moment.

Something similar was reported in sf packages GitHub issues but this was supposed to be fixed back in 2019: https://github.com/r-spatial/sf/issues/1057

This occurs only when building and running the check --as-cran from the terminal. In RStudio checking the package --as-cran works fine for some reason.

Any ideas why this is happening? @dieghernan

dieghernan commented 11 months ago

Hi @pitkant

When #264 was introduced, these lines:

https://github.com/rOpenGov/eurostat/blob/327964ab096d9207b8e669fe74773d8fb11987a6/R/get_eurostat_geospatial.R#L234-L237

were a little bit odd to me. This is because shp$geometry <- NULL is still relying on sf under the hood, so although is under !requireNamespace("sf", quietly = TRUE) inf fact uses sf.

The error you are getting are in fact coming from sf, see:

https://github.com/r-spatial/sf/blob/acaf33ed7b2ec4de965196daea7e001657c6e39a/R/sf.R#L383-L404

I would go for another approach for dropping the sf-specific attributes by:

  1. Removing the sf class of the object
  2. After that, unselecting the columng with the geometry. Note that step 1 is needed since the sf geometry is sticky.

I would try this alternative implementation, see:

library(eurostat)
#> Warning: package 'eurostat' was built under R version 4.1.3

data("eurostat_geodata_60_2016")

shp <- eurostat_geodata_60_2016

# Start alternative -----
# Remove geometry without sf package

# New classes, set tibble as base for classes
new_class <- intersect(class(shp), class(tibble::tibble()))

# Identify sf column with geometry for removal
geom_col <- attr(shp, "sf_column")

# Unclass and remove specific sf columns
class(shp) <- new_class

shp <- shp[, setdiff(names(shp), geom_col)]
# End alternative ----

# Contrast
class(shp)
#> [1] "data.frame"
names(shp)
#>  [1] "id"         "LEVL_CODE"  "NUTS_ID"    "CNTR_CODE"  "NAME_LATN" 
#>  [6] "NUTS_NAME"  "MOUNT_TYPE" "URBN_TYPE"  "COAST_TYPE" "FID"       
#> [11] "geo"

# Check if same result as sf::st_drop_geometry()
with_sf <- sf::st_drop_geometry(shp)
identical(with_sf, shp)
#> [1] TRUE

Created on 2023-10-09 with reprex v2.0.2

pitkant commented 11 months ago

Thank you! I have to admit I was sloppy by not checking properly whether shp$geometry <- NULL was relying on sf package under the hood. If you can make a PR please do but I can also take responsibility for implementing the fixes, especially after you provided such excellent code example to start with.

pitkant commented 11 months ago

Actually now that I was testing this it seems that the new geo_names function added in commit 3a6e48f is messing with attr(shp, "sf_column"):

> shp <- eurostat::eurostat_geodata_60_2016
> attr(shp, "sf_column")
[1] "geometry"
> shp2 <- eurostat::eurostat_geodata_60_2016
> shp2 <- eurostat:::geo_names(shp2)
> attr(shp2, "sf_column")
NULL

While it's nice that there's a specific order for columns I can't actually recall what was the rationale for adding this function in the package.

dieghernan commented 11 months ago

Hi @pitkant

The function was introduced after this comment https://github.com/rOpenGov/eurostat/pull/264#issuecomment-1657999888 and issue #240

What I have found is that geo_names() only work properly if sf is attached. The problem comes with the reordering of columns, and I and not finding a solution. Is it possible to relax that requirement?

pitkant commented 11 months ago

Yes, although it is a neat function in my opinion and adds a consistent and predictable structure to geospatial objects. I would lean towards making all geospatial functions in eurostat package depend on having sf package installed. It would certainly make our job much easier. How much value does it add to read the singular 2016 file from data folder as a data.frame object? Sure it can be used to get some regional ID codes and region names, but other than that?

dieghernan commented 11 months ago

Agree, I would do the changes requiring sf so we can avoid workarounds. One question, in which branch? This is also in v4-dev

pitkant commented 11 months ago

v4-dev would be perfect, yes. I will merge httr2 branch there as well soon and if all tests pass it will go to v4 (and CRAN release) and after CRAN release into master branch.

pitkant commented 10 months ago

I now get the following errors in get_eurostat_geospatial:


══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test_03_get_eurostat_geospatial.R:110:3'): get_eurostat_geospatial nuts levels ──
Error in `expect_compare(">", act, exp)`: Result of comparison must be a single logical value
Backtrace:
    ▆
 1. └─testthat::expect_gt(nrow(gall), nrow(gn3)) at test_03_get_eurostat_geospatial.R:110:2
 2.   └─testthat:::expect_compare(">", act, exp)
 3.     └─rlang::abort("Result of comparison must be a single logical value")
── Error ('test_03_get_eurostat_geospatial.R:192:3'): get_eurostat_geospatial df ──
Error in `expect_compare(">", act, exp)`: Result of comparison must be a single logical value
Backtrace:
    ▆
 1. └─testthat::expect_gt(nrow(gall), nrow(gn3)) at test_03_get_eurostat_geospatial.R:192:2
 2.   └─testthat:::expect_compare(">", act, exp)
 3.     └─rlang::abort("Result of comparison must be a single logical value")
── Failure ('test_03_get_eurostat_geospatial.R:318:3'): Check column names POLYGONS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:319:3'): Check column names POLYGONS from GISCO ──
names(poly) (`actual`) not identical to `col_order` (`expected`).

`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:327:3'): Check column names POLYGONS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:328:3'): Check column names POLYGONS from GISCO ──
names(poly_df) (`actual`) not identical to col_order[-length(col_order)] (`expected`).

`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:347:3'): Check column names POLYGONS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:348:3'): Check column names POLYGONS from GISCO ──
names(poly) (`actual`) not identical to `col_order` (`expected`).

`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:356:3'): Check column names POLYGONS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:357:3'): Check column names POLYGONS from GISCO ──
names(poly_df) (`actual`) not identical to col_order[-length(col_order)] (`expected`).

`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:543:3'): Check column names BORDERS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:552:3'): Check column names BORDERS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:576:3'): Check column names BORDERS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:585:3'): Check column names BORDERS from GISCO ──
`poly_df` is not an S3 object

[ FAIL 14 | WARN 0 | SKIP 0 | PASS 182 ]
Error: Test failures
Execution halted

1 error ✖ | 0 warnings ✔ | 0 notes ✔
Error: R CMD check found ERRORs
Execution halted

These are related to gsc_api_cache function lines 25-26 producing err_dwnload object that inherits try-error and produces a NULL:

err_dwnload <- suppressWarnings(try(download.file(url, file.local, 
    quiet = isFALSE(verbose), mode = "wb"), silent = TRUE))

This is due to HTTP status 503 from GISCO:

Error in download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb",  : 
  cannot open URL 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_BN_60M_2010_4326_LEVL_0.geojson'
In addition: Warning messages:
1: In download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb",  :
  downloaded length 0 != reported length 0
2: In download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb",  :
  cannot open URL 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_BN_60M_2010_4326_LEVL_0.geojson': HTTP status was '503 Service Unavailable'

That made me wonder if GISCO is using a DDoS protection system as the situation is similar to the situation this Stack Overflow post is describing: https://stackoverflow.com/a/47729625/14413481

@dieghernan Do you get similar errors when running R CMD check on v4-dev branch or is it just me?

dieghernan commented 10 months ago

Hi @pitkant , I received today an email from CRAN related with this, also some users are reporting this https://github.com/rOpenGov/giscoR/issues/69 See there

dieghernan commented 10 months ago

Hi @pitkant

New release of giscoR is now on CRAN, the error should be fixed now. Please allow a couple of days on the action since I think it uses posit package manager instead of CRAN, it may take some time until it updates giscoR to 0.4.0 https://packagemanager.posit.co/client/#/repos/cran/packages/giscoR/611842/overview?search=giscoR%23package-details

hannesaddec commented 10 months ago

I now get the following errors in get_eurostat_geospatial:


══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test_03_get_eurostat_geospatial.R:110:3'): get_eurostat_geospatial nuts levels ──
Error in `expect_compare(">", act, exp)`: Result of comparison must be a single logical value
Backtrace:
    ▆
 1. └─testthat::expect_gt(nrow(gall), nrow(gn3)) at test_03_get_eurostat_geospatial.R:110:2
 2.   └─testthat:::expect_compare(">", act, exp)
 3.     └─rlang::abort("Result of comparison must be a single logical value")
── Error ('test_03_get_eurostat_geospatial.R:192:3'): get_eurostat_geospatial df ──
Error in `expect_compare(">", act, exp)`: Result of comparison must be a single logical value
Backtrace:
    ▆
 1. └─testthat::expect_gt(nrow(gall), nrow(gn3)) at test_03_get_eurostat_geospatial.R:192:2
 2.   └─testthat:::expect_compare(">", act, exp)
 3.     └─rlang::abort("Result of comparison must be a single logical value")
── Failure ('test_03_get_eurostat_geospatial.R:318:3'): Check column names POLYGONS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:319:3'): Check column names POLYGONS from GISCO ──
names(poly) (`actual`) not identical to `col_order` (`expected`).

`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:327:3'): Check column names POLYGONS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:328:3'): Check column names POLYGONS from GISCO ──
names(poly_df) (`actual`) not identical to col_order[-length(col_order)] (`expected`).

`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:347:3'): Check column names POLYGONS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:348:3'): Check column names POLYGONS from GISCO ──
names(poly) (`actual`) not identical to `col_order` (`expected`).

`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:356:3'): Check column names POLYGONS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:357:3'): Check column names POLYGONS from GISCO ──
names(poly_df) (`actual`) not identical to col_order[-length(col_order)] (`expected`).

`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:543:3'): Check column names BORDERS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:552:3'): Check column names BORDERS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:576:3'): Check column names BORDERS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:585:3'): Check column names BORDERS from GISCO ──
`poly_df` is not an S3 object

[ FAIL 14 | WARN 0 | SKIP 0 | PASS 182 ]
Error: Test failures
Execution halted

1 error ✖ | 0 warnings ✔ | 0 notes ✔
Error: R CMD check found ERRORs
Execution halted

These are related to gsc_api_cache function lines 25-26 producing err_dwnload object that inherits try-error and produces a NULL:

err_dwnload <- suppressWarnings(try(download.file(url, file.local, 
    quiet = isFALSE(verbose), mode = "wb"), silent = TRUE))

This is due to HTTP status 503 from GISCO:

Error in download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb",  : 
  cannot open URL 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_BN_60M_2010_4326_LEVL_0.geojson'
In addition: Warning messages:
1: In download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb",  :
  downloaded length 0 != reported length 0
2: In download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb",  :
  cannot open URL 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_BN_60M_2010_4326_LEVL_0.geojson': HTTP status was '503 Service Unavailable'

That made me wonder if GISCO is using a DDoS protection system as the situation is similar to the situation this Stack Overflow post is describing: https://stackoverflow.com/a/47729625/14413481

@dieghernan Do you get similar errors when running R CMD check on v4-dev branch or is it just me?

DM me on technical infrastructure issues..

pitkant commented 10 months ago

Thank you for your message @hannesaddec , I will do so in the future.

@dieghernan I ran the tests with new version of giscoR, everything worked great

pitkant commented 8 months ago

Closed with the CRAN release of package version 4.0.0