Closed pitkant closed 11 months ago
Hi @pitkant
When #264 was introduced, these lines:
were a little bit odd to me. This is because shp$geometry <- NULL
is still relying on sf under the hood, so although is under !requireNamespace("sf", quietly = TRUE)
inf fact uses sf.
The error you are getting are in fact coming from sf, see:
https://github.com/r-spatial/sf/blob/acaf33ed7b2ec4de965196daea7e001657c6e39a/R/sf.R#L383-L404
I would go for another approach for dropping the sf-specific attributes by:
sf
class of the objectI would try this alternative implementation, see:
library(eurostat)
#> Warning: package 'eurostat' was built under R version 4.1.3
data("eurostat_geodata_60_2016")
shp <- eurostat_geodata_60_2016
# Start alternative -----
# Remove geometry without sf package
# New classes, set tibble as base for classes
new_class <- intersect(class(shp), class(tibble::tibble()))
# Identify sf column with geometry for removal
geom_col <- attr(shp, "sf_column")
# Unclass and remove specific sf columns
class(shp) <- new_class
shp <- shp[, setdiff(names(shp), geom_col)]
# End alternative ----
# Contrast
class(shp)
#> [1] "data.frame"
names(shp)
#> [1] "id" "LEVL_CODE" "NUTS_ID" "CNTR_CODE" "NAME_LATN"
#> [6] "NUTS_NAME" "MOUNT_TYPE" "URBN_TYPE" "COAST_TYPE" "FID"
#> [11] "geo"
# Check if same result as sf::st_drop_geometry()
with_sf <- sf::st_drop_geometry(shp)
identical(with_sf, shp)
#> [1] TRUE
Created on 2023-10-09 with reprex v2.0.2
Thank you! I have to admit I was sloppy by not checking properly whether shp$geometry <- NULL
was relying on sf
package under the hood. If you can make a PR please do but I can also take responsibility for implementing the fixes, especially after you provided such excellent code example to start with.
Actually now that I was testing this it seems that the new geo_names
function added in commit 3a6e48f is messing with attr(shp, "sf_column")
:
> shp <- eurostat::eurostat_geodata_60_2016
> attr(shp, "sf_column")
[1] "geometry"
> shp2 <- eurostat::eurostat_geodata_60_2016
> shp2 <- eurostat:::geo_names(shp2)
> attr(shp2, "sf_column")
NULL
While it's nice that there's a specific order for columns I can't actually recall what was the rationale for adding this function in the package.
Hi @pitkant
The function was introduced after this comment https://github.com/rOpenGov/eurostat/pull/264#issuecomment-1657999888 and issue #240
What I have found is that geo_names()
only work properly if sf
is attached. The problem comes with the reordering of columns, and I and not finding a solution. Is it possible to relax that requirement?
Yes, although it is a neat function in my opinion and adds a consistent and predictable structure to geospatial objects. I would lean towards making all geospatial functions in eurostat package depend on having sf package installed. It would certainly make our job much easier. How much value does it add to read the singular 2016 file from data folder as a data.frame object? Sure it can be used to get some regional ID codes and region names, but other than that?
Agree, I would do the changes requiring sf so we can avoid workarounds. One question, in which branch? This is also in v4-dev
v4-dev would be perfect, yes. I will merge httr2 branch there as well soon and if all tests pass it will go to v4 (and CRAN release) and after CRAN release into master branch.
I now get the following errors in get_eurostat_geospatial
:
══ Failed tests ════════════════════════════════════════════════════════════════
── Error ('test_03_get_eurostat_geospatial.R:110:3'): get_eurostat_geospatial nuts levels ──
Error in `expect_compare(">", act, exp)`: Result of comparison must be a single logical value
Backtrace:
▆
1. └─testthat::expect_gt(nrow(gall), nrow(gn3)) at test_03_get_eurostat_geospatial.R:110:2
2. └─testthat:::expect_compare(">", act, exp)
3. └─rlang::abort("Result of comparison must be a single logical value")
── Error ('test_03_get_eurostat_geospatial.R:192:3'): get_eurostat_geospatial df ──
Error in `expect_compare(">", act, exp)`: Result of comparison must be a single logical value
Backtrace:
▆
1. └─testthat::expect_gt(nrow(gall), nrow(gn3)) at test_03_get_eurostat_geospatial.R:192:2
2. └─testthat:::expect_compare(">", act, exp)
3. └─rlang::abort("Result of comparison must be a single logical value")
── Failure ('test_03_get_eurostat_geospatial.R:318:3'): Check column names POLYGONS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:319:3'): Check column names POLYGONS from GISCO ──
names(poly) (`actual`) not identical to `col_order` (`expected`).
`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:327:3'): Check column names POLYGONS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:328:3'): Check column names POLYGONS from GISCO ──
names(poly_df) (`actual`) not identical to col_order[-length(col_order)] (`expected`).
`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:347:3'): Check column names POLYGONS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:348:3'): Check column names POLYGONS from GISCO ──
names(poly) (`actual`) not identical to `col_order` (`expected`).
`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:356:3'): Check column names POLYGONS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:357:3'): Check column names POLYGONS from GISCO ──
names(poly_df) (`actual`) not identical to col_order[-length(col_order)] (`expected`).
`actual` is NULL
`expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...)
── Failure ('test_03_get_eurostat_geospatial.R:543:3'): Check column names BORDERS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:552:3'): Check column names BORDERS from GISCO ──
`poly_df` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:576:3'): Check column names BORDERS from GISCO ──
`poly` is not an S3 object
── Failure ('test_03_get_eurostat_geospatial.R:585:3'): Check column names BORDERS from GISCO ──
`poly_df` is not an S3 object
[ FAIL 14 | WARN 0 | SKIP 0 | PASS 182 ]
Error: Test failures
Execution halted
1 error ✖ | 0 warnings ✔ | 0 notes ✔
Error: R CMD check found ERRORs
Execution halted
These are related to gsc_api_cache
function lines 25-26 producing err_dwnload
object that inherits try-error
and produces a NULL:
err_dwnload <- suppressWarnings(try(download.file(url, file.local,
quiet = isFALSE(verbose), mode = "wb"), silent = TRUE))
This is due to HTTP status 503 from GISCO:
Error in download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb", :
cannot open URL 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_BN_60M_2010_4326_LEVL_0.geojson'
In addition: Warning messages:
1: In download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb", :
downloaded length 0 != reported length 0
2: In download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb", :
cannot open URL 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_BN_60M_2010_4326_LEVL_0.geojson': HTTP status was '503 Service Unavailable'
That made me wonder if GISCO is using a DDoS protection system as the situation is similar to the situation this Stack Overflow post is describing: https://stackoverflow.com/a/47729625/14413481
@dieghernan Do you get similar errors when running R CMD check on v4-dev
branch or is it just me?
Hi @pitkant , I received today an email from CRAN related with this, also some users are reporting this https://github.com/rOpenGov/giscoR/issues/69 See there
Hi @pitkant
New release of giscoR is now on CRAN, the error should be fixed now. Please allow a couple of days on the action since I think it uses posit package manager instead of CRAN, it may take some time until it updates giscoR to 0.4.0 https://packagemanager.posit.co/client/#/repos/cran/packages/giscoR/611842/overview?search=giscoR%23package-details
I now get the following errors in
get_eurostat_geospatial
:══ Failed tests ════════════════════════════════════════════════════════════════ ── Error ('test_03_get_eurostat_geospatial.R:110:3'): get_eurostat_geospatial nuts levels ── Error in `expect_compare(">", act, exp)`: Result of comparison must be a single logical value Backtrace: ▆ 1. └─testthat::expect_gt(nrow(gall), nrow(gn3)) at test_03_get_eurostat_geospatial.R:110:2 2. └─testthat:::expect_compare(">", act, exp) 3. └─rlang::abort("Result of comparison must be a single logical value") ── Error ('test_03_get_eurostat_geospatial.R:192:3'): get_eurostat_geospatial df ── Error in `expect_compare(">", act, exp)`: Result of comparison must be a single logical value Backtrace: ▆ 1. └─testthat::expect_gt(nrow(gall), nrow(gn3)) at test_03_get_eurostat_geospatial.R:192:2 2. └─testthat:::expect_compare(">", act, exp) 3. └─rlang::abort("Result of comparison must be a single logical value") ── Failure ('test_03_get_eurostat_geospatial.R:318:3'): Check column names POLYGONS from GISCO ── `poly` is not an S3 object ── Failure ('test_03_get_eurostat_geospatial.R:319:3'): Check column names POLYGONS from GISCO ── names(poly) (`actual`) not identical to `col_order` (`expected`). `actual` is NULL `expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...) ── Failure ('test_03_get_eurostat_geospatial.R:327:3'): Check column names POLYGONS from GISCO ── `poly_df` is not an S3 object ── Failure ('test_03_get_eurostat_geospatial.R:328:3'): Check column names POLYGONS from GISCO ── names(poly_df) (`actual`) not identical to col_order[-length(col_order)] (`expected`). `actual` is NULL `expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...) ── Failure ('test_03_get_eurostat_geospatial.R:347:3'): Check column names POLYGONS from GISCO ── `poly` is not an S3 object ── Failure ('test_03_get_eurostat_geospatial.R:348:3'): Check column names POLYGONS from GISCO ── names(poly) (`actual`) not identical to `col_order` (`expected`). `actual` is NULL `expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...) ── Failure ('test_03_get_eurostat_geospatial.R:356:3'): Check column names POLYGONS from GISCO ── `poly_df` is not an S3 object ── Failure ('test_03_get_eurostat_geospatial.R:357:3'): Check column names POLYGONS from GISCO ── names(poly_df) (`actual`) not identical to col_order[-length(col_order)] (`expected`). `actual` is NULL `expected` is a character vector ('id', 'LEVL_CODE', 'NUTS_ID', 'CNTR_CODE', 'NAME_LATN', ...) ── Failure ('test_03_get_eurostat_geospatial.R:543:3'): Check column names BORDERS from GISCO ── `poly` is not an S3 object ── Failure ('test_03_get_eurostat_geospatial.R:552:3'): Check column names BORDERS from GISCO ── `poly_df` is not an S3 object ── Failure ('test_03_get_eurostat_geospatial.R:576:3'): Check column names BORDERS from GISCO ── `poly` is not an S3 object ── Failure ('test_03_get_eurostat_geospatial.R:585:3'): Check column names BORDERS from GISCO ── `poly_df` is not an S3 object [ FAIL 14 | WARN 0 | SKIP 0 | PASS 182 ] Error: Test failures Execution halted 1 error ✖ | 0 warnings ✔ | 0 notes ✔ Error: R CMD check found ERRORs Execution halted
These are related to
gsc_api_cache
function lines 25-26 producingerr_dwnload
object that inheritstry-error
and produces a NULL:err_dwnload <- suppressWarnings(try(download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb"), silent = TRUE))
This is due to HTTP status 503 from GISCO:
Error in download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb", : cannot open URL 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_BN_60M_2010_4326_LEVL_0.geojson' In addition: Warning messages: 1: In download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb", : downloaded length 0 != reported length 0 2: In download.file(url, file.local, quiet = isFALSE(verbose), mode = "wb", : cannot open URL 'https://gisco-services.ec.europa.eu/distribution/v2/nuts/geojson/NUTS_BN_60M_2010_4326_LEVL_0.geojson': HTTP status was '503 Service Unavailable'
That made me wonder if GISCO is using a DDoS protection system as the situation is similar to the situation this Stack Overflow post is describing: https://stackoverflow.com/a/47729625/14413481
@dieghernan Do you get similar errors when running R CMD check on
v4-dev
branch or is it just me?
DM me on technical infrastructure issues..
Thank you for your message @hannesaddec , I will do so in the future.
@dieghernan I ran the tests with new version of giscoR, everything worked great
Closed with the CRAN release of package version 4.0.0
In current httr2 branch there seems to be a test failure in test_03_get_eurostat_geospatial:
Specifically this seems to be related to this part of the code:
https://github.com/rOpenGov/eurostat/blob/327964ab096d9207b8e669fe74773d8fb11987a6/R/get_eurostat_geospatial.R#L231:L242
The
sf::st_drop_geometry(shp)
was complemented withshp$geometry <- NULL
in PR #264 . I'm not sure if the problem has existed until that moment.Something similar was reported in sf packages GitHub issues but this was supposed to be fixed back in 2019: https://github.com/r-spatial/sf/issues/1057
This occurs only when building and running the check --as-cran from the terminal. In RStudio checking the package --as-cran works fine for some reason.
Any ideas why this is happening? @dieghernan