Closed dieghernan closed 1 year ago
Excellent work, thank you very much! I tested this version and everything seems to be working smoothly.
I was left thinking about issue #240 since now there's a difference between datasets read from the cache and datasets downloaded from GISCO. Like this:
Downloaded from GISCO by using resolution "20":
> sf <- get_eurostat_geospatial(
+ output_class = "sf",
+ resolution = "20",
+ nuts_level = "all"
+ )
Extracting data using giscoR package, please report issues on https://github.com/rOpenGov/giscoR/issues
Cache management as per giscoR. see 'giscoR::gisco_get_nuts()'
> sf
Simple feature collection with 2016 features and 10 fields
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: -63.08825 ymin: -21.39077 xmax: 55.83808 ymax: 71.11814
Geodetic CRS: WGS 84
First 10 features:
NAME_LATN MOUNT_TYPE CNTR_CODE URBN_TYPE COAST_TYPE FID NUTS_NAME NUTS_ID LEVL_CODE geometry geo
1 ESPAÑA 0 ES 0 0 ES ESPAÑA ES 0 MULTIPOLYGON (((-7.03184 43... ES
2 SUOMI / FINLAND 0 FI 0 0 FI SUOMI / FINLAND FI 0 MULTIPOLYGON (((28.92968 69... FI
3 ÍSLAND 0 IS 0 0 IS ÍSLAND IS 0 POLYGON ((-22.0074 63.83599... IS
4 REGIÃO AUTÓNOMA DOS AÇORES 0 PT 0 0 PT2 REGIÃO AUTÓNOMA DOS AÇORES PT2 1 MULTIPOLYGON (((-25.6919 37... PT2
5 FRANCE 0 FR 0 0 FR FRANCE FR 0 MULTIPOLYGON (((2.60704 50.... FR
6 HRVATSKA 0 HR 0 0 HR HRVATSKA HR 0 MULTIPOLYGON (((16.59681 46... HR
7 MAGYARORSZÁG 0 HU 0 0 HU MAGYARORSZÁG HU 0 POLYGON ((22.12108 48.37831... HU
[...]
Read from cache by using resolution "60":
> sf <- get_eurostat_geospatial(
+ output_class = "sf",
+ resolution = "60",
+ nuts_level = "all"
+ )
Extracting data from eurostat::eurostat_geodata_60_2016
> sf
Simple feature collection with 2016 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -61.841 ymin: -21.376 xmax: 55.85 ymax: 71.178
Geodetic CRS: WGS 84
First 10 features:
LEVL_CODE NUTS_ID URBN_TYPE CNTR_CODE NAME_LATN NUTS_NAME MOUNT_TYPE COAST_TYPE FID geo geometry
1 0 AL 0 AL SHQIPËRIA SHQIPËRIA 0 0 AL AL MULTIPOLYGON (((19.831 42.4...
2 0 AT 0 AT ÖSTERREICH ÖSTERREICH 0 0 AT AT MULTIPOLYGON (((15.754 48.8...
3 0 BE 0 BE BELGIQUE-BELGIË BELGIQUE-BELGIË 0 0 BE BE MULTIPOLYGON (((5.238 51.26...
4 0 BG 0 BG BULGARIA БЪЛГАРИЯ 0 0 BG BG MULTIPOLYGON (((22.675 44.2...
5 0 CH 0 CH SCHWEIZ/SUISSE/SVIZZERA SCHWEIZ/SUISSE/SVIZZERA 0 3 CH CH MULTIPOLYGON (((8.67 47.685...
6 0 CY 0 CY KYPROS ΚΥΠΡΟΣ 0 0 CY CY MULTIPOLYGON (((34.633 35.8...
[...]
So the main difference is the position of the "geo" column. I think having "geometry" as the last column of the "sf" "data.frame" object would be desirable. Another question is whether the "geo" column is actually needed, but then again someone needed it in the first place, since it was added there. Does @muuankarski have any input on this?
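A minimal sketch of the reordering suggested here, in base R. It uses a plain data.frame as a stand-in for an "sf" object (the "geometry" column holds placeholder strings, not real geometries), and `move_last()` is a hypothetical helper, not a function from eurostat or giscoR:

```r
# Stand-in for an sf object: a plain data.frame whose "geometry" column
# holds placeholder strings instead of real geometries (assumption).
nuts <- data.frame(
  NUTS_ID  = c("ES", "FI"),
  geometry = c("MULTIPOLYGON (...)", "MULTIPOLYGON (...)"),
  geo      = c("ES", "FI"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: move one named column to the last position.
move_last <- function(x, col = "geometry") {
  stopifnot(col %in% names(x))
  x[, c(setdiff(names(x), col), col), drop = FALSE]
}

reordered <- move_last(nuts)
names(reordered)
```

All other columns keep their relative order; only the named column moves to the end.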
I wasn’t aware of #240, let me have a look
Hi @pitkant
After https://github.com/rOpenGov/eurostat/pull/264/commits/3a6e48fdeb8ee8540143191e40b987e09ad8ff15 the spatial objects are consistent in the number, names and positions of their columns. If I check for all years:
years <- c('2003','2006','2010','2013','2016','2021')
allyears <- lapply(years, function(x){
tb <- suppressMessages(eurostat::get_eurostat_geospatial(country = "LU",
year = x, # pass the year being checked in this iteration
nuts_level = 0, resolution = 20,
update_cache = TRUE))
# Prepare summary
df <- data.frame(column = names(tb),
ncol = seq_len(ncol(tb)))
df <- tidyr::pivot_wider(df, names_from = column, values_from = ncol)
df$year <- x
df
})
dplyr::bind_rows(allyears) |>
dplyr::relocate(year, .before = 1) |>
knitr::kable(caption = "Number indicates the position of the column in the object")
year | id | LEVL_CODE | NUTS_ID | CNTR_CODE | NAME_LATN | NUTS_NAME | MOUNT_TYPE | URBN_TYPE | COAST_TYPE | FID | geo | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2003 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
2006 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
2010 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
2013 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
2016 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
2021 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Number indicates the position of the column in the object
Created on 2023-07-31 with reprex v2.0.2
All the objects have the same columns in the same position. Note that, as mentioned in #240, for some years (i.e., from 2003 to 2013) there are columns that are not provided (e.g., MOUNT_TYPE and friends). In those cases, our output returns those columns filled with NA, so it is more consistent for users:
eurostat::get_eurostat_geospatial(year = 2010) |>
dplyr::glimpse()
#> Extracting data using giscoR package, please report issues on https://github.com/rOpenGov/giscoR/issues
#> Rows: 1,920
#> Columns: 12
#> $ id <chr> "AT111", "AT112", "AT113", "AT121", "AT122", "AT123", "AT12…
#> $ LEVL_CODE <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
#> $ NUTS_ID <chr> "AT111", "AT112", "AT113", "AT121", "AT122", "AT123", "AT12…
#> $ CNTR_CODE <chr> "AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT", "AT",…
#> $ NAME_LATN <chr> "Mittelburgenland", "Nordburgenland", "Südburgenland", "Mos…
#> $ NUTS_NAME <chr> "Mittelburgenland", "Nordburgenland", "Südburgenland", "Mos…
#> $ MOUNT_TYPE <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ URBN_TYPE <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ COAST_TYPE <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ FID <chr> "AT111", "AT112", "AT113", "AT121", "AT122", "AT123", "AT12…
#> $ geo <chr> "AT111", "AT112", "AT113", "AT121", "AT122", "AT123", "AT12…
#> $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((16.648 47.4..., MULTIPOLYGON (…
Created on 2023-07-31 with reprex v2.0.2
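The NA padding described above can be sketched in base R as follows. This is not the code from the PR: `harmonise_columns()` is a hypothetical helper, the template simply copies the twelve column names from the table above, and a plain data.frame stands in for the "sf" object:

```r
# Column template taken from the table in this thread.
expected_cols <- c(
  "id", "LEVL_CODE", "NUTS_ID", "CNTR_CODE", "NAME_LATN", "NUTS_NAME",
  "MOUNT_TYPE", "URBN_TYPE", "COAST_TYPE", "FID", "geo", "geometry"
)

# Hypothetical helper: add any missing template column as NA, then
# put the template columns first, in template order.
harmonise_columns <- function(x, template = expected_cols) {
  absent <- setdiff(template, names(x))
  for (col in absent) {
    x[[col]] <- rep(NA, nrow(x))
  }
  extra <- setdiff(names(x), template)
  x[, c(template, extra), drop = FALSE]
}

# Stand-in for an older vintage that lacks several columns (assumption).
old_vintage <- data.frame(
  NUTS_ID   = "AT111",
  geometry  = "MULTIPOLYGON (...)",
  LEVL_CODE = 3,
  stringsAsFactors = FALSE
)

out <- harmonise_columns(old_vintage)
names(out)
```

Columns filled this way come out as logical NA, which matches the `<lgl>` type shown for MOUNT_TYPE, URBN_TYPE and COAST_TYPE in the glimpse output.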
Thank you for the fixes! The output looks nice and consistent now. I'm still not sure about adding duplicated columns in the dataset for supposedly easier joins, especially if geo and FID are both duplicates, but I guess it doesn't add too much overhead either. I added a note about the geo column having the "Questioning" status.
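One way to check the duplication concern is to compare the identifier columns directly. The sketch below uses a small hand-built data.frame with values copied from the glimpse output above, rather than a real download:

```r
# Identifier columns with values copied from the glimpse output above.
attrs <- data.frame(
  id      = c("AT111", "AT112", "AT113"),
  NUTS_ID = c("AT111", "AT112", "AT113"),
  FID     = c("AT111", "AT112", "AT113"),
  geo     = c("AT111", "AT112", "AT113"),
  stringsAsFactors = FALSE
)

id_cols <- c("id", "NUTS_ID", "FID", "geo")

# TRUE for every column whose values are identical to NUTS_ID.
dup_of_nuts_id <- vapply(
  attrs[id_cols],
  function(col) identical(col, attrs$NUTS_ID),
  logical(1)
)
dup_of_nuts_id
```

Running the same comparison on a real object returned by get_eurostat_geospatial() would show whether the columns are full duplicates for a given year and resolution.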
Thanks @pitkant , I consider this PR ready now.
Hi @antagomir @pitkant
This PR is related to #230 and #263.
Basically, `get_eurostat_geospatial` is now a wrapper of `giscoR::gisco_get_nuts`. I trimmed the function as much as I could, so issues with that function would now be addressed in giscoR, reducing the maintenance burden on the eurostat package.
I also reviewed the dependencies, removing a bunch of them, updated the vignettes and the actions, and deleted unused files in docs/vignettes/revdep.
I checked the package and revdeps and all is fine. Summary of changes:
- `get_eurostat_geospatial()` now leverages `giscoR::gisco_get_nuts()` for downloading geospatial data:
  - `"spdf"` output class soft-deprecated, it would return a `sf` object with a message.
  - `make_valid` parameter soft-deprecated.
  - `...` added to the function so additional parameters can be passed to `giscoR::gisco_get_nuts()`.
- `eurostat_geodata_60_2016` updated.
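As a rough illustration of the soft-deprecation pattern listed above, here is a self-contained base R sketch. It is not the implementation from this PR: `get_geo_sketch()` and `fake_backend()` are hypothetical names, the messages are invented for the example, and `fake_backend()` only stands in for `giscoR::gisco_get_nuts()` so that the code runs without network access:

```r
# Hypothetical stand-in for giscoR::gisco_get_nuts(): it just returns
# the arguments it received so the forwarding can be inspected.
fake_backend <- function(resolution = "60", ...) {
  list(resolution = resolution, extra = list(...))
}

# Hypothetical wrapper showing soft-deprecation plus `...` forwarding.
get_geo_sketch <- function(output_class = "sf", resolution = "60",
                           make_valid = NULL, ...) {
  if (identical(output_class, "spdf")) {
    # Soft-deprecated value: keep working, but tell the user.
    message("'spdf' output is soft-deprecated; returning an 'sf' object.")
    output_class <- "sf"
  }
  if (!is.null(make_valid)) {
    # Soft-deprecated argument: accepted but ignored.
    message("'make_valid' is soft-deprecated and is ignored.")
  }
  # Extra arguments are passed through to the backend untouched.
  fake_backend(resolution = resolution, ...)
}

res <- get_geo_sketch(output_class = "spdf", resolution = "20",
                      nuts_level = "0")
res$extra$nuts_level
```

The point of the pattern is that existing calls using the old arguments keep running, while the message nudges users towards the new interface.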