ropensci / cde

Repo for R package to query and download WFD data from the EA Catchment Data Explorer site
https://docs.ropensci.org/cde

add wb_sf function #26

Open natesheehan opened 3 years ago

natesheehan commented 3 years ago

Hey - great work on the package, I am just wondering what the current state of development is?

I have been using the package in order to fetch waterbody classification data, however, I had to write my own function to get the geojson/sf for waterbodies/oc/mc/rbd's. I would be happy to integrate this feature into the package, or happy to produce my own package if this package has become stagnant.

Here is an example of how I fetch sf features for any waterbody in any classification group (still needs some refactoring).

```r
####
####
#### AIM: fetch waterbody geojson for different waterbody classification groups and transform them into sf objects
####
####

library(sf)
library(dplyr)

#### GET WBID DATA FROM CDE PACKAGE
url = "https://raw.githubusercontent.com/ropensci/cde/master/data-raw/ea_wbids.csv"
download.file(url, "data/ea_wbids.csv")

#### READ DATA
wbids = read.csv("data/ea_wbids.csv")

#### FUNCTION TO GET SF FEATURES OF WATERBODY
get_wb_sf = function(string, #### STRING = NAME OF CLASSIFICATION AREA
                     column) #### COLUMN = CLASSIFICATION TYPE E.G. OC | MC | RBD
{
  #### LOGICAL OPERATOR FOR RIVER MINE
  if (column == "OC") {wb = wbids %>% filter(OC == string)} # OPERATIONAL CATCHMENT
  if (column == "MC") {wb = wbids %>% filter(MC == string)} # MANAGEMENT CATCHMENT
  if (column == "RBD") {wb = wbids %>% filter(RBD == string)} # RIVER BASIN DISTRICT

  #### SET EMPTY DF TO MERGE INTO
  nrows = 1
  wb_sf = st_sf(id = 1:nrows, geometry = st_sfc(lapply(1:nrows, function(x)
    st_geometrycollection())))
  st_crs(wb_sf) = 4326
  wb_sf$name = ""
  wb_sf$id = as.character(wb_sf$id)
  wb_sf = wb_sf %>% filter(name == "nun")

  #### LOOP THROUGH GEOJSON DOWNLOAD
  suppressWarnings(
    for (i in 1:nrow(wb)) {
      ##### EA CATCHMENT API CALL
      url = "https://environment.data.gov.uk/catchment-planning/WaterBody/"
      notation = wb$WBID[i]
      download_url = paste0(url, notation, ".geojson")

      #### SET OUTPUT PATH
      river_wbid = wb$WBID[i]
      path = "data/river_sf/"
      river_output = paste0(path, river_wbid, ".geojson")

      #### DOWNLOAD FILE, AT LEAST TRY TO
      tryCatch(
        expr = {
          download.file(download_url, river_output)
        },
        error = function(e) {
          message("Unable to download River Please check column and string are correct")
        }
      )

      river_sf = read_sf(river_output) %>% select(id, name)

      wb$geometry[i] = river_sf$geometry[1]

      #### REMOVE DOWNLOADED FILE
      file.remove(river_output)
    }
  )
  return(wb)
}

#### testing 12
thames_sf = get_wb_sf(string = "Thames", column = "RBD")
write_sf(thames_sf, "data/thames_river.geojson")
thames_wb = read_sf("data/thames_river.geojson")
```

robbriers commented 3 years ago

Nate, the package is still 'live' but I've got a bit of work planned on it (need to fix issues with the data retrieval which caused it to be kicked off CRAN - going to use httr probably). Really like what you have done with the function to access waterbody geojson and would be interested in getting this integrated into the functionality. Not going to be able to get to working on this for a while because of other commitments, but it's heading up the todo list!
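For readers wondering what an httr-based retrieval might look like, here is a minimal sketch. It is an assumption, not the package's implementation: `wb_geojson_url()` and `fetch_wb_geojson()` are hypothetical names, the waterbody ID in the comment is a placeholder, and the URL pattern is simply the one used in the function posted above.

```r
# Hypothetical helper: build the GeoJSON URL for one waterbody ID,
# using the URL pattern quoted earlier in this thread.
wb_geojson_url <- function(wbid) {
  base_url <- "https://environment.data.gov.uk/catchment-planning/WaterBody/"
  paste0(base_url, wbid, ".geojson")
}

# Hypothetical helper: fetch the GeoJSON text for one waterbody with httr.
# Calls are namespace-qualified, so httr only needs to be installed, not attached.
fetch_wb_geojson <- function(wbid, timeout_s = 30) {
  resp <- httr::GET(wb_geojson_url(wbid), httr::timeout(timeout_s))
  # Stop with the HTTP status instead of carrying on with a bad response
  if (httr::http_error(resp)) {
    stop("Request for ", wbid, " failed with HTTP status ",
         httr::status_code(resp), call. = FALSE)
  }
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Usage (needs network access; "PLACEHOLDER_ID" is not a real waterbody ID):
# txt <- fetch_wb_geojson("PLACEHOLDER_ID")
```

Checking `httr::http_error()` makes a failed request stop immediately, whereas the `tryCatch()` around `download.file()` above only prints a message and then carries on to read a file that may not exist.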

natesheehan commented 3 years ago

@robbriers I think the EA has changed their API recently, which may be why issues are being flagged by CRAN. I would be happy to open a PR with the waterbody functionality added, if you have any time to review PRs?

robbriers commented 3 years ago

@natesheehan Happy to get a PR with the functionality added - sorry for the delay in getting back to you.

robbriers commented 3 years ago

I am trying to minimise package dependencies, so if it is possible to do this without importing dplyr then that would be good - sf is going to be necessary regardless

natesheehan commented 3 years ago

> I am trying to minimise package dependencies, so if it is possible to do this without importing dplyr then that would be good - sf is going to be necessary regardless

Agreed, I will convert the dplyr usage to base-R and then create a PR!
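As a rough illustration of that conversion, the sketch below replaces a `dplyr::filter()` call with two base-R equivalents. The `wbids` data frame here is a toy stand-in with made-up values, not the real `ea_wbids.csv` table.

```r
# Toy stand-in for the ea_wbids table (all values are made up)
wbids <- data.frame(
  WBID = c("WB001", "WB002", "WB003"),
  OC   = c("Petteril", "Petteril", "Other OC"),
  RBD  = c("RBD A", "RBD A", "RBD B"),
  stringsAsFactors = FALSE
)

# dplyr version:  wbids %>% filter(OC == "Petteril")

# Base-R equivalent 1: subset(), convenient for interactive use
wb_subset <- subset(wbids, OC == "Petteril")

# Base-R equivalent 2: bracket indexing, which also lets the column
# be chosen by name at run time (e.g. "OC", "MC" or "RBD")
column <- "OC"
wb_index <- wbids[wbids[[column]] == "Petteril", , drop = FALSE]

print(wb_index)
```

The bracket form is probably the better fit inside package code, since the documentation for `subset()` itself recommends standard subsetting for programming, and it removes the need for one `if` per column.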

natesheehan commented 2 years ago

Apologies, I've been uber busy the past few weeks - I hope to work on this issue this week.

robbriers commented 2 years ago

No worries - no great rush. I've been working on updating the API calls to get the whole thing working again, but still got some work to do on this as well

natesheehan commented 2 years ago

@robbriers Great stuff - gonna throw this code here in case it's helpful. I needed to use the new API to get waterbody ecological classification... once again, this does use tidyverse so will need further refactoring.

maybe helpful code

```r
####
####
#### AIM: fetch waterbody classification for different waterbodies
####
####

library(dplyr)

#### READ DATA
wbids = read.csv("data/ea_wbids.csv")

#### FUNCTION TO GET ECOLOGICAL CLASSIFICATION OF WATERBODY
get_wb_classification = function(string, #### STRING = NAME OF CLASSIFICATION AREA
                                 column) #### COLUMN = CLASSIFICATION TYPE E.G. OC | MC | RBD
{
  #### LOGICAL OPERATOR FOR RIVER MINE
  if (column == "OC") {
    wb = wbids %>% filter(OC == string)
  } # OPERATIONAL CATCHMENT
  if (column == "MC") {
    wb = wbids %>% filter(MC == string)
  } # MANAGEMENT CATCHMENT
  if (column == "RBD") {
    wb = wbids %>% filter(RBD == string)
  } # RIVER BASIN DISTRICT

  #### LOOP THROUGH CLASSIFICATION CSV DOWNLOAD
  suppressWarnings(for (i in 1:nrow(wb)) {
    ##### EA CATCHMENT API CALL
    url = "https://environment.data.gov.uk/catchment-planning/WaterBody/"
    notation = wb$WBID[i]
    download_url = paste0(url, notation, "/classifications.csv")

    #### SET OUTPUT PATH
    river_wbid = wb$WBID[i]
    path = "data/river_sf/"
    river_output = paste0(path, river_wbid, ".csv")

    #### DOWNLOAD FILE, AT LEAST TRY TO
    tryCatch(
      expr = {
        download.file(download_url, river_output)
      },
      error = function(e) {
        message("Unable to download River Please check column and string are correct")
      }
    )

    river_sf = read.csv(river_output) %>%
      filter(Classification.Item == "Ecological") %>%
      filter(Year == 2019)

    wb$status[i] = river_sf$Status[1]
    wb$year[i] = river_sf$Year[1]

    #### REMOVE DOWNLOADED FILE
    file.remove(river_output)
  })
  return(wb)
}

# test function
wb_class = get_wb_classification(string = "Thames", column = "RBD")
```

natesheehan commented 2 years ago

```r
####
####
#### AIM: fetch waterbody geojson for different waterbody classification groups and transform them into sf objects
####
####

library(sf)

#### GET WBID DATA FROM CDE PACKAGE
url = "https://raw.githubusercontent.com/ropensci/cde/master/data-raw/ea_wbids.csv"
download.file(url, "data/ea_wbids.csv")

#### READ DATA
wbids = read.csv("data/ea_wbids.csv")

#### FUNCTION PURPOSE: FETCH WATERBODY GEOJSON FROM EA CATCHMENT API
#### FUNCTION RETURNS A SF OBJECT WITH THE FOLLOWING COLUMNS:
##  "WBID"     "name"     "type"     "OC"       "OC_num"   "MC"       "MC_num"   "RBD"      "RBD_num"  "geometry"
## DEPENDING ON THE AREA SPECIFIED, THE FUNCTION WILL RETURN BETWEEN 1-x WATERBODIES
get_wb_sf = function(string, #### STRING = NAME OF CLASSIFICATION AREA E.G. RIVER TILL
                     column) #### COLUMN  = CLASSIFICATION TYPE E.G. OC | MC | RBD
  {
  #### LOGICAL OPERATOR FOR RIVER MINE
  #### (base-R subset() instead of %>%, so no dplyr/magrittr is needed)
  if (column == "OC") {wb = subset(wbids, OC == string)} # OPERATIONAL CATCHMENT
  if (column == "MC") {wb = subset(wbids, MC == string)} # MANAGEMENT CATCHMENT
  if (column == "RBD") {wb = subset(wbids, RBD == string)} # RIVER BASIN DISTRICT
  if(column != "OC" & column != "MC" & column != "RBD"){
    message("Woops, looks like you declared an invalid column type. Please try E.G. OC | MC | RBD")
  } else{
    message("Running function:")
    #### SET EMPTY DF TO MERGE INTO
    nrows = 1
    wb_sf = st_sf(id = 1:nrows, geometry = st_sfc(lapply(1:nrows, function(x)
      st_geometrycollection())))
    st_crs(wb_sf) = 4326 #### SET CRS TO MATCH THAT OF THE EA
    wb_sf$name = ""
    wb_sf$id = as.character(wb_sf$id)
    wb_sf = wb_sf[wb_sf$name == "nun", ] #### CLEAR ANY VALUES IN DF

    #### LOOP THROUGH GEOJSON DOWNLOAD
    suppressWarnings(
      for (i in 1:nrow(wb)) {
        ##### EA CATCHMENT API CALL
        url = "https://environment.data.gov.uk/catchment-planning/WaterBody/"
        notation = wb$WBID[i]
        download_url = paste0(url, notation, ".geojson")

        #### SET OUTPUT PATH
        river_wbid = wb$WBID[i]
        path = "data/river_sf/"
        river_output = paste0(path, river_wbid, ".geojson")

        #### DOWNLOAD FILE, AT LEAST TRY TO
        tryCatch(
          expr = {
            download.file(download_url, river_output)
          },
          error = function(e) {
            message("Unable to download River Please check column and string are correct")
          }
        )

        river_sf = read_sf(river_output)
        #### KEEP ONLY THE id AND name COLUMNS (GEOMETRY IS RETAINED BY sf)
        river_sf = river_sf[, c("id", "name")]

        wb$geometry[i] = river_sf$geometry[1]

        #### REMOVE DOWNLOADED FILE
        file.remove(river_output)
      }
    )
    return(wb)
  }
}

#### testing 12
thames_sf = get_wb_sf(string = "Petteril", column = "OC")
# write_sf(thames_sf, "data/thames_river.geojson")
```

The above code works without any new dependencies apart from the sf package - I have two checks in there to make sure the user supplies a valid column or string. Are you happy for me to create a PR?
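One way the column and string checks could be tightened is sketched below. This is only an illustration: `check_wb_args()` is a hypothetical helper, and the data frame is a toy stand-in that assumes the `OC`, `MC` and `RBD` columns listed in the code comments above.

```r
# Hypothetical helper: validate `column` and `string`, then return matching rows.
check_wb_args <- function(wbids, string, column = c("OC", "MC", "RBD")) {
  # match.arg() raises an error if `column` is not one of the allowed choices
  column <- match.arg(column)
  # Stop early if the requested area name does not occur in that column
  if (!string %in% wbids[[column]]) {
    stop("'", string, "' not found in column ", column, call. = FALSE)
  }
  # %in% never returns NA, so rows with missing values are simply dropped
  wbids[wbids[[column]] %in% string, , drop = FALSE]
}

# Toy data (made-up values) to show the behaviour
toy_wbids <- data.frame(
  WBID = c("WB001", "WB002"),
  OC   = c("Petteril", "Other OC"),
  MC   = c("MC A", "MC B"),
  RBD  = c("RBD A", "RBD A"),
  stringsAsFactors = FALSE
)

print(check_wb_args(toy_wbids, "Petteril", "OC"))
```

Raising an error rather than printing a message means a misspelt column or area name stops the function before any download is attempted.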

robbriers commented 2 years ago

Sure - looks great. Might have to modify it a bit to integrate with the rest of the data checking routines etc., but happy to get it fired in and see how it goes.