ropensci / mregions2

Access the Marine Regions Gazetteer and the Marine Regions Data Products in R. Maintained by @salvafern.
https://docs.ropensci.org/mregions2/

Read LDES: include geometries and record info #2

Closed salvafern closed 1 year ago

salvafern commented 2 years ago

To do:

## Using these packages -> add to Imports
library(glue)
library(httr2)
library(rdflib)
library(magrittr)
library(tibble)
library(sf)
library(pbapply)

# Add a package-specific user agent to every request
req_mr_user_agent <- function(.){
  httr2::req_user_agent(. , glue::glue("mregions2 {packageVersion('mregions2')}"))
}

# test <- mr_gaz_ldes(26567)
# test <- mr_gaz_ldes(3293, type = "rdf")
# test <- mr_gaz_ldes(1902, type = "rdf")
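mr_gaz_ldes <- function(mrgid, type = c("list", "rdf")){
  # Validate `type` up front so an unsupported value fails with a clear error
  type <- match.arg(type)

  # config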

  req <- httr2::request("http://marineregions.org") %>%
    httr2::req_url_path_append("mrgid") %>%
    httr2::req_url_path_append(mrgid) %>%
    httr2::req_headers(accept = "application/ld+json") %>%
    req_mr_user_agent()

  # Build and perform
  resp <- req %>%
    httr2::req_perform() 

  if(type == "rdf"){
    out <- resp %>% 
      httr2::resp_body_string() %>%
      rdflib::rdf_parse("jsonld")
  }
  if(type == "list"){
    out <- resp %>%
      httr2::resp_body_json() 
  }

  out

  # TODO: add assertions 
}

# test <- mr_gaz_geometry(4280)
# test <- mr_gaz_geometry(3293)
# test <- mr_gaz_geometry(26567)
mr_gaz_geometry <- function(mrgid){
  feed <- mr_gaz_ldes(mrgid, "list")

  has_geometry <- "mr:hasGeometry" %in% names(feed)

  if(has_geometry){

    req_geom <- function(url){
      req <- httr2::request(url)
      req <- req %>% 
        req_mr_user_agent() %>%
        httr2::req_headers(accept = "application/ld+json") %>%
        httr2::req_perform() %>%
        httr2::resp_body_json(encoding = "UTF-8")

      wkt <- req$`mr:hasGeometry`$`gsp:asWKT`

      # TODO: read CRS on the fly
      # Strip the CRS URI prefix so sf can parse the plain WKT, and return it explicitly
      gsub("<http://www.opengis.net/def/crs/OGC/1.3/CRS84> ", "", wkt, fixed = TRUE)
    }

    geom <- pbapply::pblapply(feed$`mr:hasGeometry`, req_geom) %>% 
      unlist() %>% 
      sf::st_as_sfc(crs = 4326) %>%
      sf::st_combine()

    return(geom) 
  }

}

# test <- mr_gaz_record(3293)
# test <- mr_gaz_record(4280)
mr_gaz_record <- function(mrgid, add_geometry = TRUE){
  feed <- mr_gaz_ldes(mrgid)

  req <- httr2::request(feed$`@id`)
  req <- req %>% 
    req_mr_user_agent() %>%
    httr2::req_headers(accept = "application/json") %>%
    httr2::req_perform() %>%
    httr2::resp_body_json(encoding = "UTF-8") %>%
    tibble::as_tibble()

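  if(add_geometry){
    # mr_gaz_geometry() returns an sfc, or NULL when the record has no geometry
    geom <- mr_gaz_geometry(mrgid)
    if(!is.null(geom)){
      req$geom <- geom
      req <- sf::st_as_sf(req)
    }
  }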

  req
}
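
A possible starting point for the "read CRS on the fly" TODO, as a minimal sketch: instead of deleting one hard-coded CRS84 prefix, split any leading `<uri>` off the WKT string and keep both parts. The helper name `split_wkt_crs()` is hypothetical and not part of the package; it assumes the `gsp:asWKT` value has the form `<crs-uri> WKT`, as in the `gsub()` call above. Mapping the URI to a CRS that sf understands is left out here.

```r
# Hypothetical helper (not in mregions2): split an optional leading
# "<crs-uri> " off a GeoSPARQL-style WKT literal. Base R only.
split_wkt_crs <- function(wkt) {
  # Capture group 1: the URI between < and >; group 2: the remaining WKT
  m <- regmatches(wkt, regexec("^<([^>]+)>[[:space:]]*(.*)$", wkt))[[1]]

  # No CRS prefix found: return the input unchanged with a missing URI
  if (length(m) == 0) {
    return(list(crs_uri = NA_character_, wkt = wkt))
  }

  list(crs_uri = m[2], wkt = m[3])
}

res <- split_wkt_crs(
  "<http://www.opengis.net/def/crs/OGC/1.3/CRS84> POINT (2.5 51.5)"
)
res$crs_uri
res$wkt
```

The `wkt` element could then go to `sf::st_as_sfc()` as before, with the CRS chosen from `crs_uri` rather than fixed in the code.
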
salvafern commented 2 years ago

Hi @marc-portier, I was wondering whether "LDES" in mr_gaz_ldes() is an appropriate name for the function?

This function returns the JSON-LD at marineregions.org/mrgid/{mrgid}, parsed as a list (or as RDF triples if type = "rdf"). My idea is that, from this function, we can get anything that is in there.

If not, do you have any suggestions? Naming things is hard :)

mr_gaz_ldes(3293, type = "rdf")
# Total of 102 triples, stored in hashes
# -------------------------------
#   <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/4675> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/28318> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/27790> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/27555> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/2421> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/2420> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/2419> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/17865> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/17666> .
# <http://marineregions.org/mrgid/3293> <http://marineregions.org/ns/ontology#contains> <http://marineregions.org/mrgid/17409> .
# 
# ... with 92 more triples
LennertSchepers commented 2 years ago

What about mr_gaz_jsonld()? You don't read the event stream (at http://marineregions.org/feed) but only the current gazetteer item in json-ld.

salvafern commented 2 years ago

But it only reads from JSON-LD; it doesn't return JSON-LD, but a list or an RDF object

marc-portier commented 2 years ago

up front: I would personally avoid references to technical terms and opt for functional ones (provided we are returning functionally rich response objects, stripped of technical details) --> mr_gaz_changes(since_date = lastupdate)

but the comment from @LennertSchepers is more prominent: you are not reading the /feed here, but one entry in the mr gazetteer --> I think what you are doing would functionally match e.g. mr_gaz(id=3293)

but as said, we could argue if that should respond with rdf / triples (which is a more native / deep down structure)
so what about having layers:

  1. mr_gaz_triples(id=3293) --> does the http request with content negotiation (turtle or jsonld, it does not matter; possibly driven by an extra parameter if anyone happens to have a preference); whatever is returned gets loaded into a (rdflib?) graph model
  2. mr_gaz(id=3293) --> this one uses the former, then wraps the returned graph in a custom R (OO) MRAccessor class that makes useful content "accessible" through fields or dedicated methods, so one ends up with:
# my R syntax knowledge is (close to) nil, so read this as pseudo-code
mrX <-  mr_gaz(3293)   # uses mr_gaz_triples() in the back wrapping the result in an MRAccessor
mrX.name               # returns the skos:prefLabel -- optionally add negotiation of language, default to 'en'  (you will have to make it a method to pass the param though?)
mrX.type               # returns the rdf:type
mrX.contains           # uses the accessor to fetch a list of all MRGIDs listed in mr:contains, which recursively... etc. (should be lazy loading though)
...

so the accessor object hides away the turtle/rdf details and pushes the API onto a purely functional level
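
In actual R, the accessor idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the names `new_mr_accessor()`, `mr_name()`, `mr_type()` and `mr_contains()` are hypothetical, the sketch wraps a parsed JSON-LD list rather than an rdflib graph, and `toy_feed` is a hand-written stand-in whose field layout (`skos:prefLabel` entries with `@value`/`@language`, `mr:contains` entries with `@id`) is assumed rather than taken from a real response.

```r
# Toy stand-in for a parsed record; values and layout are made up for illustration
toy_feed <- list(
  `@id` = "http://marineregions.org/mrgid/3293",
  `@type` = "mr:ExampleType",
  `skos:prefLabel` = list(
    list(`@value` = "Example name", `@language` = "en"),
    list(`@value` = "Voorbeeldnaam", `@language` = "nl")
  ),
  `mr:contains` = list(
    list(`@id` = "http://marineregions.org/mrgid/4675"),
    list(`@id` = "http://marineregions.org/mrgid/2421")
  )
)

# Hypothetical constructor: keep the raw structure hidden behind a class
new_mr_accessor <- function(feed) {
  structure(list(feed = feed), class = "mr_accessor")
}

# Preferred label in the requested language, defaulting to English
mr_name <- function(x, lang = "en") {
  labels <- x$feed[["skos:prefLabel"]]
  hit <- Filter(function(l) identical(l[["@language"]], lang), labels)
  if (length(hit) == 0) return(NA_character_)
  hit[[1]][["@value"]]
}

# The rdf:type, as stored under "@type" in this toy layout
mr_type <- function(x) x$feed[["@type"]]

# MRGIDs of contained places; only identifiers are returned, nothing is
# fetched, so children can be loaded lazily by the caller
mr_contains <- function(x) {
  ids <- vapply(x$feed[["mr:contains"]], function(e) e[["@id"]], character(1))
  as.integer(basename(ids))
}

mrX <- new_mr_accessor(toy_feed)
mr_name(mrX)
mr_name(mrX, lang = "nl")
mr_contains(mrX)
```

Plain functions on an S3 object are used here instead of `mrX.name`-style fields because that is the more common R idiom; R5 or R6 classes would be closer to the OO pseudo-code above.
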

on the functional level you then design a usage contract that talks about the content and the purpose, not about the technical means

wdyt?