lvaudor / glitter

an R package which writes SPARQL queries
https://lvaudor.github.io/glitter

a function to get labels based on ID #201

Open lvaudor opened 1 year ago

lvaudor commented 1 year ago

Hi,

For sequins I have been working on a get_label() function:

#' Takes a component of a triple pattern as input and returns the corresponding human-readable label, if one exists.
#' @param string the string (a part of a triple pattern) to label
#' @param language the language in which to return the label (defaults to "en")
#' @param endpoint the SPARQL endpoint that is being queried (defaults to "wikidata")
#' @param label_property the name of the labelling property, for instance "skos:prefLabel". Defaults to "rdfs:label". If the endpoint is one of the usual glitter endpoints (see glitter::usual_endpoints), the labelling property is set accordingly.
#' @return the label corresponding to the string, or the string itself if no label is found
#' @importFrom magrittr %>%
#' @export
get_label <- function(string, language = "en", endpoint = "wikidata", label_property = "rdfs:label") {
  # For the usual glitter endpoints, use the labelling property recorded for that endpoint
  if (endpoint %in% glitter::usual_endpoints$name) {
    index_endpoint <- which(glitter::usual_endpoints$name == endpoint)
    label_property <- glitter::usual_endpoints$label_property[index_endpoint]
  }
  # Anything that is not a prefixed name (e.g. "?item" or "'David Bowie'") is returned unchanged
  if (!glitter:::is_prefixed(string)) {
    return(string)
  }
  # Map Wikidata property prefixes to wd: so the label is looked up on the entity itself.
  # stringr::str_replace is called directly: `:::` does not see functions that glitter only imports.
  string <- stringr::str_replace(string, "^(wdt|p|ps|pq):", "wd:")
  result <- glitter::spq_init(endpoint = endpoint) %>%
    glitter::spq_add(glue::glue("{string} {label_property} ?string_label")) %>%
    glitter::spq_mutate(languages = lang(string_label)) %>%
    glitter::spq_perform() %>%
    dplyr::filter(languages == language) %>% # because I don't know how to make glitter::spq_filter work here
    dplyr::pull(string_label)
  # Fall back to the input when no label exists in the requested language
  if (length(result) == 0) {
    return(string)
  }
  return(result)
}
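
The language filter above is applied in R, with dplyr::filter(), after all labels have come back. For comparison, here is a minimal sketch of what filtering inside the query itself could look like, using the standard SPARQL FILTER(LANG(...)) construct. build_label_query() is a hypothetical helper, not part of glitter: it only builds the query string and does not send it anywhere. It also assumes the endpoint predeclares the prefixes used, as the Wikidata Query Service does for wd: and rdfs:.

```r
# Hypothetical helper (not a glitter function): builds a raw SPARQL query
# asking for the label of `string` in a single language.
build_label_query <- function(string, label_property = "rdfs:label", language = "en") {
  sprintf(
    'SELECT ?string_label WHERE {\n  %s %s ?string_label .\n  FILTER(LANG(?string_label) = "%s")\n}',
    string, label_property, language
  )
}

# Print the query that would be sent for the French label of wd:Q152088
cat(build_label_query("wd:Q152088", language = "fr"))
```

Filtering in the query means only the labels in the requested language travel over the network, which may matter for items that have labels in many languages.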

It's supposed to work on all endpoints but I'll admit that right now my only examples which make much sense are on Wikidata...

Examples:

get_label("wd:Q152088",language="en") # returns "French fries"
get_label("wd:Q152088",language="fr") # returns "frite"
get_label("wdt:P31", language="fr") # returns "nature de l'élément"
get_label("'David Bowie'") # returns "'David Bowie'"
get_label("?item") # returns "?item"
get_label("hal:structure",endpoint="hal") # returns "hal:structure"
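
The "wdt:P31" example works because property prefixes are rewritten to wd: before the lookup: on Wikidata the label is attached to the entity wd:P31, not to the direct property wdt:P31. Here is a small offline sketch of just that step, using base R sub() rather than stringr. normalize_wikidata_prefix() is a hypothetical name used for illustration only.

```r
# Hypothetical helper: rewrite Wikidata property prefixes to the entity prefix wd:
normalize_wikidata_prefix <- function(string) {
  sub("^(wdt|p|ps|pq):", "wd:", string)
}

normalize_wikidata_prefix(c("wdt:P31", "p:P31", "ps:P31", "pq:P585", "wd:Q152088", "?item"))
# returns "wd:P31" "wd:P31" "wd:P31" "wd:P585" "wd:Q152088" "?item"
```

Strings without one of these prefixes, such as "?item" or "hal:structure", pass through unchanged.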

I'm wondering whether it should be included in glitter rather than sequins? What do you think?

maelle commented 1 year ago

> It's supposed to work on all endpoints but I'll admit that right now my only examples which make much sense are on Wikidata...

Because other endpoints have readable properties?

lvaudor commented 1 year ago

Well, it would make sense if they did, but I think they generally don't :-(. Maybe DBpedia could gather data about OWL vocabularies? I haven't had time to check, though.

lvaudor commented 1 year ago

In that sense (if it's only relevant for Wikidata), it's similar to some functions you just removed. On the other hand, not that much, because at least it's not based on external packages.

maelle commented 1 year ago

Could it live in a third package?