lvaudor / glitter

an R package which writes SPARQL queries
https://lvaudor.github.io/glitter

"Conjugation" of Wikidata properties #199

Open lvaudor opened 1 year ago

lvaudor commented 1 year ago

Hi,

I'm not sure it's an "Issue" but I would like to have your input on that @maelle.

I'm working on the sequins package and, in doing so, I'm trying to get the labels of properties. I have written a function get_label() (different from the one we had formerly implemented in glitter, because I'm trying to use glitter itself to do so rather than WikidataR).

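# Assumes glitter is attached, along with a package that provides %>% (e.g. magrittr).
get_label <- function(string, language = "en", endpoint = "Wikidata",
                      labelling_prop = "rdfs:label") {
  # Only prefixed names such as "wd:P31" can be looked up; return anything else unchanged.
  if (!glitter:::is_prefixed(string)) {
    return(string)
  }
  result <- spq_init(endpoint = endpoint) %>%
    spq_add(glue::glue("{string} {labelling_prop} ?string_label")) %>%
    spq_mutate(languages = lang(string_label)) %>%
    spq_perform() %>%
    # Keep only the label in the requested language.
    dplyr::filter(languages == language) %>%
    dplyr::pull(string_label)
  return(result)
}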

I want to pick property names directly from the triple patterns of the glitter query, so I will have, for instance, "wdt:P31" to label. The thing is, "wdt:P31" does not have a label, whereas "wd:P31" does. This has made me fully realize that Wikidata has this unique (I think?) feature of (kind of) conjugating its properties based on their location or role in the triple patterns. For instance: wd: or wdt: depending on whether the property is used as a subject/object or as a verb (predicate), and p:, ps:, pq: for statements and their qualifiers.
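To make the "conjugation" concrete, here is a minimal base R sketch that expands a prefixed name into its full IRI. The namespace IRIs are the standard Wikidata ones; `wikidata_prefixes` and `expand_prefixed()` are hypothetical names introduced only for this illustration and are not part of glitter.

```r
# Namespace IRIs behind the Wikidata prefixes discussed in this issue.
wikidata_prefixes <- c(
  wd  = "http://www.wikidata.org/entity/",
  wdt = "http://www.wikidata.org/prop/direct/",
  p   = "http://www.wikidata.org/prop/",
  ps  = "http://www.wikidata.org/prop/statement/",
  pq  = "http://www.wikidata.org/prop/qualifier/"
)

# Hypothetical helper: turn "prefix:local" into a full IRI.
# Errors on a prefix that is not in `prefixes`.
expand_prefixed <- function(x, prefixes = wikidata_prefixes) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) paste0(prefixes[[p[1]]], p[2]), character(1))
}

# Same property ID, two different resources.
expand_prefixed(c("wd:P31", "wdt:P31"))
```

The two results differ only in their namespace, which would explain why a label attached to the wd: form is not found when querying the wdt: form.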

Would you agree with that way of seeing things? Have you encountered this kind of "conjugation" in another SPARQL endpoint?

I'm about to replace "wdt:", "p:", "ps:" and "pq:" with "wd:" in the get_label() function above, but I'd love to hear your thoughts on this ;-)

maelle commented 1 year ago

wow, more grammar!

Yes, I think it makes sense that you're getting to the root of the property. This has a name in language processing, stemming (https://en.wikipedia.org/wiki/Stemming), so you could call the internal function doing this replacement stem_property().
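A possible shape for such a helper, as a sketch only: it assumes that only the four prefixes named in this thread (wdt:, p:, ps:, pq:) need rewriting and that property IDs look like "P" followed by digits. Nothing here is taken from glitter or sequins beyond the suggested function name.

```r
# Sketch of stem_property(): map any "conjugated" property form back to wd:.
# Assumption: only wdt:, p:, ps: and pq: are handled; other strings pass through.
stem_property <- function(x) {
  sub("^(wdt|p|ps|pq):(P[0-9]+)$", "wd:\\2", x)
}

stem_property(c("wdt:P31", "p:P1082", "pq:P585", "wd:Q5", "rdfs:label"))
```

Items (wd:Q...) and names from other vocabularies are left untouched, so the result could be passed straight to a labelling function such as get_label().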