Closed maelle closed 1 year ago
@lvaudor do you have any idea/wish?
Maybe
spq_add(triplet = "?city wdt:P1082 ?pop", required=FALSE)
spq_add(city, "wdt:P1082", pop, required=FALSE)
(quotes still needed for the things with ":", unless we go for something else, e.g. making them wdt::P1082).
Originally, glitter (or recitR, actually) worked with the 'subject', 'verb', 'object' arguments (there was no 'triplet' argument), but I like the triplet argument better: fewer quotes, and simpler to read and understand in the LOD queries conceptual framework because triplets are basically just sentences. I tried to make sure that no other argument starts with t (same for subject, verb, object too, I think?) so that shortening the triplet argument to just 't=' is alright.
I'm not sure spq_add() can be modified in the same way you did for the other functions, in particular I have no idea how we could "unstring" it (and see, I was not expecting you to find a way hehe). Of course, if you see other ways to improve this function I'm all ears :-)
Ok, let's keep it as is for now at least. Thank you!
One thing that might be simplified is the label argument :thinking:
yes! and maybe we could have a label argument equivalent for queries to endpoints other than Wikidata? Typically that would mean that something like
add_triplet("?s v ?o", label=c("s"))
would add an implicit triplet
"?s rdfs:label ?sLabel"
... because I think adding these triplets manually is a bit cumbersome
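A minimal sketch of that implicit labelling, in base R. The helper name label_triplets() is hypothetical and not part of glitter; it only shows how the extra triple patterns could be derived from the variable names passed to label.

```r
# Hypothetical helper (not a glitter function): build the implicit
# "?x rdfs:label ?xLabel" triple patterns for the requested variables.
label_triplets <- function(vars) {
  vars <- sub("^\\?", "", vars)  # accept both "s" and "?s"
  sprintf("?%s rdfs:label ?%sLabel", vars, vars)
}

label_triplets(c("s"))
```

Each returned string could then be added to the query like any other triple pattern.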
Another worry I have is the difference between spq_add() and spq_filter(), i.e. how to know where to put each part of a query, what's the best strategy :thinking:
I guess that if you can express something in terms of properties you should use spq_add(), and then spq_filter() is for value comparisons etc.
Regarding label it might be good to be able to add all of them by default. I.e. a parameter label=TRUE.
A thing that irks me with a line like spq_add("?stations wdt:P31 wd:Q928830") is that it's not human-readable.
I wonder whether we could have something like
spq_register("wdt:P31", as = `is_instance_of()`, service = "Wikidata")
We'd store the definition in an environment. We'd output a message with the label, so users could see: registered wdt:P31 (label "instance of") as glitter function is_instance_of().
Then later
stations_metro_Lyon=spq_init() %>%
spq_add("?stations wdt:P361 wd:Q1552", label="?stations") %>%
spq_add(is_instance_of(station, stations = "wd:Q928830")) %>%
spq_perform()
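A rough sketch of the registration idea in base R, assuming the definitions live in an environment as suggested. The names registry, register_property() and registered_triple() are hypothetical and not part of glitter, and the label is passed in by hand here rather than looked up on the service.

```r
# Hypothetical registry (not part of glitter): readable name -> property.
registry <- new.env(parent = emptyenv())

register_property <- function(property, as, label = NULL) {
  assign(as, property, envir = registry)
  label_part <- if (is.null(label)) "" else sprintf(' (label "%s")', label)
  message(sprintf("registered %s%s as %s()", property, label_part, as))
  invisible(property)
}

# Turn a registered name back into a plain triple pattern.
registered_triple <- function(name, subject, object) {
  property <- get(name, envir = registry, inherits = FALSE)
  sprintf("%s %s %s", subject, property, object)
}

register_property("wdt:P31", as = "is_instance_of", label = "instance of")
registered_triple("is_instance_of", "?stations", "wd:Q928830")
```

The resulting string is an ordinary triple pattern, so registration would stay optional sugar on top of the existing syntax.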
To me spq_add("?auteur foaf:birthday ?jour") reads as spq_mutate(jour = foaf::birthday(auteur)) :thinking:
Or spq_add(jour = foaf_birthday(auteur))
Current
tib=spq_init() %>%
spq_add("?auteur foaf:birthday ?jour") %>%
spq_add("?auteur bio:birth ?date1") %>%
spq_add("?auteur bio:death ?date2") %>%
spq_add("?auteur foaf:name ?nom", required=FALSE) %>%
spq_arrange(jour) %>%
spq_prefix() %>%
spq_head(n=10) %>%
spq_perform(endpoint="dataBNF")
Maybe nicer (note the endpoint comes earlier as it's central):
tib=spq_init(auteur, endpoint = "dataBNF") %>%
spq_add(jour = foaf_birthday(auteur)) %>%
spq_add(date1 = bio_birth(auteur)) %>%
spq_add(date2 = bio_death(auteur)) %>%
spq_add(nom = foaf_name(auteur), required = FALSE) %>%
spq_arrange(jour) %>%
spq_prefix() %>%
spq_head(n=10) %>%
spq_perform()
For spq_add("{fleurs_du_mal} foaf:focus ?Oeuvre") I think spq_filter(foaf_focus(Oeuvre) == fleurs_du_mal)
I keep coming back to my spq_register() idea to register synonyms. I really think it could help in some cases, but shouldn't be compulsory.
Current thoughts
In spq_add("?auteur bio:birth ?date1") the focus is on date1, whereas in spq_add("?mayor wdt:P31 ?species") the focus is on the mayor. The first one is more a mutate, the second one more a filter.
So spq_add("?mayor wdt:P31 ?species") would be spq_filter(mayor = wdt::P31(species)), whereas spq_add("?auteur bio:birth ?date1") would be spq_add(date1 = bio::birth(auteur)).
We can keep a function adding SPARQL filters directly with spq().
Instead of having a special rule for is / %in% I'd like to define two functions:
spq_set() to be used on queries, e.g. spq_set(species = c('wd:144','wd:146', 'wd:780'), mayor = "Q30185") to get the VALUES.
spq_register() to register the helper values for all queries in a session (maybe with something clever to not put them in a query if the value isn't used).
The messages are still something I'd like to add.
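To make the spq_set() idea concrete, here is a small base R sketch of how named vectors could be turned into SPARQL VALUES clauses. The function name values_clauses() is hypothetical, and the identifiers in the call are only illustrative strings.

```r
# Hypothetical helper (not a glitter function): one VALUES clause per
# named argument, e.g. species = c("wd:Q144", "wd:Q146").
values_clauses <- function(...) {
  sets <- list(...)
  stopifnot(length(sets) > 0, !is.null(names(sets)), all(names(sets) != ""))
  vapply(
    names(sets),
    function(var) {
      sprintf("VALUES ?%s { %s }", var, paste(sets[[var]], collapse = " "))
    },
    character(1),
    USE.NAMES = FALSE
  )
}

values_clauses(species = c("wd:Q144", "wd:Q146"), mayor = "wd:Q30185")
```

Each clause restricts one variable to the listed values, which is what a special rule for %in% would otherwise have to generate.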
I'm still not sure we want to have such different behavior for spq_filter() and spq_add(). FILTER is for filtering the data so it's different.
spq_triple() to add an actual triple (like spq_add() now)
spq_add() to add a thing to the results, like a birth date
spq_specify() to add a filter to the result, like "item is an instance of cat or dog". :thinking:
I guess that if you can express something in terms of properties you should use spq_add(), and then spq_filter() is for value comparisons etc.
Indeed. I guess that renaming spq_add() into spq_pattern() (following what I told you about triplets and triplet patterns) could make this difference clearer?
Regarding label it might be good to be able to add all of them by default. I.e. a parameter label=TRUE.
Yes and no, because not all unknowns can be labelled (for instance, a date or an image link can't), and for now the "strain" of knowing whether asking for a label makes sense is on the user (who presumably knows better than to ask for a label for a date, for instance). You could probably add the "labelling triplet pattern" with the required=FALSE option (so that you would get empty columns for some labels rather than dropping non-labelled individuals altogether), but that would return rather large tables with many empty columns, which I think is not ideal.
OK, so for now I'd say:
rename spq_add() into spq_pattern() (or spq_tp for triplet pattern?) BUT I'd rather you keep the main argument as a triplet pattern "s v o". So no need in my opinion for a function spq_specify().
modify spq_filter(), spq_mutate(), etc. to allow for (primarily) an R-like syntax, truc==f(bidule) or o=v(s), BUT still allow for triplet patterns if explicitly specified with the argument t="s v o"
maybe try and tweak spq_filter() so that the user can add a triple pattern like "item is an instance of cat or dog" as an R-like syntax, for instance spq_filter(item=="wd:xxcat") or spq_filter(item %in% c("wd:xxcat","wd:xxdog"))
Regarding spq_mutate(), spq_add("?auteur bio:birth ?date1") would then be spq_mutate(date1 = bio::birth(auteur))?
In which case the way we'd recognize it's not a mutate resulting in "blabla AS truc" is the presence of ::.
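A rough sketch of how the presence of :: could be detected and turned back into a triple pattern, using only base R. The function name call_to_triple() is hypothetical and this is not the code from the glitter PR; it only illustrates the o = v(s) reading discussed here.

```r
# Hypothetical translator (not glitter's implementation): turn
# object = prefix::property(subject) into "?subject prefix:property ?object".
# Returns NULL when the call does not use `::`, i.e. a regular mutate.
call_to_triple <- function(...) {
  args <- as.list(substitute(list(...)))[-1]  # capture without evaluating
  stopifnot(length(args) == 1, !is.null(names(args)))
  object <- names(args)
  call <- args[[1]]
  fn <- if (is.call(call)) call[[1]] else NULL
  is_prefixed <- is.call(fn) && identical(fn[[1]], as.name("::"))
  if (!is_prefixed) {
    return(NULL)
  }
  verb <- paste0(as.character(fn[[2]]), ":", as.character(fn[[3]]))
  subject <- as.character(call[[2]])
  sprintf("?%s %s ?%s", subject, verb, object)
}

call_to_triple(date1 = bio::birth(auteur))
call_to_triple(stuff = n(thing))
```

Because the expression is captured rather than evaluated, no package called bio needs to exist for this to work.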
Just a note that by adding new behaviors to spq_mutate() and spq_filter() we're hiding some concepts from the users but it might be fine.
Also noting that for all functions using ... I'll add a dot in front of the other arguments to avoid name clashes. E.g. ".triple".
In the same way that an R user can consider that "?thing is an instance of wd:xxxx" is a kind of filter (and hence might be tempted to pass it through a call to spq_filter()), he/she can consider that "?thing has property ?stuff" is a kind of mutate since it adds a variable. I must admit that I hadn't thought of allowing for the syntax spq_mutate(date1 = bio::birth(auteur)).
I thought: either spq_mutate(triplet), and then it's a disguised call to spq_add(), or something like spq_mutate(stuff=n(thing)) (with the SPARQL keywords translated as R functions).
Because right now you have not implemented these "bio::birth"-like functions right?
I have started, actually, and it's not hard to support. Example https://github.com/lvaudor/glitter/pull/81/files#diff-762db8c96d7eced05483d186e208c0af7707b637e8a350af34d3165632fb7257R21
So I'd extend that to other functions + add a .triple argument to the functions. Does that sound good?
Great, I thought that might be a bit of a hassle. So, yes, sounds good!
In the PR a TODO would be to make spq_add() simple again.
just one thought:
Another argument against "forcing" all "s v o" triplet patterns into o=v(s) arguments is that sometimes you have "s ?v o" (which properties link subject and object) or "?s v o" (which subject is such that s v o) so how would you translate this into R logic?
(I'm just trying to justify my reluctance to drop triplet patterns entirely ;-) )
"s ?v o" (which properties link subject and object) or "?s v o" (which subject is such that s v o) so how would you translate this into R logic?
Oh yes we definitely need a way to keep them. Now for the sake of completeness could you please give me two examples of those?
I think there can be quite a lot of examples of "?s v o" . One (well, two) in the Wikidata vignette:
stations_metro_Lyon=spq_init() %>%
spq_add("?stations wdt:P361 wd:Q1552", label="?stations") %>%
spq_add("?stations wdt:P31 wd:Q928830") %>%
spq_add("?stations wdt:P625 ?coords") %>%
spq_perform()
As for "s ?v o" I think we have no example for now but you'd get them when you're trying to explore a bit the contents of a database and which properties are available for an item, like in the following (SPARQL) query on DBpedia which looks for all the properties "to or from" the Apple company:
select distinct ?prop where {
  {?apple a <http://dbpedia.org/ontology/Company> . ?apple rdfs:label ?name . filter(regex(?name, "Apple Inc"))} .
  {{?x ?prop ?apple} union {?apple ?prop ?y}}
}
From this issue, we need to keep the idea to add messages for Wikidata query building (maybe for a Wikidata specific R package).
and the label argument could still use some simplifications eventually.
Yes, that's one of the things I can envision the most clearly (and excitedly) for future contracts actually ;-)
From this issue, we need to keep the idea to add messages for Wikidata query building (maybe for a Wikidata specific R package).
What would this be? Would this be in a separate package? Or should we drop this?
Ouch, this issue and the following conversation were very long indeed (and many aspects have been settled now: labelling is greatly improved, instances of ?v are in your "exploring new SPARQL endpoints" vignette, etc.). The remaining consideration (should we build a Wikidata-specific package to handle Wikidata-tailored query-building messages?) would require time that we don't have... So I think we can close it!
55 handles spq_select(), spq_mutate(), spq_summarise() but I'm not sure how to handle spq_add() yet.