lvaudor closed this issue 1 year ago.
Another issue is that spq_group_by() modifies the SELECT clause.
I also want to add an option to get a log at each step.
@lvaudor could you please write what SPARQL query you'd have expected to be the result of the pipeline?
The behavior of spq_group_by() probably needs to be modified. Currently it either keeps all variables, but as a * (so not named explicitly), or keeps only the variables used for the grouping. Instead, it might make sense to add the grouping variables to the existing selection?
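A minimal sketch of that idea in plain R, assuming the current selection is stored as a character vector of SPARQL variable names (possibly just "*"). projection_after_group_by() is a hypothetical helper, not part of glitter. It follows the SPARQL 1.1 rule that SELECT * is not allowed together with GROUP BY and that a projected plain variable must be one of the grouping variables: earlier selections are kept only where they remain valid, and the grouping variables are then added.

```r
# Hypothetical helper, not part of glitter: which variables should be
# projected once spq_group_by() has been called?
# `selected`:   variables currently in SELECT, e.g. c("?film", "?loc") or "*"
# `group_vars`: grouping variables as passed to spq_group_by(), without "?"
projection_after_group_by <- function(selected, group_vars) {
  group_vars <- paste0("?", group_vars)
  # SELECT * is not allowed with GROUP BY, so drop it; of the remaining
  # explicitly selected variables, keep only the grouping ones (in order)
  kept <- intersect(setdiff(selected, "*"), group_vars)
  # Then add every grouping variable that is not selected yet
  union(kept, group_vars)
}

projection_after_group_by(c("?film", "?year", "?loc"), c("loc", "year"))
#> [1] "?year" "?loc"
projection_after_group_by("*", "loc")
#> [1] "?loc"
```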
In the example, loc isn't defined.
I see that filmLabel isn't selected by default; it probably should be.
Indeed, my example was incomplete. Here is an example that works as long as I don't uncomment any of the three commented lines (the second spq_mutate(), or spq_group_by() + spq_summarize()).
library("glitter")
tib <- spq_init() %>%
spq_add("?film wdt:P31 wd:Q11424",
.label = "?film") %>%
spq_add("?film wdt:P840 ?loc",
.label = "?loc") %>%
spq_add("?film wdt:P577 ?date") %>%
spq_mutate(year = year(date)) %>%
# The pipeline only works while these three lines stay commented out:
#spq_mutate(yearmin = min(year)) %>%
#spq_group_by(loc, yearmin) %>%
#spq_summarize(n_films = n()) %>%
spq_head(10) %>%
spq_perform()
Hmph, I'm trying to find the SPARQL equivalent of this and realizing I don't know how to do it :-/
library("glitter")
spq_init() %>%
spq_add("?film wdt:P31 wd:Q11424") %>%
spq_head(10) %>%
spq_add("?film wdt:P577 ?date") %>%
spq_mutate(year = year(date)) %>%
spq_group_by(filmLabel,loc) %>%
spq_summarise(year = min(year)) %>%
spq_perform()
#> # A tibble: 1 × 1
#> year
#> <dbl>
#> 1 202
Created on 2023-07-28 with reprex v2.0.2
library("glitter")
spq_init() %>%
spq_add("?film wdt:P31 wd:Q11424") %>%
spq_add("?film wdt:P840 ?loc") %>%
spq_add("?film wdt:P577 ?date") %>%
spq_mutate(year=year(date)) %>%
spq_mutate(yearmin = min(year)) %>%
spq_group_by(loc, yearmin) %>%
spq_summarize(n_films = n()) %>%
spq_head(10) %>%
spq_perform()
#> # A tibble: 10 × 3
#> loc yearmin n_films
#> <chr> <dbl> <dbl>
#> 1 http://www.wikidata.org/entity/Q15 1932 25
#> 2 http://www.wikidata.org/entity/Q1 1932 4
#> 3 http://www.wikidata.org/entity/Q2 1932 10
#> 4 http://www.wikidata.org/entity/Q15 1930 23
#> 5 http://www.wikidata.org/entity/Q15 1915 514
#> 6 http://www.wikidata.org/entity/Q18 1915 128
#> 7 http://www.wikidata.org/entity/Q17 1915 187
#> 8 http://www.wikidata.org/entity/Q16 1915 204
#> 9 http://www.wikidata.org/entity/Q21 1915 14
#> 10 http://www.wikidata.org/entity/Q21 1913 111
And also:
library("glitter")
spq_init() %>%
spq_add("?film wdt:P31 wd:Q11424") %>%
spq_add("?film wdt:P840 ?loc") %>%
spq_add("?film wdt:P577 ?date") %>%
spq_mutate(year=year(date)) %>%
spq_mutate(year = min(year)) %>%
spq_group_by(loc, year) %>%
spq_summarize(n_films = n()) %>%
spq_head(10) %>%
spq_perform()
#> # A tibble: 10 × 3
#> loc year n_films
#> <chr> <dbl> <dbl>
#> 1 http://www.wikidata.org/entity/Q15 1932 25
#> 2 http://www.wikidata.org/entity/Q1 1932 4
#> 3 http://www.wikidata.org/entity/Q2 1932 10
#> 4 http://www.wikidata.org/entity/Q15 1916 307
#> 5 http://www.wikidata.org/entity/Q15 1915 192
#> 6 http://www.wikidata.org/entity/Q17 1915 35
#> 7 http://www.wikidata.org/entity/Q16 1912 319
#> 8 http://www.wikidata.org/entity/Q15 1912 38
#> 9 http://www.wikidata.org/entity/Q21 1912 312
#> 10 http://www.wikidata.org/entity/Q17 1912 152
For now we cannot combine several spq_mutate()/spq_summarise() calls that involve a common variable. For instance:
does not succeed. Indeed, we would need to produce this SPARQL code:
but right now we produce:
MIN(?year) AS ?year
because spq_summarise(year = min(year)) simply overwrites the first spq_mutate(year = year(date)), so that ?year ends up undefined in the SPARQL code.
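One possible way around the overwriting, sketched in plain R under the assumption that each spq_mutate()/spq_summarise() step is recorded as a name -> SPARQL expression pair: when a name is reused, rename the earlier binding (here with a hypothetical "0" suffix) and make the new expression refer to the renamed variable. add_definition() is a hypothetical helper, not glitter's implementation.

```r
# Hypothetical helper, not glitter's implementation.
# `defs` is a named list: variable name (without "?") -> SPARQL expression.
add_definition <- function(defs, name, expr) {
  if (name %in% names(defs)) {
    # The name is already bound (e.g. by an earlier spq_mutate()):
    # rename that earlier binding instead of overwriting it...
    old <- paste0(name, "0")
    names(defs)[names(defs) == name] <- old
    # ...and make the new expression refer to the renamed variable
    expr <- gsub(paste0("\\?", name, "\\b"), paste0("?", old), expr, perl = TRUE)
  }
  defs[[name]] <- expr
  defs
}

defs <- list()
defs <- add_definition(defs, "year", "YEAR(?date)") # spq_mutate(year = year(date))
defs <- add_definition(defs, "year", "MIN(?year)")  # spq_summarise(year = min(year))
defs$year0
#> [1] "YEAR(?date)"
defs$year
#> [1] "MIN(?year0)"
```

The query could then contain BIND(YEAR(?date) AS ?year0) in the WHERE clause and (MIN(?year0) AS ?year) in the SELECT clause, so ?year is no longer undefined. The fixed "0" suffix is only for illustration; a real implementation would need names that cannot collide with the user's own variables.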