lvaudor / glitter

an R package which writes SPARQL queries
https://lvaudor.github.io/glitter

combine several spq_mutate/spq_summarise #95

Closed lvaudor closed 1 year ago

lvaudor commented 1 year ago

For now we cannot combine several spq_mutate/spq_summarise calls that involve a common variable. For instance:

spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424",.label="film") %>%
  spq_head(10) %>%
  spq_add("?film wdt:P577 ?date") %>%
  spq_mutate(year=year(date)) %>%
  spq_group_by(filmLabel,loc) %>% 
  spq_summarise(year=min(year)) %>% 
  spq_perform()

does not succeed. We would need to produce this SPARQL code:

MIN(YEAR(?date))  AS ?year

but right now we produce MIN(?year) AS ?year, because spq_summarise(year=min(year)) simply overwrites the earlier spq_mutate(year=year(date)), so ?year ends up undefined in the SPARQL code.
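
To make the overwrite concrete, here is a minimal base R sketch. It does not use glitter's internals: the `defs` list and the substitution step are hypothetical stand-ins for however the package stores variable definitions. It only illustrates the difference between replacing a definition and inlining the earlier one into the new expression.

```r
# Hypothetical store of variable definitions, keyed by variable name.
# This is an illustration only, not how glitter actually stores them.
defs <- list()
defs$year <- quote(YEAR(date))   # from spq_mutate(year = year(date))

# Naive behaviour: the summarise definition replaces the mutate one,
# so "year" is now defined in terms of itself.
naive <- defs
naive$year <- quote(MIN(year))

# Possible alternative: substitute the earlier definitions into the new
# expression before storing it.
inlined <- do.call(substitute, list(quote(MIN(year)), defs))

deparse(naive$year)  # "MIN(year)"
deparse(inlined)     # "MIN(YEAR(date))"
```

The second result has the shape of the expression we would want to end up in the SELECT clause.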

maelle commented 1 year ago

Another issue is that spq_group_by() modifies the select.

I also want to add an option to get a log at each step.

maelle commented 1 year ago

@lvaudor could you please write what SPARQL query you'd have expected to be the result of the pipeline?

maelle commented 1 year ago

The behavior of spq_group_by() probably needs to be modified. Currently it either keeps all variables but as a * (so not named explicitly) or keeps only the variables used for the grouping. Instead, it might make sense to add the variables used for the grouping to the selection?
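
A rough sketch of that last idea in base R, assuming the selection is tracked as a character vector of variable names. The helper name `add_grouping_to_select()` is hypothetical and is not a glitter function.

```r
# Hypothetical helper, not part of glitter: keep what is already selected
# and append the grouping variables that are not selected yet.
add_grouping_to_select <- function(selected, grouping) {
  union(selected, grouping)
}

add_grouping_to_select(
  selected = c("film", "date"),
  grouping = c("filmLabel", "loc")
)
```

With this approach the existing selection stays explicit instead of collapsing to * or to the grouping variables alone.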

maelle commented 1 year ago

In the example, loc isn't defined.

I see that filmLabel isn't selected by default; it probably should be.

lvaudor commented 1 year ago

Indeed my example was incomplete. So, here is an example that works as long as I don't uncomment any of the three lines (2nd spq_mutate, or spq_group_by+summarize).

tib=spq_init() %>%                 
  spq_add("?film wdt:P31 wd:Q11424",  
          .label="?film") %>%        
  spq_add("?film wdt:P840 ?loc",      
          .label="?loc") %>%
  spq_add("?film wdt:P577 ?date") %>%  
  spq_mutate(year=year(date)) %>% 
  #spq_mutate(yearmin=min(year)) %>% 
  #spq_group_by(loc, yearmin) %>% 
  #spq_summarize(n_films=n()) %>% 
  spq_head(10) %>% 
  spq_perform()

lvaudor commented 1 year ago

Hmph I'm trying to find the SPARQL equivalent to this and realizing I don't know how to do it :-/
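
For comparison, this is what the three commented-out steps would do on a local data frame if they followed dplyr semantics, written in base R on invented toy data (the film names, locations and years are made up). Whether this is the intended result for the SPARQL query is an assumption. The mutate step yields a single global minimum repeated on every row, which is a value computed across rows and kept alongside them; a plain SPARQL aggregate instead collapses rows per group.

```r
# Invented toy data standing in for the Wikidata results.
films <- data.frame(
  film = c("a", "b", "c", "d"),
  loc  = c("Paris", "Paris", "Rome", "Rome"),
  year = c(1950, 1960, 1940, 1970)
)

# Like mutate(yearmin = min(year)) on ungrouped data: one global minimum,
# repeated on every row.
films$yearmin <- min(films$year)

# Like group_by(loc, yearmin) followed by summarise(n_films = n()).
counts <- aggregate(films["film"], by = films[c("loc", "yearmin")], FUN = length)
names(counts)[names(counts) == "film"] <- "n_films"
counts
```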

maelle commented 1 year ago

library("glitter")
spq_init() %>%
  spq_add("?film wdt:P31 wd:Q11424") %>%
  spq_head(10) %>%
  spq_add("?film wdt:P577 ?date") %>%
  spq_mutate(year = year(date)) %>%
  spq_group_by(filmLabel,loc) %>% 
  spq_summarise(year = min(year)) %>% 
  spq_perform()
#> # A tibble: 1 × 1
#>    year
#>   <dbl>
#> 1   202

Created on 2023-07-28 with reprex v2.0.2

maelle commented 1 year ago

library("glitter")
spq_init() %>%                 
  spq_add("?film wdt:P31 wd:Q11424") %>%        
  spq_add("?film wdt:P840 ?loc") %>%
  spq_add("?film wdt:P577 ?date") %>%  
  spq_mutate(year=year(date)) %>% 
  spq_mutate(yearmin = min(year)) %>% 
  spq_group_by(loc, yearmin) %>% 
  spq_summarize(n_films = n()) %>% 
  spq_head(10) %>% 
  spq_perform()  
#> # A tibble: 10 × 3
#>    loc                                yearmin n_films
#>    <chr>                                <dbl>   <dbl>
#>  1 http://www.wikidata.org/entity/Q15    1932      25
#>  2 http://www.wikidata.org/entity/Q1     1932       4
#>  3 http://www.wikidata.org/entity/Q2     1932      10
#>  4 http://www.wikidata.org/entity/Q15    1930      23
#>  5 http://www.wikidata.org/entity/Q15    1915     514
#>  6 http://www.wikidata.org/entity/Q18    1915     128
#>  7 http://www.wikidata.org/entity/Q17    1915     187
#>  8 http://www.wikidata.org/entity/Q16    1915     204
#>  9 http://www.wikidata.org/entity/Q21    1915      14
#> 10 http://www.wikidata.org/entity/Q21    1913     111

Created on 2023-07-28 with reprex v2.0.2

and also

library("glitter")
spq_init() %>%                 
  spq_add("?film wdt:P31 wd:Q11424") %>%        
  spq_add("?film wdt:P840 ?loc") %>%
  spq_add("?film wdt:P577 ?date") %>%  
  spq_mutate(year=year(date)) %>% 
  spq_mutate(year = min(year)) %>% 
  spq_group_by(loc, year) %>% 
  spq_summarize(n_films = n()) %>% 
  spq_head(10) %>% 
  spq_perform()  
#> # A tibble: 10 × 3
#>    loc                                 year n_films
#>    <chr>                              <dbl>   <dbl>
#>  1 http://www.wikidata.org/entity/Q15  1932      25
#>  2 http://www.wikidata.org/entity/Q1   1932       4
#>  3 http://www.wikidata.org/entity/Q2   1932      10
#>  4 http://www.wikidata.org/entity/Q15  1916     307
#>  5 http://www.wikidata.org/entity/Q15  1915     192
#>  6 http://www.wikidata.org/entity/Q17  1915      35
#>  7 http://www.wikidata.org/entity/Q16  1912     319
#>  8 http://www.wikidata.org/entity/Q15  1912      38
#>  9 http://www.wikidata.org/entity/Q21  1912     312
#> 10 http://www.wikidata.org/entity/Q17  1912     152

Created on 2023-07-28 with reprex v2.0.2