ropensci / jqr

R interface to jq
https://docs.ropensci.org/jqr

lq: list query #64

Closed: sckott closed this issue 3 years ago

sckott commented 6 years ago

idea from @cboettig

AFAICT the idea is just to make it easy to do the things we can do with jq on JSON, but with R lists.

is that as easy as calling jsonlite::toJSON and then applying any jq commands?

cboettig commented 6 years ago

yes (and ending with a fromJSON, so the user only deals with lists). It would be good to expose the fromJSON arguments that control the return type, since they provide some great options for automatic simplification (e.g. returning a data.frame when appropriate).
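For reference, jsonlite's `fromJSON()` controls this through its `simplifyVector` argument (with the related `simplifyDataFrame` and `simplifyMatrix`). A minimal sketch, using a small made-up JSON array, of the same input coming back either as a data.frame or as plain nested lists:

```r
library(jsonlite)

# Made-up input: an array of two records with the same fields
js <- '[{"name": "after", "issues": 0}, {"name": "argufy", "issues": 6}]'

# Default (simplifyVector = TRUE): records with matching keys become a data.frame
simplified <- fromJSON(js)
str(simplified)

# simplifyVector = FALSE: keep the raw structure, a list of two named lists
nested <- fromJSON(js, simplifyVector = FALSE)
str(nested)
```

The `foo()` helper later in this thread passes `...` through to `fromJSON()`, which is one way an lq-style wrapper could expose these options.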

But I think the real win here would come from honing the DSL in jqr a bit further.

sckott commented 6 years ago

Noted on fromJSON at the end

> select could be a little cleaner for multiples; maybe this can be handled automatically?

can you clarify what you mean by this?

Right, an even higher-level DSL on top of the DSL might be needed to achieve what you're thinking about.

cboettig commented 6 years ago

Clarifying my comment about select multiples: I haven't really thought this through, but I found this a bit unintuitive:

```r
x <- '{"user":"stedolan","titles":["JQ Primer", "More JQ"]}'
x %>% select(user, title = `.titles[]`)
```

but then I realized that

```r
x %>% select(user, title = `.titles`)
```

does what I want, whereas I was expecting it to throw an error because titles holds multiple values.
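In jq, `.titles` yields the whole array as one value, whereas `.titles[]` yields one value per element. Once the result is rectangled, that is the difference between one row (holding a list column) and one row per title. A base-R sketch of the two shapes, using the record above parsed with jsonlite; the data frames only illustrate the shapes and are not what jqr itself returns:

```r
library(jsonlite)

x <- '{"user":"stedolan","titles":["JQ Primer", "More JQ"]}'
rec <- fromJSON(x)  # list(user = "stedolan", titles = c("JQ Primer", "More JQ"))

# Like `.titles`: the array stays a single value, so it fits in one row as a list column
one_row <- data.frame(user = rec$user, title = I(list(rec$titles)),
                      stringsAsFactors = FALSE)

# Like `.titles[]`: one value per element; data.frame() recycles `user` across rows
per_title <- data.frame(user = rec$user, title = rec$titles,
                        stringsAsFactors = FALSE)
```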

I really like how your DSL works so far; it's definitely inspired me. I think so many of these operations would be super useful for lists in general.

I think I'm not yet thinking about the higher-level DSL correctly; you probably don't actually want direct analogs like filter and mutate on lists. You really just want a concise way of rectangling the data into the desired shape, and then to use native dplyr on the resulting data.frame.

In your current DSL, it's not obvious to me how to do object construction. E.g., what would the following query look like in the DSL?

```r
library(jqr)
library(readr)  # read_file()

f <- system.file("extdata/gh_repos.json", package = "repurrrsive")

read_file(f) %>%
  jq('.[][] | {
    name,
    issues: .open_issues_count,
    wiki: .has_wiki,
    homepage,
    owner: .owner.login
  }')
```
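For comparison, a rough list-native version of that query in base R, the kind of thing an lq interface might wrap. It assumes data shaped like the gh_repos.json file above (a list of users, each a list of repo records with an `owner$login` field); the two-record sample is made up, and `null_to_na()` and `pick_repo()` are hypothetical helpers:

```r
# Made-up sample shaped like gh_repos: a list of users,
# each holding a list of repo records
repos <- list(
  list(
    list(name = "after", open_issues_count = 0L, has_wiki = TRUE,
         homepage = NULL, owner = list(login = "gaborcsardi")),
    list(name = "argufy", open_issues_count = 6L, has_wiki = TRUE,
         homepage = NULL, owner = list(login = "gaborcsardi"))
  )
)

# `.[][]`: drop the per-user level, leaving one flat list of repo records
records <- unlist(repos, recursive = FALSE)

# JSON null arrives as NULL; turn it into NA so every record yields one row
null_to_na <- function(v) if (is.null(v)) NA else v

# `{name, issues: .open_issues_count, ...}`: pick and rename fields
pick_repo <- function(r) {
  data.frame(
    name = r$name,
    issues = r$open_issues_count,
    wiki = r$has_wiki,
    homepage = null_to_na(r$homepage),
    owner = r$owner$login,
    stringsAsFactors = FALSE
  )
}

# Rectangle: one row per repo, ready for dplyr or base R
repo_df <- do.call(rbind, lapply(records, pick_repo))
repo_df
```

Both the jq program and this sketch do the same three steps: flatten two levels, pick and rename fields, then collect rows.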
sckott commented 6 years ago

thanks for the clarification. This is the first thing I came up with using the DSL, but it looks about the same as using the low-level interface:

```r
library(jqr)

# R object -> JSON string
tj <- function(x) jsonlite::toJSON(x, auto_unbox = TRUE)
# jq output -> R object; if it is not a single valid JSON document
# (e.g. jq emitted several values), merge it with combine() first
fj <- function(x, ...) {
  if (!jsonlite::validate(x)) x <- jqr::combine(x)
  jsonlite::fromJSON(x, ...)
}

f <- system.file("extdata/gh_repos.json", package = "repurrrsive")
js <- jsonlite::fromJSON(f)
z <- (tj(js) %>%
  jq('.[][]') %>%
  select(
    name,
    issues = .open_issues_count,
    wiki = .has_wiki,
    homepage,
    owner = .owner.login
  )
) %>% fj
tibble::as_tibble(z)
#> # A tibble: 176 x 5
#>    name        issues wiki  homepage owner
#>  * <chr>        <int> <lgl> <chr>    <chr>
#>  1 after            0 TRUE  NA       gaborcsardi
#>  2 argufy           6 TRUE  NA       gaborcsardi
#>  3 ask              4 TRUE  NA       gaborcsardi
#>  4 baseimports      0 TRUE  NA       gaborcsardi
#>  5 citest           0 TRUE  NA       gaborcsardi
#>  6 clisymbols       0 TRUE  ""       gaborcsardi
#>  7 cmaker           0 TRUE  NA       gaborcsardi
#>  8 cmark            0 TRUE  NA       gaborcsardi
#>  9 conditions       0 TRUE  NA       gaborcsardi
#> 10 crayon           7 TRUE  NA       gaborcsardi
#> # ... with 166 more rows
```
sckott commented 6 years ago

thinking about this some more:

working with list inputs and the low-level interface:

```r
library(jqr)

# list in, list out: R list -> JSON -> jq program -> back to an R list
foo <- function(x, program, ...) {
  tmp <- jqr::jq(jsonlite::toJSON(x), program)
  # if jq emitted several values, merge them with combine() before parsing
  if (!jsonlite::validate(tmp)) tmp <- jqr::combine(tmp)
  jsonlite::fromJSON(tmp, ...)
}
tj <- function(x) jsonlite::toJSON(x, auto_unbox = TRUE)
fj <- function(x, ...) {
  if (!jsonlite::validate(x)) x <- jqr::combine(x)
  jsonlite::fromJSON(x, ...)
}

# example input (inferred from the printed output below)
x <- list(a = list(b = list(c = 1:10, d = 1:10)))

# low level
foo(x, ".")
#> $a
#> $a$b
#> $a$b$c
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> 
#> $a$b$d
#>  [1]  1  2  3  4  5  6  7  8  9 10

foo(x, ".[]")
#> $b
#> $b$c
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> 
#> $b$d
#>  [1]  1  2  3  4  5  6  7  8  9 10

foo(x, ".[][]")
#> $c
#>  [1]  1  2  3  4  5  6  7  8  9 10
#> 
#> $d
#>  [1]  1  2  3  4  5  6  7  8  9 10

foo(x, ".[][] | keys | reverse")
#> [1] "d" "c"

# dsl
(tj(x) %>% index() %>% index() %>% keys() %>% reverse) %>% fj
#> [1] "d" "c"
```
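The same query can also be sketched directly on the R list, skipping the JSON round trip, which is roughly what lq is after. This assumes a jq stream is modelled as an R list of values; `lq_iter()`, `lq_keys()` and `lq_reverse()` are hypothetical names, not jqr functions:

```r
x <- list(a = list(b = list(c = 1:10, d = 1:10)))

# Hypothetical helpers: a jq "stream" is an R list of values,
# and each helper maps one stream to the next

# `.[]`: emit every child of every value (names dropped, as jq does)
lq_iter <- function(stream) {
  do.call(c, lapply(stream, function(v) unname(as.list(v))))
}

# `keys`: sorted names of each object; radix sort uses C-locale
# (byte) order, approximating jq's codepoint ordering
lq_keys <- function(stream) {
  lapply(stream, function(v) sort(names(v), method = "radix"))
}

# `reverse`: reverse each value
lq_reverse <- function(stream) lapply(stream, rev)

# `.[][] | keys | reverse`, starting from a one-value stream
out <- lq_reverse(lq_keys(lq_iter(lq_iter(list(x)))))
out[[1]]
#> [1] "d" "c"
```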
sckott commented 6 years ago

see also #65 and #66

sckott commented 6 years ago

@cboettig if you get a chance, try out the work on branch https://github.com/ropensci/jqr/tree/select-and-objects. select() was wrongly doing object construction (written {} in jq), so I added two new functions, build_object (equivalent to jq {}) and build_array (equivalent to jq []), and changed select() to do filtering, matching what jq's select does. So in the examples above, just replace select() with build_object().

cboettig commented 6 years ago

Nice! yeah, the lack of an explicit build_object function, and doing that inside select, confused me, so it's really nice to see the new build_object and build_array (though while these are good names for the jq DSL, lq aliases such as build_list and build_vector might be more familiar to R users).
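To make the distinction concrete for lists: jq's `select(f)` passes a value through unchanged when `f` is true and drops it otherwise, while `{...}` builds a new object. A minimal sketch, again assuming a stream is an R list of values; `lq_select()` and `lq_build_list()` are hypothetical names (the latter borrowing the build_list alias suggested above), and the repo records are made up:

```r
# Made-up stream of two repo records
repos <- list(
  list(name = "after", open_issues_count = 0L, owner = list(login = "gaborcsardi")),
  list(name = "argufy", open_issues_count = 6L, owner = list(login = "gaborcsardi"))
)

# Like jq `select(f)`: keep whole values for which `pred` is TRUE
lq_select <- function(stream, pred) Filter(pred, stream)

# Like jq `{...}`: build a new named list from each value;
# each named argument is a function of one record
lq_build_list <- function(stream, ...) {
  fields <- list(...)
  lapply(stream, function(v) lapply(fields, function(f) f(v)))
}

busy <- lq_select(repos, function(r) r$open_issues_count > 0)
slim <- lq_build_list(
  repos,
  name  = function(r) r$name,
  owner = function(r) r$owner$login
)
```

Filtering never changes the shape of a record, while construction always does, which is why the two belong in separate functions.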

sckott commented 6 years ago

good point about function names.

sckott commented 6 years ago

@cboettig Are you thinking of a separate pkg for lq, to play more freely?

cboettig commented 6 years ago

Re separate package: I hadn't thought about that but could be a good idea -- would certainly make it more visible for users who want to query lists but don't care about JSON. Like you say, it would also give us more room to play without messing with things for jqr users. 👍

sckott commented 6 years ago

Okay, fire away at https://github.com/ropensci/lq

sckott commented 3 years ago

closing: the package was made at lq, but is now archived at https://github.com/ropensci-archive/lq