sailthru / tidyjson

Tools for using dplyr with JSON data

Unify code behind gather_keys and gather_array #9

Open jeremystan opened 10 years ago

jeremystan commented 10 years ago

gather_keys and gather_array have very similar formats. Can they be unified in some way that simplifies the code?

jeremystan commented 10 years ago

My first pass at this was difficult: a function factory may be too complex or abstract. Perhaps the shared code can be broken into smaller functions instead?
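A rough sketch of the "smaller functions" idea, in base R only. This is not tidyjson's actual implementation: `gather_children`, `gather_keys_sketch`, and `gather_array_sketch` are hypothetical names, documents are assumed to be already-parsed JSON values (plain R lists), and storing the children in a `"JSON"` attribute is an assumption made for illustration. The only real difference between the two gatherers is the type check and how each child is labelled, so both are passed in as arguments.

```r
# Hypothetical sketch, not tidyjson's real internals.
# `docs` is a list of already-parsed JSON values (plain R lists).

# Shared helper: one output row per child element of each document.
#   is_valid: checks that a document has the expected JSON type
#   labels:   produces the values of the new column for one document
gather_children <- function(docs, column.name, is_valid, labels) {
  stopifnot(all(vapply(docs, is_valid, logical(1))))
  n <- lengths(docs)  # number of children per document
  out <- data.frame(document.id = rep(seq_along(docs), n))
  out[[column.name]] <- unlist(lapply(docs, labels))
  # Keep the child values alongside the rows (assumed storage: an attribute).
  attr(out, "JSON") <- unlist(lapply(docs, unname), recursive = FALSE)
  out
}

# Objects: children are labelled by key.
# Simplification: an empty object has no names and is rejected here.
gather_keys_sketch <- function(docs, column.name = "key") {
  gather_children(docs, column.name,
                  is_valid = function(x) is.list(x) && !is.null(names(x)),
                  labels = names)
}

# Arrays: children are labelled by position.
gather_array_sketch <- function(docs, column.name = "array.index") {
  gather_children(docs, column.name,
                  is_valid = function(x) is.list(x) && is.null(names(x)),
                  labels = seq_along)
}

docs <- list(list(a = list(color = "blue"), b = list(color = "red")))
gather_keys_sketch(docs, "letter")
```

With this split, each public function is a thin wrapper, and no function factory is needed.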

adgaudio commented 9 years ago

We should also bundle a gather_values or gather_items feature into this refactor. Currently, we don't have an elegant way to treat the values of a large JSON dictionary as an array, and input JSON data that doesn't start as an array is not much use to us.

For example, it's currently difficult to extract color from this structure:

{"a": {"color": "blue"},
 "b": {"color": "red"},
 "c": {"color": "blue"}
}

without using lapply like this:

    # Iterate over the top-level keys, then bind the per-key results together
    lapply(
        (json %>% gather_keys("key"))$key,
        function(key) {
            json %>% spread_values(color = jstring(key, "color"))
        }) %>%
        rbind_all

It would be nicer to do

json %>% gather_values %>% spread_values("color")

# or 

## for gather_items, I guess we could create a column called "key.1" or something
json %>% gather_items %>% spread_values("color")

jeremystan commented 9 years ago

To extract color from this JSON:

json <- '{"a": {"color": "blue"},
  "b": {"color": "red"},
  "c": {"color": "blue"}
}'

the following works:

json %>% as.tbl_json %>% gather_keys("letter") %>% spread_values(color = jstring("color"))
  document.id letter color
1           1      a  blue
2           1      b   red
3           1      c  blue