ropensci / nodbi

Document DBI connector for R
https://docs.ropensci.org/nodbi
Other
76 stars 6 forks source link

Handle JSON objects with src_sqlite #40

Closed rfhb closed 3 years ago

rfhb commented 3 years ago

Description

So far (nodbi 0.4.2), docdb_query.src_sqlite would not return those values from records that are objects (and contain further json data). This was because json_extract() was used, which requires a full json path including numerators for all child objects (https://www.sqlite.org/json1.html#jex).

General handling of objects was not included in my initial implementation of nodbi functions for the RSQLite backend. The PR covers this important use case and enables to retrieve all fields from all paths such as "friends.name". This works by now using json_tree(), finding elements whose paths match the specified fields and by casting values into their reported type (see https://sqlite.org/json1.html#jeach).

The major changes are in docdb_query, which is subsequently used more by other docdb_* functions. At the same time, the code is simpler and cleaner.

Example

The contacts are a json data set that is part of the nodbi package and its friends field has subitems with keys name and id. With this PR, friends in the returned data.frame is a column that is a list with one data.frame per row. More complex objects will have more nesting levels of lists and data frames. The mapdata is additionally included in the package and used below to show how several fields can be extraacted using simple regular expressions (noting that square brackets are not supported because they are used in json strings).

library(nodbi)

con <- src_sqlite()

docdb_create(con, "tbl1", value = data.frame(contacts, stringsAsFactors = FALSE))

(df <- docdb_query(con, "tbl1", fields = '{"friends": 1}', query = '{}'))

#                        _id                                                    friends
# 1 5cd678530df22d3625ed8375                                               0, Pace Bell
# 2 5cd678531b423d5f04cfb0a1                                0, 1, Yang Yates, Lacy Chen
# 3 5cd6785325ce3a94dfc54096      0, 1, 2, Baird Keller, Francesca Reese, Dona Bartlett
# 4 5cd6785335b63cb19dfa8347      0, 1, 2, Coleen Dunn, Doris Phillips, Concetta Turner
# 5 5cd67853f841025e65ce0ce2 0, 1, 2, Wooten Goodwin, Brandie Woodward, Angelique Britt

df[["friends"]][[5]]

#   id             name
# 1  0   Wooten Goodwin
# 2  1 Brandie Woodward
# 3  2  Angelique Britt

str(df)

# 'data.frame': 5 obs. of  2 variables:
#  $ _id    : chr  "5cd678530df22d3625ed8375" "5cd678531b423d5f04cfb0a1"  ...
#  $ friends:List of 5
#   ..$ :'data.frame':  1 obs. of  2 variables:
#   .. ..$ id  : int 0
#   .. ..$ name: chr "Pace Bell"
#   ..$ :'data.frame':  2 obs. of  2 variables:

docdb_create(con, "tbl1", value = data.frame(mapdata, stringsAsFactors = FALSE))

docdb_query(con, "tbl1", fields = '{"rows.elements.d.*text": 1, "xyz": 1}', query = '{}')
docdb_query(con, "tbl1", fields = '{"rows.elements.\\\\S+.text": 1}', query = '{}')
docdb_query(con, "tbl1", fields = '{"rows.*somevalue": 1}', query = '{}')