jeroen / mongolite

Fast and Simple MongoDB Client for R
https://jeroen.github.io/mongolite/
287 stars 65 forks source link

Ensuring tibbles instead of data frames upon DB read #189

Open jankowtf opened 4 years ago

jankowtf commented 4 years ago

Hi Jeroen,

my issue/question is related to https://github.com/jeroen/jsonlite/issues/253.

I tried using $iterate() as you suggested, but I still can't seem to figure out a "built-in" way to ensure that "tabular data" is parsed as tibbles instead of data.frames.

Can/do I need to reach in deeper into things like jsonlite::simplify() to achieve this or is this simply something that needs to happen after data reads from MongoDB?


Here's what I tried (also put it down as a Stackoverflow post)

Example

JSON data to put into file dump.json

{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 1,"z": true},{"x": "B","y": 2,"z": false}]}},"schema": "0.0.1"}
{"labels": ["label-a", "label-b"],"levelOne": {"levelTwo": {"levelThree": [{"x": "A","y": 10,"z": false},{"x": "B","y": 20,"z": true}]}},"schema": "0.0.1"}

Importing JSON into MongoDB

con <- mongolite::mongo(
  db = "stackoverflow",
  collection = "nested_json"
)

con$import(file("dump.json"))

This is what you should see within MongoDB

enter image description here

Query via $find()

query_res <- con$find() %>%
  tibble::as_tibble()

query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:'data.frame': 2 obs. of  1 variable:
#   ..$ levelTwo:'data.frame':  2 obs. of  1 variable:
#   .. ..$ levelThree:List of 2
#   .. .. ..$ :'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  1 2
#   .. .. .. ..$ z: logi  TRUE FALSE
#   .. .. ..$ :'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  10 20
#   .. .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"

Desired result

query_res$levelOne <- query_res$levelOne %>% tibble::as_tibble()
query_res$levelOne$levelTwo <- query_res$levelOne$levelTwo %>% 
  tibble::as_tibble()
query_res$levelOne$levelTwo$levelThree <- query_res$levelOne$levelTwo$levelThree %>% 
  purrr::map(tibble::as_tibble)

query_res %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  1 variable:
#   ..$ levelTwo:Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  2 obs. of  1 variable:
#   .. ..$ levelThree:List of 2
#   .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  1 2
#   .. .. .. ..$ z: logi  TRUE FALSE
#   .. .. ..$ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame':    2 obs. of  3 variables:
#   .. .. .. ..$ x: chr  "A" "B"
#   .. .. .. ..$ y: int  10 20
#   .. .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"

If I try to do that via dplyr::mutate() or purrr::map*_df(), I get the Error: Columnis of unsupported class data.frame error.

Query via $iterate()

it <- con$iterate()

iter_res <- list()
while(!is.null(x <- it$one())) {
  # Ensure array columns stay individual list columns when casting to tibble:
  # (As opposed to multiple array items being turned into one tibble row)
  p <- function(x) {
    is.list(x) &&
      is.null(names(x))
  }
  f <- function(x) {
    list(x %>% unlist())
  }
  x <- x %>% purrr::map_if(p, f)

  # Necessary to get the `simplifyVector = TRUE` effect:
  iter_res_current <- x %>%
    jsonlite:::simplify() %>%
    tibble::as_tibble()

  # Combine with previous iteration results:
  iter_res <- c(iter_res, list(iter_res_current))
}
iter_res_df <- iter_res %>%
  dplyr::bind_rows()

iter_res_df %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 2 obs. of  3 variables:
#  $ labels  :List of 2
#   ..$ : chr  "label-a" "label-b"
#   ..$ : chr  "label-a" "label-b"
#  $ levelOne:List of 2
#   ..$ :List of 1
#   .. ..$ levelThree:'data.frame': 2 obs. of  3 variables:
#   .. .. ..$ x: chr  "A" "B"
#   .. .. ..$ y: int  1 2
#   .. .. ..$ z: logi  TRUE FALSE
#   ..$ :List of 1
#   .. ..$ levelThree:'data.frame': 2 obs. of  3 variables:
#   .. .. ..$ x: chr  "A" "B"
#   .. .. ..$ y: int  10 20
#   .. .. ..$ z: logi  FALSE TRUE
#  $ schema  : chr  "0.0.1" "0.0.1"