jeroen / jsonlite

A Robust, High Performance JSON Parser and Generator for R
http://arxiv.org/abs/1403.2805

`jsonlite`'s handling of arrays is confusing #368

Open gegnew opened 3 years ago

gegnew commented 3 years ago

Alternatively, my use of jsonlite is confusing. Let's find out:

When you parse documents in bulk and then take a single document from the result (see code below), a property that is a JSON array comes back wrapped in an extra list.

i.e. for JSON like,

[
  {
    "_id": "5d2f8b4b21fd0676fb3a6a71",
    "annotations": [
      {
        "value": "IL2/GM-CSF",
        "name": "Condition",
        "type": "any"
      }
    ]
  },
  {
    "_id": "5d2f8b4b21fd0676fb3a6a72",
    "annotations": [
      {
        "name": "Condition",
        "value": "IL10"
      },
      {
        "name": "annotations 1",
        "value": "1"
      },
      {
        "name": "annotation 2",
        "value": "value"
      }
    ]
  }
]

then jsonlite returns each document's "annotations" as a list containing a data.frame, i.e. <list> [<data.frame>].

docs = jsonlite::fromJSON(content)
d = docs[1,]
d$annotations
# prints:
[[1]]
           name value
1 annotations 1     1
2  annotation 2 value

However, for JSON like,

{
  "_id": "5d2f8b4b21fd0676fb3a6a71",
  "annotations": [
    {
      "value": "IL2/GM-CSF",
      "name": "Condition",
      "type": "any"
    }
  ]
}

then "annotations" is simply a data.frame:

d = getDocument(id)
d$annotations
# prints:
           name value
1 annotations 1     1
2  annotations 2 value
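
The difference described above can be reproduced with a minimal sketch. The JSON strings below are hypothetical, trimmed-down versions of the documents in this issue (short `_id` values, no trailing commas), and the variable names `many`, `one`, `docs` and `doc` are only for illustration:

```r
library(jsonlite)

# Hypothetical, shortened copies of the documents above (valid JSON)
many <- '[
  {"_id": "a", "annotations": [{"name": "Condition", "value": "IL2/GM-CSF"}]},
  {"_id": "b", "annotations": [{"name": "Condition", "value": "IL10"},
                               {"name": "annotation 2", "value": "value"}]}
]'
one <- '{"_id": "a", "annotations": [{"name": "Condition", "value": "IL2/GM-CSF"}]}'

# Array of objects: simplified to a data.frame with one row per document,
# so "annotations" becomes a list-column holding one data.frame per row
docs <- fromJSON(many)
d <- docs[1, ]
class(d$annotations)        # the list-column, still a list after row subsetting
class(d$annotations[[1]])   # the data.frame inside it

# Single object: parsed to a named list, so "annotations" is the data.frame itself
doc <- fromJSON(one)
class(doc$annotations)
```

In other words, the extra `[[1]]` seems to come from the outer array being turned into a data.frame, which needs a list-column to hold a nested table per row.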

This is particularly annoying because it also occurs for nested objects in documents returned as single-item JSON arrays, like

[
  {
    "_id": "5d2f8b4b21fd0676fb3a6a8c",
    "items": [
      {
        "name": "A",
        "item": {
          "type": "type A"
        }
      },
      {
        "name": "B",
        "item": {
          "type": "type B"
        }
      }
    ]
  }
]

where "items" must be accessed with myParsedObject$items[[1]]. However, if the outer brackets in the above JSON are removed, "items" can be accessed with myParsedObject$items.

In all these cases, what I find troublesome is the need to use the [[ ]] notation sometimes, but not always. Is there a way to get around this?

GregorDeCillia commented 2 years ago

You can use jsonlite::fromJSON(content, simplifyVector = FALSE) to disable all simplifications. Those simplifications include the automatic conversion of certain nested vectors to arrays/data frames. See also the parameters simplifyMatrix and simplifyDataFrame in the man page of jsonlite::fromJSON().

# enable all simplifications
str(jsonlite::fromJSON("[[1,2], [3,4]]", simplifyVector = TRUE))
#>  int [1:2, 1:2] 1 3 2 4

# disable all simplifications
str(jsonlite::fromJSON("[[1,2], [3,4]]", simplifyVector = FALSE))
#> List of 2
#>  $ :List of 2
#>   ..$ : int 1
#>   ..$ : int 2
#>  $ :List of 2
#>   ..$ : int 3
#>   ..$ : int 4

# only simplify vectors
str(jsonlite::fromJSON("[[1,2], [3,4]]", simplifyVector = TRUE, simplifyDataFrame = FALSE, simplifyMatrix = FALSE))
#> List of 2
#>  $ : int [1:2] 1 2
#>  $ : int [1:2] 3 4

I found that the last version (only simplify vectors) makes the most sense for my codebases, but this will obviously depend on your setup.
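
As a rough sketch of how the "only simplify vectors" setting could apply to the documents from the original question: the wrapper name `parse_plain` and the shortened JSON strings are assumptions made for illustration, while the arguments are the `fromJSON()` parameters mentioned above.

```r
library(jsonlite)

# Hypothetical helper: collapse arrays of primitives into vectors,
# but never build data frames or matrices
parse_plain <- function(txt) {
  fromJSON(txt, simplifyVector = TRUE,
           simplifyDataFrame = FALSE, simplifyMatrix = FALSE)
}

# Hypothetical, shortened versions of the documents from the question
many <- '[
  {"_id": "a", "annotations": [{"name": "Condition", "value": "IL2/GM-CSF"}]},
  {"_id": "b", "annotations": [{"name": "Condition", "value": "IL10"}]}
]'
one <- '{"_id": "a", "annotations": [{"name": "Condition", "value": "IL2/GM-CSF"}]}'

docs <- parse_plain(many)   # plain list of documents
doc  <- parse_plain(one)    # a single document as a named list

# Once a document is picked, "annotations" has the same shape in both cases:
# a list of named lists, with no extra wrapping level
identical(docs[[1]]$annotations, doc$annotations)
doc$annotations[[1]]$value
```

The trade-off is that nothing arrives as a data.frame any more, so any tabular view has to be built explicitly where it is needed.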