jeroen / jsonlite

A Robust, High Performance JSON Parser and Generator for R
http://arxiv.org/abs/1403.2805

fromJSON -> toJSON is not symmetrical #167

Closed chrisknoll closed 7 years ago

chrisknoll commented 7 years ago

Consider the following json string created in R:

jsonText <- "{\"CONCEPT_ID\":140168,\"CONCEPT_NAME\":\"Psoriasis\",\"STANDARD_CONCEPT\":\"S\",\"INVALID_REASON\":\"V\",\"CONCEPT_CODE\":\"9014002\",\"DOMAIN_ID\":\"Condition\",\"VOCABULARY_ID\":\"SNOMED\",\"CONCEPT_CLASS_ID\":\"Clinical Finding\",\"STANDARD_CONCEPT_CAPTION\":\"Standard\",\"INVALID_REASON_CAPTION\":\"Valid\"}"

This is simply the JSON output of a REST API lookup. The \" sequences just escape the quotes inside the R string; in JSON itself, all field names need to be surrounded by " and the values here are either strings or ints.

You would expect that reading this JSON in from this string and then writing it back out as JSON should (I'd argue MUST) give the same result:

toJSON(fromJSON(jsonText))
# produces:
#  {"CONCEPT_ID":[140168],"CONCEPT_NAME":["Psoriasis"],"STANDARD_CONCEPT":["S"],"INVALID_REASON":["V"],"CONCEPT_CODE":["9014002"],"DOMAIN_ID":["Condition"],"VOCABULARY_ID":["SNOMED"],"CONCEPT_CLASS_ID":["Clinical Finding"],"STANDARD_CONCEPT_CAPTION":["Standard"],"INVALID_REASON_CAPTION":["Valid"]} 

Note the values are now inside arrays. Arrays are NOT the same as native values, and this completely breaks the structure of the object such that it can not be submitted back to another REST call.

Isn't there a way in R to distinguish the difference between a single value and an array of values? I noticed in R if you do something like this:

x <- 5;
is.vector(x)
# returns TRUE

So is this the heart of the problem, that all single-value things are treated as vectors? Because JSON semantics are NOT like that, and so if we're really talking about JSON processing, jsonlite should defer to what is expected in the JSON context, not what's expected in the R context. I understand this may rub the more R-minded folks the wrong way, since they are used to dealing with vectors in all cases from their statistical perspective, but I feel JSON should behave like JSON, with other library functions provided to translate it from a JSON context to a vector context, rather than doing that sort of thing automatically... because, as I hope I've shown above, not getting the same value from fromJSON -> toJSON is certainly wrong.

jeroen commented 7 years ago

Chill out on the caps dude. Have a look at ?unbox and the auto_unbox parameter in ?toJSON.
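
A minimal sketch of the two options mentioned here, assuming jsonlite is installed. The field names a and c are made up for illustration. unbox() marks one single value as a scalar, while auto_unbox = TRUE unboxes every atomic vector of length 1.

```r
library(jsonlite)

# By default, every R vector becomes a JSON array, even one of length 1.
toJSON(list(a = 1, c = 1))
# {"a":[1],"c":[1]}

# unbox() marks one specific value as a JSON scalar.
toJSON(list(a = 1, c = unbox(1)))
# {"a":[1],"c":1}

# auto_unbox = TRUE unboxes all atomic vectors of length 1.
toJSON(list(a = 1, c = 1), auto_unbox = TRUE)
# {"a":1,"c":1}
```

unbox() is the safer choice when a field must stay an array even if it happens to hold one element; auto_unbox = TRUE cannot tell those cases apart for plain vectors.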

If you want to roundtrip json to R and back, use these arguments:

obj <- fromJSON(json, simplifyVector = FALSE)
json <- toJSON(obj, auto_unbox = TRUE)

chrisknoll commented 7 years ago

That's a lot of caps in your reply there ;) But the key is reading in with simplifyVector = FALSE and then writing out with auto_unbox = TRUE.
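
As a rough check of that combination on the kind of record from the top of the thread, here is a sketch using a shortened version of it (only three of the original fields are kept, to stay brief), assuming jsonlite is installed:

```r
library(jsonlite)

# A shortened version of the record from the top of the thread.
jsonText <- '{"CONCEPT_ID":140168,"CONCEPT_NAME":"Psoriasis","STANDARD_CONCEPT":"S"}'

obj <- fromJSON(jsonText, simplifyVector = FALSE)
out <- toJSON(obj, auto_unbox = TRUE)

# No value was wrapped in an array on the way out.
grepl("[", out, fixed = TRUE)
# FALSE

# Parsing the output again gives the same R structure as the first parse.
identical(fromJSON(out, simplifyVector = FALSE), obj)
# TRUE
```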

Here's what I was working with that demonstrated my problem in a simpler form:

testJSONSingle <- "{\"a\": [1], \"b\":[1,2], \"c\":1}"
jsonlite::toJSON(jsonlite::fromJSON(testJSONSingle), auto_unbox = TRUE)
# produces {"a":1,"b":[1,2],"c":1}; a is supposed to be an array
jsonlite::toJSON(jsonlite::fromJSON(testJSONSingle), auto_unbox = FALSE)
# produces {"a":[1],"b":[1,2],"c":[1]}; c is not supposed to be an array
jsonlite::toJSON(jsonlite::fromJSON(testJSONSingle, simplifyVector = FALSE), auto_unbox = TRUE)
# produces {"a":[1],"b":[1,2],"c":1}; correct! (keeping lower case to contain my excitement)

Here's my suggestion, for what it's worth: when reading in the json string, the content is telling you if it is going to be a list of things or a single element via the [] notation. I think this is related to the issue described in #140 although if they are talking about modifying the json structure, then they must know something about the schema ahead of time.
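
To illustrate where the [] information goes, here is a small sketch (assuming jsonlite is installed) that compares what fromJSON returns for the test string with and without simplification. The variable names simplified and preserved are just for illustration.

```r
library(jsonlite)

testJSONSingle <- '{"a": [1], "b": [1,2], "c": 1}'

simplified <- fromJSON(testJSONSingle)                         # default
preserved  <- fromJSON(testJSONSingle, simplifyVector = FALSE)

# After simplification, the array [1] and the scalar 1 are the same R value,
# so toJSON has no way to tell them apart later.
identical(simplified$a, simplified$c)
# TRUE

# Without simplification, JSON arrays stay R lists and scalars stay
# length-1 atomic vectors, so the distinction survives.
is.list(preserved$a)
# TRUE
is.list(preserved$c)
# FALSE
```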

Thank you for the pointer. I'm not sure there's ever a case where you would not want simplifyVector=FALSE and auto_unbox=TRUE, but the defaults are the reverse, so I scratch my head at that. Maybe it's so that the data is in a more R-friendly form for statistical processing? But thanks again.

jeroen commented 7 years ago

The defaults are optimized for roundtripping common R types to JSON and back, for example:

json <- toJSON(iris)
fromJSON(json)
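
A smaller sketch of the same idea, assuming jsonlite is installed. The data frame df is made up for illustration and stands in for iris; it avoids factor columns so the columns can be compared directly.

```r
library(jsonlite)

# A small made-up data frame standing in for iris.
df <- data.frame(x = c(1.5, 2.5), y = c("a", "b"), stringsAsFactors = FALSE)

# With the defaults, a data frame is written as an array of records.
json <- toJSON(df)
json
# [{"x":1.5,"y":"a"},{"x":2.5,"y":"b"}]

# With the defaults, that array of records is simplified back to a data frame.
back <- fromJSON(json)
is.data.frame(back)
# TRUE
identical(back$y, df$y)
# TRUE
isTRUE(all.equal(back$x, df$x))
# TRUE
```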

That's a different use case than just parsing JSON. Have a look at the paper for a more detailed outline of the motivation behind this mapping: http://arxiv.org/abs/1403.2805

rvernica commented 6 years ago

I ran into this issue as well. I wish this had been made clearer in the docs. Maybe more examples would help.