tc39 / proposal-json-parse-with-source

Proposal for extending JSON.parse to expose input source text.
https://tc39.github.io/proposal-json-parse-with-source
MIT License

Why is `JSON.rawJSON` limited to primitives only? #46

Open airhorns opened 3 months ago

airhorns commented 3 months ago

Forgive me if this is the wrong spot to put this.

I think JSON.rawJSON is a really powerful API for performance-optimizing JSON serialization. But, because it is limited to only producing valid primitive JSON, it can't be used for "inline"-ing existing JSON.

I've got a couple of use cases I want to use it for that require feeding pre-serialized objects and arrays into the serialization of outer object trees. For example, in a typical REST API, you might retrieve 10 records from the database and reply with one big JSON array of all of them. Each record might have a big JSON value on it, and if those values are large, it performs poorly to deserialize each record's JSON object only to serialize it again for the REST API response holding all 10 records. Instead, it'd be great to leave the data as a string when fetching from the database, and then insert it into the final JSON string produced by JSON.stringify, using JSON.rawJSON to wrap each of these strings.
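A minimal sketch of the situation described above, assuming hypothetical `rows` whose `payload` column already holds JSON text. It shows the parse-then-stringify round trip, then probes what `JSON.rawJSON` accepts. The `describeRawJSON` helper is made up for illustration, and the call is guarded because not every engine ships the proposal.

```javascript
// Hypothetical rows: each `payload` is JSON text exactly as a database might return it.
const rows = [
  { id: 1, payload: '{"tags":["a","b"],"score":1.5}' },
  { id: 2, payload: '{"tags":[],"score":2}' },
];

// Today's round trip: parse each payload only so JSON.stringify can serialize it again.
const roundTripped = JSON.stringify(
  rows.map((row) => ({ id: row.id, payload: JSON.parse(row.payload) }))
);

// Illustrative helper (not part of any API): reports what JSON.rawJSON does with `text`.
function describeRawJSON(text) {
  if (typeof JSON.rawJSON !== "function") return "unsupported";
  try {
    return JSON.stringify({ value: JSON.rawJSON(text) });
  } catch (err) {
    return err.name; // the proposal specifies a SyntaxError for object or array text
  }
}

console.log(roundTripped);
console.log(describeRawJSON("1e1000"));        // primitive JSON text
console.log(describeRawJSON(rows[0].payload)); // object JSON text
```

In an engine that implements the proposal as currently specified, the primitive text should pass through verbatim while the object text is rejected, which is the limitation this issue is about.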

Without this capability, one has to either cobble JSON strings together manually, which is slower and more error-prone than using the engine's built-in capabilities, or always deserialize just to serialize again. Userland implementations like json-stream-stringify are far, far slower, and at least in my case, the JSON objects are really big, so deserializing and reserializing is a major performance issue.
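To make the correctness concern concrete, here is a rough sketch of stitching JSON strings together by hand. `joinPreserialized` and `isValidJSON` are hypothetical helpers, not taken from any library. Nothing in the splice step validates the pieces, so one truncated record corrupts the whole response.

```javascript
// Hypothetical helper: builds a JSON array by splicing pre-serialized texts together.
// It never inspects the pieces, so it is fast but offers no validity guarantee.
function joinPreserialized(jsonTexts) {
  return "[" + jsonTexts.join(",") + "]";
}

// Hypothetical helper: true when `text` parses as JSON.
function isValidJSON(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

const good = joinPreserialized(['{"id":1}', '{"id":2}']);
const bad = joinPreserialized(['{"id":1}', '{"id":']); // second record is truncated

console.log(good, isValidJSON(good));
console.log(bad, isValidJSON(bad));
```

Validating the result with `JSON.parse`, as `isValidJSON` does, brings back the parsing cost the splice was meant to avoid.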

I presume there is a justification for limiting what can go through JSON.rawJSON, but what is it? And could there ever be a trusted mode, or some sort of escape hatch where, for very performance-sensitive use cases, any old string could be sent along?

One other note: it seems that this low-level API could really help with avoiding re-serialization of values you already have the source JSON string for, but as currently specified it can't, because it does the safety check by parsing the string anyway. That seems correct but inefficient, again suggesting that it'd be great to have some sort of escape hatch for the brave. Notably, because [[IsRawJSON]] is an internal slot, userland can't create its own raw JSON objects and choose to pay the complexity / reliability price itself.
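A short sketch of why userland can't fake a raw JSON object with existing hooks: a string returned from `toJSON` is serialized as a JSON string, quotes and escapes included, rather than spliced in verbatim. The `impostor` object is purely illustrative.

```javascript
const preSerialized = '{"a":1}';

// Illustrative attempt to emulate raw JSON with the standard toJSON hook.
const impostor = {
  toJSON() {
    return preSerialized;
  },
};

const output = JSON.stringify({ value: impostor });
console.log(output);

// The inner text comes back as a string value, not as a nested object.
const reparsed = JSON.parse(output);
console.log(typeof reparsed.value);
```

Only objects carrying the internal slot get the verbatim treatment, and script code has no way to set that slot.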

airhorns commented 3 weeks ago

@gibson042 apologies for the direct ping but it'd be super helpful to understand this and/or collaborate on widening the applicability!

I ended up open sourcing the thing I would want to use rawJSON for here: https://github.com/gadget-inc/deferredjson

gibson042 commented 3 weeks ago

Thanks for the ping. The reason for limiting to primitive values is to cut off what would otherwise be a bigger opportunity for surreptitious communication by varying representation details within JSON text representing the same data. See https://github.com/tc39/proposal-json-parse-with-source/issues/12#issuecomment-704441889 , https://github.com/tc39/proposal-json-parse-with-source/issues/19#issuecomment-951787505 , and also the extensive discussion at the October 2021 plenary that ultimately resulted in global availability with primitive-only constraints as a balance of convenience vs. integrity (the latter being a concern about the ability for an untrusted data-only input object to encode itself as arbitrary JSON text, originally raised in July 2020).
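As a rough illustration of "varying representation details" (one reading of the concern, not a summary of the linked discussions): several distinct JSON texts can decode to the same data, so the choice of text can carry information that a consumer who only looks at parsed values never sees. The `variants` below are made-up examples.

```javascript
// Three different JSON texts: whitespace, number spelling, and key escaping all vary.
const variants = [
  '{"a":1,"b":[true]}',
  '{ "a" : 1.0, "b" : [ true ] }',
  '{"\\u0061":1e0,"b":[true]}',
];

// Normalizing through parse + stringify shows they all describe the same data.
const canonical = variants.map((text) => JSON.stringify(JSON.parse(text)));
const distinctTexts = new Set(variants).size;
const distinctValues = new Set(canonical).size;

console.log(distinctTexts, distinctValues);
```

Objects and arrays offer many more such degrees of freedom (whitespace, member spelling, nesting) than a single primitive does, which fits the trade-off described in the comment above.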