tc39 / proposal-json-parse-with-source

Proposal for extending JSON.parse to expose input source text.
MIT License

Extend this proposal to include serialization? #12

gibson042 closed 2 years ago

gibson042 commented 4 years ago

This proposal currently covers only the parsing side, but full round-tripping would also require serialization of e.g. BigInt values as unquoted digit sequences. The committee seemed tepid about including serialization in this proposal, but I still wanted to capture the concept even if it is rejected as expected.
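To make the round-tripping gap concrete, here is a minimal sketch of how the existing JSON.parse and JSON.stringify treat large integers. The variable names are illustrative only, and the sample id is simply the first integer above Number.MAX_SAFE_INTEGER that a double cannot represent.

```javascript
// Parsing: digits beyond Number.MAX_SAFE_INTEGER are rounded to the nearest double.
const source = '{"id": 9007199254740993}';
const parsed = JSON.parse(source);
const precisionLost = String(parsed.id) !== "9007199254740993";

// Serializing: JSON.stringify throws a TypeError for BigInt values by default.
let stringifyError = null;
try {
  JSON.stringify({ id: 9007199254740993n });
} catch (err) {
  stringifyError = err;
}

console.log(precisionLost);                       // true
console.log(stringifyError instanceof TypeError); // true
```

Both halves have to be addressed for a value to survive parse followed by stringify unchanged.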

kaizhu256 commented 4 years ago

i think serializing bigint will likely require an additional options argument in JSON.stringify as mentioned in cookbook scenarios.

if the intent is to roundtrip parse and stringify bigints, then i feel this proposal is a dead-end and not the way to go.

rauschma commented 4 years ago

Yes, please! Roundtripping seems such a core use case that it would be a shame if it weren’t supported.

One possibility:

function bigintReplacer(_key, value) {
  if (typeof value === 'bigint') {
    return JSON.rawSource(String(value));
    // Or: return {[Symbol.rawJsonSource]: String(value)};
  }
  return value;
}

gibson042 commented 4 years ago

Including serialization will require motivating use cases. I can imagine needing to preserve uint64 data (e.g., Twitter ids) and possibly high-precision sensor data (e.g., IEEE binary128), but could use some broader and/or more concrete examples.

kaizhu256 commented 4 years ago

> but could use some broader and/or more concrete examples.

hypothetical-but-credible-finapp-example, is message-passing sql-tables with [arbitrary] bigint-columns between browser <-> server.

e.g. serialize following sql-table:

id    stock         market_cap
--    --------      ------------------
 1    aapl          $1,690,000,000,000
 2    amzn          $1,550,000,000,000
 3    goog          $1,070,000,000,000

to space-efficient json-form:

{
    "columns": [ "id", "stock", "market_cap" ],
    "rows": [
        [ 1, "aapl", 1690000000000 ],
        [ 2, "amzn", 1550000000000 ],
        [ 3, "goog", 1070000000000 ]
    ]
}

and roundtrip-message-pass between [browser] sql.js <-> [server] mssql.
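a rough sketch of that columns/rows message format using only the JSON functions that exist today. `toCompact` and `fromCompact` are hypothetical helper names, not part of sql.js or mssql, and the round trip only works here because every value still fits in a double.

```javascript
// Hypothetical helpers: convert row objects to the compact form and back.
function toCompact(records) {
  const columns = Object.keys(records[0] ?? {});
  const rows = records.map((record) => columns.map((column) => record[column]));
  return { columns, rows };
}

function fromCompact({ columns, rows }) {
  return rows.map((row) =>
    Object.fromEntries(columns.map((column, i) => [column, row[i]]))
  );
}

const table = [
  { id: 1, stock: "aapl", market_cap: 1690000000000 },
  { id: 2, stock: "amzn", market_cap: 1550000000000 },
  { id: 3, stock: "goog", market_cap: 1070000000000 },
];

// Serialize on one side, parse and rebuild the rows on the other.
const message = JSON.stringify(toCompact(table));
const restored = fromCompact(JSON.parse(message));
```

a bigint column would make the JSON.stringify call above throw, which is the gap being discussed.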

gibson042 commented 4 years ago

Thanks. Those numbers are three orders of magnitude less than Number.MAX_SAFE_INTEGER, but perhaps there is something similar for cryptocurrencies or whole-market summations.
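As a quick check of that headroom, the sketch below compares the three market caps from the table against Number.MAX_SAFE_INTEGER. The variable names are illustrative only.

```javascript
// Market caps from the example table, as plain Numbers.
const marketCaps = [1690000000000, 1550000000000, 1070000000000];

// Every value, and even their sum, is exactly representable as a double.
const allSafe = marketCaps.every((cap) => Number.isSafeInteger(cap));
const total = marketCaps.reduce((sum, cap) => sum + cap, 0);

// Ratio between the safe-integer limit and the largest market cap.
const headroom = Number.MAX_SAFE_INTEGER / Math.max(...marketCaps);

console.log(allSafe);                          // true
console.log(Number.isSafeInteger(total));      // true
console.log(Math.floor(Math.log10(headroom))); // 3
```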

gibson042 commented 3 years ago

There was consensus on the TC39 Incubator call to include serialization in this proposal to avoid shipping an incomplete solution with corresponding ecosystem fragmentation when serialization is ultimately added.

To avoid surreptitious output hijacking, the approach will tentatively use wrapping objects with symbols that are unique for each invocation of JSON.stringify, e.g.

let rawTags = [];
function replacer(key, val, {rawTag}) {
  // Record the tag supplied to each replacer call for the assertions below.
  rawTags.push(rawTag);
  if ( typeof val !== "bigint" ) return val;
  // Serialize BigInt values as raw digit strings.
  return {[rawTag]: String(val)};
}

// BigInt values serialize in context as raw digit strings.
assert.strictEqual(JSON.stringify([1n], replacer), "[1]");
assert.strictEqual(JSON.stringify([2n], replacer), "[2]");

// The replacer was invoked four times (once for each array and once for each array element).
assert.strictEqual(rawTags.length, 4);

// The rawTag values match for the first two invocations and the second two invocations.
assert.strictEqual(rawTags[1], rawTags[0]);
assert.strictEqual(rawTags[3], rawTags[2]);

// ...but not between the first and second invocations.
assert.notStrictEqual(rawTags[1], rawTags[2]);

bergus commented 3 years ago

How will this work with .toJSON() methods? Are they allowed to return raw values as well?

Can you elaborate about "surreptitious output hijacking", what scenarios are you worried about? (Is there literature about attack vectors, or generic security advice?) I can think of a well-known Symbol.rawJsonSource passing security boundaries and affecting JSON output where I wouldn't want it, but a realm-specific JSON.rawSource should be only accessible to those who could also overwrite JSON.stringify itself. Unless it's leaked…

Do the raw source contents need to be valid JSON texts/tokens? Could I use JSON.stringify to output, say, YAML with the right replacer?

mhofman commented 2 years ago

Should the algorithm verify that the raw value parses as JSON, to deal with output hijacking in the case where the replacer is composed of potentially untrusted behaviors (e.g. delegating to a class-specific replacer), without requiring the replacer itself to implement this type of checking?
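A minimal sketch of the kind of check this would involve, written as a userland function so it can run today. `isRawJsonPrimitive` is a hypothetical name, and restricting raw text to primitives (no objects or arrays) is an assumption of this sketch rather than something settled in the thread.

```javascript
// Hypothetical helper: accept raw text only if it is one complete JSON primitive.
function isRawJsonPrimitive(text) {
  if (typeof text !== "string") return false;
  let value;
  try {
    value = JSON.parse(text);
  } catch {
    // Not a complete JSON text, e.g. an attempt to inject extra members.
    return false;
  }
  // Assumption: objects and arrays are rejected, only primitives pass.
  return value === null || typeof value !== "object";
}

console.log(isRawJsonPrimitive("12345678901234567890")); // true
console.log(isRawJsonPrimitive('1, "admin": true'));     // false
console.log(isRawJsonPrimitive("[1]"));                  // false
```

Note that JSON.parse rounds the long digit string to a double while validating it, which is harmless here because only the syntax is being checked and the original text is what would be emitted.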

legendecas commented 2 years ago

I'm curious whether there are other use cases for a raw JSON source replacer besides outputting BigInt values as unquoted digit sequences. Would it be more feasible to just add BigInt primitive support to JSON.stringify instead of exposing a generic raw JSON source replacer?

bakkot commented 2 years ago

I can think of at least a couple: