tc39 / proposal-json-parse-with-source

Proposal for extending JSON.parse to expose input source text.
https://tc39.github.io/proposal-json-parse-with-source

add an options object to JSON.parse and JSON.stringify #5

Closed allenwb closed 4 years ago

allenwb commented 4 years ago

I want to revive a suggestion I made the last time I saw a proposal to tweak ES JSON processing: https://github.com/tc39/proposal-well-formed-stringify/issues/4#issue-321969780

There is nothing particularly sacred about the JSON/JavaScript mapping imposed by JSON.parse and JSON.stringify. They would both greatly benefit from an options object that allowed more control over the mapping (for example, enabling built-in internalization of large integers as BigInts, or the features discussed in the present proposal).

I think introducing such an options object is the right place to start working on JSON processing extensions.
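
As a minimal sketch of the kind of call shape this could enable (the option name here is invented purely for illustration):

// Hypothetical options-bag form. A non-callable second argument is ignored
// by today's JSON.parse, which is what leaves room for this extension.
JSON.parse('{"docid": 123456789012345678901234567890}', {
    parseIntegersAsBigInt: true // invented option name, not part of any proposal
});
// would yield { docid: 123456789012345678901234567890n }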

jlyonsmith commented 4 years ago

I agree. I added an options object to implement JSON node parsing in my fork of JSON5 at jlyonsmith/json5.

kaizhu256 commented 4 years ago

besides bigint (and proposed bigdecimal), are there any other compelling use-cases? if not, maybe this proposal should narrow its scope to only those?

example scenario: say i want to message-pass 64-bit docids between 2 java-servers with a nodejs-proxy in between:

java1 <-> nodejs-proxy <-> java2

json message:

{
    "docid": 123456789012345678901234567890,
    "tags": ["foo", "bar"]
}

is there a way for nodejs-proxy to modify the message as follows (but still preserve 64-bit docid):

{
    "docid": 123456789012345678901234567890,
-    "tags": ["foo", "bar"]
+    "tags": ["foo", "bar", "baz"]
}

i think this is the common-case scenario for this proposal to solve.

jlyonsmith commented 4 years ago

@kaizhu256 There are absolutely compelling use cases outside of BigInt conversions. The ability to get at the JSON source information for error reporting, and more open-ended data-type support with some future-proofing, are just two good examples.

kaizhu256 commented 4 years ago

can you give example-scenarios of other open-ended data-types? datetime and binary-data are the only other ones that come to mind, and i'm skeptical of both (isostrings and base64 are good-enough).

i don't think you can add any more datatypes to JSON w/o ruining its simplicity/idiot-proofness, which are its main selling-points.

jlyonsmith commented 4 years ago

"Everything that can be invented has been invented" :D I'm not sure if we are quite talking about the same thing @kaizhu256 My use case is for extending the JSON.parse callback so that it is possible to get the context information about where the key/value came from in the JSON. Yours I believe is with the fidelity of BigInt transcoding. My opinion related to your use case is that it is never a good idea to assume that other data types won't come along in the future that raise similar issues to BigInt and it would be a good idea to have a modification that could handle those. We don't know what we don't know, but we can build a reasonable solution that handles the current cases and handle conversion to/from a string correctly as well as preserving the simplicity of the JSON data format.

gibson042 commented 4 years ago

There may be reason to introduce an options bag to JSON.parse, but I don't think this proposal is it. What we're suggesting here is solely extensions to the data supplied to reviver functions, a kind of change that historically and currently seems not to warrant new author-supplied parameters (e.g., https://github.com/tc39/proposal-regexp-match-Indices adds an indices property to every regular expression match object, just like the earlier https://github.com/tc39/proposal-regexp-named-groups added groups).
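
Concretely, the extension point stays inside the reviver itself. A minimal sketch, assuming the three-argument reviver shape used in examples later in this thread (the exact shape of the extra data is what this proposal is working out):

// Not yet implemented by any engine: src would carry the raw source text.
JSON.parse('{"docid": 123456789012345678901234567890}',
    function (key, val, src) {
        // for "docid", src would be "123456789012345678901234567890";
        // no new positional parameter or options bag is needed
        return key === "docid" ? BigInt(src) : val;
    });
// would yield { docid: 123456789012345678901234567890n }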

gibson042 commented 4 years ago

I would be open to extending this proposal to better support @kaizhu256's use case if I had confidence that it wouldn't bog things down. Unfortunately, I can only think of two feasible ways to do so, and they're both very messy.

  1. Change argument processing, either introducing a fourth parameter (i.e., value [ , replacer [ , space [ , options ] ] ] for calls like JSON.stringify(obj, null, null, options)—:nauseated_face:) or overriding an existing one (e.g., value [ , optionsOrReplacer [ , space ] ]—also :nauseated_face:).
  2. Change replacer processing to look for a signal that the output is to be interpreted as "raw" rather than subject to further JSON serialization (e.g., when a replacer function's return value is tagged with a well-known symbol dedicated to this purpose—and yet again, :nauseated_face:). A sketch of this second shape follows below.
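
For concreteness, a rough sketch of how that second option might look to authors, with a local Symbol standing in for the hypothetical well-known one (nothing like this exists today):

const rawJSON = Symbol("rawJSON"); // stand-in for the hypothetical well-known symbol
const replacer = function (key, val) {
    if (typeof val === "bigint") {
        // tag: splice these characters into the output verbatim
        return { [rawJSON]: val.toString() };
    }
    return val;
};
// Under the sketched semantics this would produce {"docid":123456789012345678901234567890};
// today's JSON.stringify ignores symbol-keyed properties and emits {"docid":{}}.
JSON.stringify({ docid: 123456789012345678901234567890n }, replacer);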

I think it would be better to have raw JSON serialization in a separate proposal, with primary emphasis on BigInt and BigDecimal.

allenwb commented 4 years ago

@gibson042

Unfortunately, I can only think of two feasible ways to do so, and they're both very messy.

When I opened this issue I referenced https://github.com/tc39/proposal-well-formed-stringify/issues/4#issue-321969780, which contains my proposed solution:

Note that the current semantics for the second parameter to both methods only recognizes function or native Array objects. All other objects passed in that position are ignored. That means that it would be backwards compatible to revise their specification to accept an options object as the second parameter. Such an options object should be defined to have option properties corresponding to the existing positional options.

In other words, the two available argument formats for JSON.stringify would be:
value [, options]
and
value [, replacer [, space] ]

with any function or Array.isArray object recognized as a replacer and all other objects being recognized as an options object.

The two-argument "options" form would still support all the functionality available through the legacy three-argument form. The standard set of recognized option properties would include a "replacer" property that, if present, is processed identically to the current replacer argument, and a "space" property that is processed identically to the current space argument. I suspect the two-argument options form would become the preferred one and the legacy three-argument form would fall into disfavor. However, for maximal legacy compatibility we could also allow the form value [, options, space] where the value of the space argument overrides a "space" options property.
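
To illustrate the dispatch rule, here is a userland sketch (illustrative only; the real change would live inside JSON.stringify itself):

function stringifyWithOptions(value, replacerOrOptions, space) {
    // legacy forms: a function or an array in the second position is a replacer
    if (typeof replacerOrOptions === "function" || Array.isArray(replacerOrOptions)) {
        return JSON.stringify(value, replacerOrOptions, space);
    }
    // otherwise treat the second argument as an options bag whose "replacer"
    // and "space" properties mirror the legacy positional arguments; a
    // positional space argument overrides the "space" option
    var options = replacerOrOptions || {};
    return JSON.stringify(
        value,
        options.replacer,
        space !== undefined ? space : options.space
    );
}
// stringifyWithOptions({a: 1}, {space: 2}) === JSON.stringify({a: 1}, null, 2)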

kaizhu256 commented 4 years ago

so with an options-bag, what would the actual solution to the nodejs-proxy scenario

{
    "docid": 1234567890, // bigint
-    "tags": ["foo", "bar"]
+    "tags": ["foo", "bar", "baz"]
}

look like in code? something like the following?

require("http").createServer(async function (req, res) {
    var result;

    result = await ...  // read message from http-request-stream
    // result = "{\"docid\":1234567890,\"tags\":[\"foo\",\"bar\"]}"

    result = JSON.parse(
        result,
        {
            reviver: function (key, val, src) {
                if (key === "docid") {
                    return BigInt(src);
                }
                return val;
            }
        }
    );
    // result = {"docid":1234567890n,"tags":["foo","bar"]}

    result.tags.push("baz");
    // result = {"docid":1234567890n,"tags":["foo","bar","baz"]}

    result = JSON.stringify(
        result,
        {
            replacer: function (key, val) {
                // stringify bigint with unique-prefix
                if (typeof val === "bigint") {
                    return "biginit_d8safu2_" + val.toString();
                }
                return val;
            }
        }
    );
    // result = "{\"docid\":\"biginit_d8safu2_1234567890\",\"tags\":[\"foo\",\"bar\",\"baz\"]}"

    result = result.replace((
        /"biginit_d8safu2_(\d+)"/g
    ), "$1");
    // result = "{\"docid\":1234567890,\"tags\":[\"foo\",\"bar\",\"baz\"]}"

    res.end(result);
}).listen(8080);

allenwb commented 4 years ago

so with options-bag, how would actual solution to the nodejs-proxy scenario

To start we would have to define a set of options to enable different BigInt serialization/deserialization scenarios. As a strawman, assume we have two alternatives available: BigIntAll: true, which processes every integer JSON number as a BigInt (and serializes every BigInt back as a bare JSON integer), and BigIntProps: [...names], which does the same but only for the listed property names.

So, for this example (and based upon what I infer of the nodejs-proxy schema above) the options object could be either {BigIntAll: true} or {BigIntProps: ["docid"]}. The latter would be the one you would want to use if the schema also includes other properties that you don't want to be processed as big ints. So your example could be rewritten as:

require("http").createServer(async function (req, res) {
    var result;

    var JSONOptions = {BigIntProps: ["docid"]};

    result = await ...  // read message from http-request-stream
    // result = "{\"docid\":1234567890,\"tags\":[\"foo\",\"bar\"]}"

    result = JSON.parse(result, JSONOptions);
    // result = {"docid":1234567890n,"tags":["foo","bar"]}

    result.tags.push("baz");
    // result = {"docid":1234567890n,"tags":["foo","bar","baz"]}

    result = JSON.stringify(result, JSONOptions);
    // result = "{\"docid\":1234567890,\"tags\":[\"foo\",\"bar\",\"baz\"]}"

    res.end(result);
}).listen(8080);

gibson042 commented 4 years ago

@allenwb That still wouldn't work for input like

{
    "docid": 9007199254740992.5,
    "tags": ["foo", "bar"]
}
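
Neither built-in coercion can round-trip that docid, so even a per-property BigInt option has nowhere good to go (a quick check, using only today's semantics):

var src = "9007199254740992.5"; // the docid above, as source text
Number(src); // 9007199254740992: the trailing .5 is silently rounded away
try {
    BigInt(src); // throws a SyntaxError: BigInt rejects non-integer strings
} catch (e) {}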

ljharb commented 4 years ago

Instead of trying to select a scenario, it seems like the best option is to allow the user to provide a callback, invoke it with all the info it needs to make a decision, and let the user encode their scenario.

(which, as i understand it, is what this proposal already achieves in combination with a reviver)

allenwb commented 4 years ago

So, what's the schema of the record being processed? In a well-formed record, is the docid supposed to be an integer with an arbitrary number of digits (e.g., a BigInt)? Or is it an arbitrary-precision decimal fraction, in which case perhaps it is BigDecimal support you want?

Regardless, I replied based on the assumption that the schema required docid to be an integer JSON number. If it is something else you would have to parameterize JSON.parse differently.

I note that the original example in https://github.com/tc39/proposal-json-parse-with-source/issues/5#issuecomment-573121472 parses a JSON text into a JS object whose docid property might have either a BigInt or a Number (or possibly even something else) as its value. This is going to complicate the processing of those objects, as Numbers and BigInts can't always be used interchangeably. I would think that you would want to simplify processing by always having a BigInt, even if its value was within the Number.MAX_SAFE_INTEGER range.

Also, I notice that in the original example, the reviver function accepts three arguments. The current specification for JSON.parse says a reviver only takes two arguments. Perhaps there is a TC39 proposal for adding a parameter to reviver? Or maybe the example was written assuming a non-standard implementation of JSON.parse?

Finally, my main point was to show that JSON.parse/stringify could be extended in a backwards compatible manner to enable automatic processing of large-integer JSON numbers as BigInts. The specific details of the most useful ways to parameterize JSON.parse/stringify to accomplish that still need to be worked out.

ljharb commented 4 years ago

I believe this proposal doesn’t add BigInt (or BigDecimal) support - it just provides the original source string, which leaves it up to the programmer to decide if they want BigInt, BigDecimal, or any other arbitrary data structure.

kaizhu256 commented 4 years ago

bigint/bigdecimal are the primary (or i would argue only) motivation for this proposal. yes, i added an extra 3rd parameter to the reviver, so that bigint could be parsed.

i like @allenwb's strawman for its ease-of-use, but would instead use conditional coercion via a BigIntConditional option:

require("http").createServer(async function (req, res) {
    var result;

    var JSONOptions = {
        // conditionally-coerce to bigint if integer and outside Number.MAX_SAFE_INTEGER
        // the user (not JSON.parse) is responsible for explicit bigint coercion
        BigIntConditional: true,
        // secure against malicious 1-million bit bigint
        BigIntMaxBits: 64
    };

    result = await ... // read message from http-request-stream
    // result = "[\
    //     {\"docid\":1234, \"tags\":[]},\
    //     {\"docid\":12345678901234567890, \"tags\":[]}\
    // ]"

    result = JSON.parse(result, JSONOptions);
    // result = [
    //     {"docid":1234, "tags":[]},
    //     {"docid":12345678901234567890n, "tags":[]} // conditional-coercion
    // ]

    result.forEach(function (elem) {
        // the user (not JSON.parse) is responsible for explicit bigint coercion
        elem.docid = BigInt(elem.docid);
        elem.tags.push("baz");
    });
    // result = [
    //     {"docid":1234n, "tags":["baz"]},
    //     {"docid":12345678901234567890n, "tags":["baz"]}
    // ]

    result = JSON.stringify(result, JSONOptions);
    // result = "[\
    //     {\"docid\":1234, \"tags\":[\"baz\"]},\
    //     {\"docid\":12345678901234567890, \"tags\":[\"baz\"]}\
    // ]"

    res.end(result);
}).listen(8080);

schema-handling should be out-of-scope of JSON.parse if possible, to keep it idiot-proof.

gibson042 commented 4 years ago

Also, I notice that in the original example, the reviver function accepts three arguments. The current specification for JSON.parse says a reviver only takes two arguments. Perhaps, there is TC39 proposal for adding a parameter to reviver?

Not to put too fine a point on it, but that's this proposal.

bigint/bigdecimal are the primary (or i would argue only) motivation for this proposal.

They aren't the only motivation, and one could also imagine wanting to differentiate 4.0 and 4 or (less likely) "foo" and "\u0066\u006f\u006f". But even if they were, I would argue that even something like BigIntProps is still too blunt, because it may be the case that the same key appears at different places in a data structure and should only sometimes be deserialized as a BigInt.
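
With source access a reviver could observe such distinctions, which are gone by the time a Number value exists. A sketch using the three-argument reviver shape from this thread:

JSON.parse("[4, 4.0]", function (key, val, src) {
    // val === 4 for both elements, but src would be "4" and "4.0" respectively
    return typeof val === "number" ? src : val;
});
// would yield ["4", "4.0"] under this proposal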

But this has drifted a great deal from the subject of the proposal, which as I said above should not in itself warrant changes to the argument processing of JSON.parse. I'm going to close this issue, but am open to a more germane followup.