Open gibson042 opened 4 years ago
Please do not repeat the regex match object pattern of adding properties to an array :-)
No, it would be an object. :sweat_smile:
I'm concerned about the use of insignificant things such as whitespace, key ordering, etc to smuggle additional information. See my comments from February.
Such a channel already exists in the form of confusable glyphs/invisible whitespace/excess numeric precision/etc., not to mention extra object members and/or array items (and key ordering in particular is unaffected by this proposal; revivers are already called in source text order). That said, it's true that the bandwidth for such information would be dramatically increased. But is it really appropriate to try protecting against the extremely narrow scenarios in which a potentially malicious reviver is used with JSON.parse (e.g., that Eve doesn't have the ability to perform arbitrary inspection but does have unfettered access to deserialized values)? The fix for that would be straightforward—don't pass through source
/input
/index
:
-JSON.parse(src, untrusted)
+JSON.parse(src, function scrubber(key, val, { keys }) { return call(untrusted, this, key, val, { keys }) })
@gibson042 I think you misunderstood my concern. I am not worried about an adversarial scenario. I am worried about well-meaning developers deliberately (ab)using these insignificant encoding differences to transmit additional, extra-JSON information. I would prefer that JSON.parse only be able to act on the significant parts of the input.
This proposal necessarily introduces the ability to differentiate between input like 4
vs. 4.0
and "𝌆"
vs. "\uD834\uDF06"
, which already seems to trip your concern. But beyond that, it currently also exposes whitespace at the non-leaf level (e.g., JSON.parse("0\t", (key, val, {source}) => source) === "0\t"
). It would be possible, however, to mitigate that by only providing source text for primitives, reducing the available rope but also the internal consistency. How strong would you say your concerns are?
I think that strikes a good balance. We don't have to eliminate the possibility, we just have to limit it enough to make it unappealing.
To clarify, is that support for a) exposing source text with all values but input and index for none, or b) taking the further step of denying the ability to read source text for objects and arrays?
I think the only loss of (significant) information in JavaScript's interpretation of JSON is in the numbers/strings, right? In which case B would be preferable.
There was consensus on TC39 Incubator call to expose source text only for primitive values, and in particular to not provide input
and index
, unless a significant use case is identified.
index
(actually, line number and position) would be very relevant if I use JSON.parse
for deserialising values that are encoded as a custom scheme, where the decoding encounters a parse error. This would allow the parser to throw an appropriate human-readable error message, aiding in finding the malformed part of the source file. Not sure if that would count as a significant use case?
Examples:
// parsing JSON-LD
JSON.parse(text, (key, val, context) => {
if (key == "@type" || key == "@context") {
try {
return new URL(val);
} catch(err) {
throw Obect.assign(new SyntaxError(err.message + ` at line ${context.line}, character ${context.position}`), err);
}
} else {
return val;
}
});
// parsing abstract syntax tree
JSON.parse(text, (key, val, context) {
if (typeof val == "object" && val.type == "TreeNode") {
if (!Array.isArray(val.children))
throw new SyntaxError(`Encountered TreeNode without children at line ${context.line}, character ${context.position}`);
return new TreeNode(val.children);
}
return val;
});
Agree that "throwing indicative errors" is a strong use-case for exposing position information. Especially with other data types in the pipeline (Decimal
, etc.), establishing a generalized serialization pattern with indicative errors would be a strong base for growing functionality.
String.prototype.replace
passes position and input arguments to replacer functions and the return value fromRegExp.prototype.exec
has "index" and "input" properties;JSON.parse
could behave similarly.As noted by @rbuckton, this could be useful for error reporting.