Open mbacarella opened 1 year ago
I was about to say that JSON allows duplicate keys but looks like both RFC 7159 and RFC 8259 state
The names within an object SHOULD be unique.
So I think it would make sense to check for duplicate keys and fail in that case. That probably would require a major version bump as it might be something that Yojson users use today, but I don't expect it to be a big problem for 99% of users.
I don't have much time to implement it but I'd happily review and merge a PR.
I offer to do it though, one question: users may resent the small performance penalty this imposes without some kind of flag to undo it, aside from the semantics concern. Should there be a way to escape the check?
As far as I understand, SHOULD
is not MUST
. It's better to produce json objects without duplicates, but I don't think that it's invalid.
Given that we're upstanding proper OCaml people I'd consider the SHOULDs as MUSTs. Doubly so because of the security issues this inconsistent enforcement introduces (see link above).
Ah, that's fair. If you want to go a bit further in bringing sanity to JSON, and you want reasonable APIs, I'd say that objects could use Map.Make(String)
:grin: . After all objects are described as unordered, so relying on the order of bindings is wrong.
@c-cube In theory yes, but it is possible that the Yojson representation was decided before the rule about no duplicate keys was formalized in RFCs (remember JSON.org and its underspecified details?) and changing it now would break so much code, I don't even want to know how many PRs I would need to submit to fix it. The horror!
@mbacarella I am not sure an escape hatch wouldn't make the performance even worse, since you'd either need to check all the time or use CPPO to generate yet another variation of the module. Maybe worth benchmarking how much performance impact we're talking here if `Assoc
were to track the encountered keys in e.g. a set.
Reading the thread, I understand there is no decision taken on this issue. At most, you would like to evaluate the performance penalty of checking whether a new key is a duplicate or not. I assume the "performance" pernalty is to be understood in time and memory. My opinion is that one should not break previous behaviour, especially if RFCs do not requires it. Having the opportunity to check for duplicate keys should be an option that users should be able to enable on demand by reading the doc. The check should only mention the issue in an warning message. It is the first step one should implement to move forward with the following benefits :
One could implement it as new module Safer
or something, but the issue with the proliferation of these parsing variants is that they don't get adoption and changing them later is hard.
So one could have a default option to from_string
etc like ?(allow_duplicate_keys=true)
that could be included in Yojson 2, inverted in Yojson 3 and removed in Yojson 4.
I would argue that if they don't get adoption, it may be because there is no need for such feature...
I don't understand "and changing them later is hard". Changing what exactly, the Safer
module ? You mean it is hard to change a module that did not get adoption ?
I would argue that if they don't get adoption, it may be because there is no need for such feature...
I don't think that is the case. People often use something without discussing all the minutiae of their decision, just because e.g. there was a code sample that did what they wanted without spending time analyzing that another approach would've been better for their usecase. I don't like blaming people for doing something unsafe if the safe thing was harder to do. We as programmers should strive to make APIs sensible as possible.
Case in point: who knows that besides Yojson.Basic
and Yojson.Safe
there is also Yojson.Raw
? Is picking Raw
a better choice in some cases? Maybe, but it is so obscure as to be nearly irrelevant.
I don't understand "and changing them later is hard".
So if we add a Safer
module, we'd need to convince people do migrate their codebases (and all the code they depend on) to the new module. If we add a default option to the parse functions it is just a single parameter that we can easily change between releases if it makes sense.
I know some work was done on JSON compliance here https://github.com/ocaml-community/yojson/issues/34 but could we consider taking a stronger position?
I would expect duplicate keys to raise but instead
yojson
silently accepts them. This can lead to inconsistent rules enforcement behavior, especially when passing messages between systems with different JSON implementations.An exposition is here An Exploration of JSON Interoperability Vulnerabilities