ocaml-community / yojson

Low-level JSON parsing and pretty-printing library for OCaml
https://ocaml-community.github.io/yojson/
BSD 3-Clause "New" or "Revised" License
325 stars 58 forks source link

Duplicate keys, a source of security vulnerabilities,are accepted by Yojson #161

Open mbacarella opened 1 year ago

mbacarella commented 1 year ago

I know some work was done on JSON compliance here https://github.com/ocaml-community/yojson/issues/34 but could we consider taking a stronger position?

I would expect duplicate keys to raise but instead yojson silently accepts them. This can lead to inconsistent rules enforcement behavior, especially when passing messages between systems with different JSON implementations.

let j = Yojson.Safe.from_string "{ \"qty\":1, \"qty\":-1 }" ;;
val j : Yojson.Safe.t = `Assoc [("qty", `Int 1); ("qty", `Int (-1))]

An exposition is here An Exploration of JSON Interoperability Vulnerabilities

Leonidas-from-XIV commented 1 year ago

I was about to say that JSON allows duplicate keys but looks like both RFC 7159 and RFC 8259 state

The names within an object SHOULD be unique.

So I think it would make sense to check for duplicate keys and fail in that case. That probably would require a major version bump as it might be something that Yojson users use today, but I don't expect it to be a big problem for 99% of users.

I don't have much time to implement it but I'd happily review and merge a PR.

mbacarella commented 1 year ago

I offer to do it though, one question: users may resent the small performance penalty this imposes without some kind of flag to undo it, aside from the semantics concern. Should there be a way to escape the check?

c-cube commented 1 year ago

As far as I understand, SHOULD is not MUST. It's better to produce json objects without duplicates, but I don't think that it's invalid.

mbacarella commented 1 year ago

Given that we're upstanding proper OCaml people I'd consider the SHOULDs as MUSTs. Doubly so because of the security issues this inconsistent enforcement introduces (see link above).

c-cube commented 1 year ago

Ah, that's fair. If you want to go a bit further in bringing sanity to JSON, and you want reasonable APIs, I'd say that objects could use Map.Make(String) :grin: . After all objects are described as unordered, so relying on the order of bindings is wrong.

Leonidas-from-XIV commented 1 year ago

@c-cube In theory yes, but it is possible that the Yojson representation was decided before the rule about no duplicate keys was formalized in RFCs (remember JSON.org and its underspecified details?) and changing it now would break so much code, I don't even want to know how many PRs I would need to submit to fix it. The horror!

@mbacarella I am not sure an escape hatch wouldn't make the performance even worse, since you'd either need to check all the time or use CPPO to generate yet another variation of the module. Maybe worth benchmarking how much performance impact we're talking here if `Assoc were to track the encountered keys in e.g. a set.

Heyji2 commented 4 months ago

Reading the thread, I understand there is no decision taken on this issue. At most, you would like to evaluate the performance penalty of checking whether a new key is a duplicate or not. I assume the "performance" pernalty is to be understood in time and memory. My opinion is that one should not break previous behaviour, especially if RFCs do not requires it. Having the opportunity to check for duplicate keys should be an option that users should be able to enable on demand by reading the doc. The check should only mention the issue in an warning message. It is the first step one should implement to move forward with the following benefits :

Leonidas-from-XIV commented 4 months ago

One could implement it as new module Safer or something, but the issue with the proliferation of these parsing variants is that they don't get adoption and changing them later is hard.

So one could have a default option to from_string etc like ?(allow_duplicate_keys=true) that could be included in Yojson 2, inverted in Yojson 3 and removed in Yojson 4.

Heyji2 commented 4 months ago

I would argue that if they don't get adoption, it may be because there is no need for such feature...

I don't understand "and changing them later is hard". Changing what exactly, the Safer module ? You mean it is hard to change a module that did not get adoption ?

Leonidas-from-XIV commented 4 months ago

I would argue that if they don't get adoption, it may be because there is no need for such feature...

I don't think that is the case. People often use something without discussing all the minutiae of their decision, just because e.g. there was a code sample that did what they wanted without spending time analyzing that another approach would've been better for their usecase. I don't like blaming people for doing something unsafe if the safe thing was harder to do. We as programmers should strive to make APIs sensible as possible.

Case in point: who knows that besides Yojson.Basic and Yojson.Safe there is also Yojson.Raw? Is picking Raw a better choice in some cases? Maybe, but it is so obscure as to be nearly irrelevant.

I don't understand "and changing them later is hard".

So if we add a Safer module, we'd need to convince people do migrate their codebases (and all the code they depend on) to the new module. If we add a default option to the parse functions it is just a single parameter that we can easily change between releases if it makes sense.