serde-rs / json

Strongly typed JSON library for Rust
Apache License 2.0
4.82k stars 552 forks source link

Conform to I-JSON (RFC 7493) by default #398

Closed sanmai-NL closed 5 months ago

sanmai-NL commented 6 years ago

I-JSON (RFC 7493) is a stricter subset of the JSON RFC 8259. I expect that this crate already conforms to RFC 7493. In order to build a reliable REST API, I need to be able to tell consumers what kind of JSON they can expect. Hopefully it’s sensible JSON too.

clarfonthey commented 6 years ago

Looking at the RFC:

This would be a breaking change as 64-bit integers over 54 bits long would have to be encoded as strings, which leads to ambiguity.

oli-obk commented 6 years ago

Why should the default be I-JSON? (Note I do not know anything about I-JSON at all).

sanmai-NL commented 6 years ago

@oli-obk: Tim Bray:

If you want to use an RFC as foun­da­tion for a REST API or some oth­er In­ter­net pro­to­col, I ac­tu­al­ly don’t rec­om­mend 8259, I rec­om­mend I-JSON, RFC 7493, which de­scribes ex­act­ly the same syn­tax as all the oth­er specs (by ref­er­enc­ing 7159), but ex­plic­it­ly rules out some legal-but-dumbass things you could do that might break your pro­to­col, for ex­am­ple us­ing any­thing but UTF-8 or hav­ing du­pli­cate mem­ber names in your ob­ject­s.

@clarcharr:

Serde already requires that strings be UTF-8.

Note that this comment can be read in more than one way. If you mean to say that adopting I-JSON for this reason isn’t useful, then I disagree. Clients of a REST API, for example, do not know/care about the implementation details of the backend, such as that it uses Serde.

The purpose of this proposal is to allow programmers to guarantee certain basic (sensible) properties of the serialized JSON to e.g. REST API consumers.

  • Numbers are allowed arbitrary precision, which goes against the RFC's requirements.

This would be a breaking change as 64-bit integers over 54 bits long would have to be encoded as strings, which leads to ambiguity.

Okay, that’s the only issue I think requires some more thought. How Serde deserializes numbers has my interest already. @bluss has digged into that in bluss/petgraph#190. Anyway, even if Rust has a few wider integer types than ECMAScript can represent, just accepting any sequence of digits in JSON deserialization can’t be right either. What happens if deserialization induces a Rust integer overflow? It should be possible to deserialize numeric strings into an numeric type of a known size, given a numeric field of known type.

oli-obk commented 6 years ago

What happens if deserialization induces a Rust integer overflow?

let u: u64 = serde_json::from_str("999999999999999999999999").unwrap();

ErrorImpl { code: Message("invalid type: floating point 1000000000000000000000000, expected u64"), line: 1, column: 24 }

The error message for u64 could use improvement. Smaller integers produce

ErrorImpl { code: Message("invalid value: integer 999, expected u8"), line: 1, column: 3 }

It should be possible to deserialize numeric strings into an numeric type of a known size, given a numeric field of known type.

That's certainly doable, but there are probably loads of cases where you explicitly do not want to accept strings with numbers. It would also not work for enums whose variants are decided on the json type instead of having an explicit variant.

I can totally see the use case for having multiple standards, but these seem like issues that only crop in specific APIs with communication partners requiring the 7493. The only issue that can occur is that either side gives an error when deserializing due to the values being out of range. It's trivially possible to create a U64With54Bits type that acts as if it has only 54 bits or to just use a u64 and add a #[serde(deserialize_with(u64_54))] attribute.

sanmai-NL commented 6 years ago

@oli-obk: One possible resolution, then, is to claim partial conformance to I-JSON (RFC 7493), except for some aspect of integer serialization, which can be conformed to optionally with the two solutions you point out. That’s fine to me, because at least the text encoding issue and other issues covered by RFC 7493 would be out of the way then.

But note that, ECMAScript, AFAICT the primary consumer/driver of JSON as data format on the web, does not handle u64 values as deserialized by Serde well at all. So it’s an important issue to address I suppose.

mehcode commented 6 years ago

I would actually love an easy way to serialize u64 as strings for JavaScript consumers (that and easy deserialization of strings into integers from JavaScript producers).

saethlin commented 5 years ago

+1 for an easy way to tell serde that a field should be deserialized then parsed into a numeric type if it's a string. It's currently not that hard to hack around this via a deserialize_with, but an unfortunate amount of boilerplate is required. (Recently wrote such a deserialize_with myself)

chris-morgan commented 5 years ago

This crate does not conform to I-JSON, and there’s a major breaking change here:

Objects in I-JSON messages MUST NOT have members with duplicate names.

That is, the following should panic, rather than printing {"foo":3}.

let x: serde_json::Value = serde_json::from_str(r#"{"foo":1,"foo":3}"#).unwrap();
println!("{}", x);

This is probably a big deal, and you probably need the ability to support both. Some users won’t want that to be an error, whereas others will insist on it being an error. I myself am implementing JMAP, which stipulates that I-JSON MUST be used.

(On the other major part of I-JSON: JMAP also stipulates that any integers must be in the safe range, −2⁵³+1 to 2⁵³−1. I therefore also want the parser to return an error if it encounters an out-of-range number.)

dtolnay commented 5 months ago

I would prefer if this would be handled by a different library.

It might be possible to implement by wrapping serde_json's Serializer and Deserializer in an adapter that adds additional enforcement, otherwise by writing a whole new Serializer and Deserializer specific to I-JSON.