serde-rs / json

Strongly typed JSON library for Rust
Apache License 2.0

Ability to step past errors in StreamDeserializer if input is valid JSON #70

Open dtolnay opened 8 years ago

dtolnay commented 8 years ago

The current behavior of StreamDeserializer is that once it hits an error, that same error is returned in perpetuity, which is typically not what you want. (If that is what you want, it is easy to implement that behavior atop what I propose below.)

Suppose we have a stream deserializer that expects to read a stream of Strings. If it receives the input "a" {"b": 0} "c", a reasonable output to expect would be "a" (error) "c", because the second JSON value is an object and fails to deserialize to a String.

On the other hand, if the input is not a valid JSON stream, I believe the StreamDeserializer should retain its current behavior, because there is no way for it to regain its bearings in an invalid JSON stream. For example, if the input is "a" {"b{{"c" "d", I would expect the output to be "a" (error) (error) (error...).

dtolnay commented 7 years ago

The desired behavior:

// Raw byte string, so the inner double quotes need no escaping.
let data = br#"[0] {"k":"v"} [1]"#;

let de = serde_json::Deserializer::from_slice(data);
let mut stream = de.into_iter::<Vec<i32>>();
assert_eq!(0, stream.byte_offset());

println!("{:?}", stream.next()); // [0]
assert_eq!(3, stream.byte_offset());

println!("{:?}", stream.next()); // type error
assert_eq!(13, stream.byte_offset());

println!("{:?}", stream.next()); // [1]
assert_eq!(17, stream.byte_offset());

dbeckwith commented 2 years ago

Has there been any progress towards implementing the change mentioned in the StreamDeserializer docs?

Note: In the future this method may be changed to return the number of bytes so far deserialized into a successful T or syntactically valid JSON skipped over due to a type error. See serde-rs/json#70 for an example illustrating this.

This is the behavior I was expecting from StreamDeserializer::byte_offset: that it would represent the total number of bytes processed so far, regardless of the result those bytes produced. It's not clear to me, though, what the behavior should be for syntactically invalid JSON.

dbeckwith commented 2 years ago

While this could still be useful to implement, I'll describe my workaround as well, which may be a better solution. Rather than using the StreamDeserializer to produce values of T directly, so that it does both parsing and deserialization, I'm only getting serde_json::Value items from the StreamDeserializer. That way all it does is parse JSON, and it always advances when it finds valid JSON. Later I use from_value to convert each Value into my domain types.

Arnavion commented 2 years ago

An alternative is to move the type parameter T from StreamDeserializer<'_, R, T> down to fn next(&mut self), i.e. fn next<T>(&mut self) -> Option<Result<T>>, so that each individual call to .next() can produce a different type.

This way the user's loop can first try de.next::<Vec<i32>>(), and when that fails with Some(Err(...)), retry with de.next::<IgnoredAny>() to skip the value, then go back to de.next::<Vec<i32>>().

This gives the user precise control over how to handle the intermediate error. The downside, of course, is that it means removing the Iterator impl, and with it all the combinators that users can rely on right now.

balazsdukai commented 1 year ago

If someone is looking for a workaround, the solution from @H2CO3 on the Rust users forum helped me. It steps past an item in the stream even if it is invalid JSON.