lovasoa / bad_json_parsers

Exposing problems in json parsers of several programming languages.

serde_json can disable the recursion limit #7

Open erickt opened 5 years ago

erickt commented 5 years ago

FYI, serde_json has a recursion limit to protect against malicious clients sending a deeply recursive structure and DoS-ing a server. However, it does have an optional feature, unbounded_depth, that disables this protection. Parsing still uses the stack, though, so with the limit disabled it will eventually overflow the stack rather than consume all available memory.
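
To make the default behaviour concrete, here is a minimal sketch assuming only default features (the depth of 1000 is an arbitrary value chosen to exceed the default limit, 128 at the time of writing): with the recursion limit left in place, a deeply nested document is rejected with an error instead of overflowing the stack.

```rust
use serde_json::Value;

fn main() {
    // Build a deeply nested array: [[[[...]]]]
    let mut json = String::new();
    for _ in 0..1000 {
        json = format!("[{}]", json);
    }

    // With the default recursion limit in place, parsing fails gracefully
    // with a "recursion limit exceeded" error rather than crashing.
    let result: Result<Value, serde_json::Error> = serde_json::from_str(&json);
    assert!(result.is_err());
    println!("{}", result.unwrap_err());
}
```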

NilSet commented 5 years ago

The stack size is also configurable, and there is a crate that dynamically grows the stack when the deserializer is in danger of overflowing it. Here is an example from serde_json's documentation:

/// Parse arbitrarily deep JSON structures without any consideration for
/// overflowing the stack.
///
/// You will want to provide some other way to protect against stack
/// overflows, such as by wrapping your Deserializer in the dynamically
/// growing stack adapter provided by the serde_stacker crate. Additionally
/// you will need to be careful around other recursive operations on the
/// parsed result which may overflow the stack after deserialization has
/// completed, including, but not limited to, Display and Debug and Drop
/// impls.
///
/// *This method is only available if serde_json is built with the
/// `"unbounded_depth"` feature.*
///
/// # Examples
///
/// ```edition2018
/// use serde::Deserialize;
/// use serde_json::Value;
///
/// fn main() {
///     let mut json = String::new();
///     for _ in 0..10000 {
///         json = format!("[{}]", json);
///     }
///
///     let mut deserializer = serde_json::Deserializer::from_str(&json);
///     deserializer.disable_recursion_limit();
///     let deserializer = serde_stacker::Deserializer::new(&mut deserializer);
///     let value = Value::deserialize(deserializer).unwrap();
///
///     carefully_drop_nested_arrays(value);
/// }
///
/// fn carefully_drop_nested_arrays(value: Value) {
///     let mut stack = vec![value];
///     while let Some(value) = stack.pop() {
///         if let Value::Array(array) = value {
///             stack.extend(array);
///         }
///     }
/// }
/// ```
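
As a concrete illustration of the "the stack is also configurable" point, here is a rough sketch of the other approach: running the parse on a thread spawned with a larger stack rather than growing it dynamically. It assumes serde_json is built with the `unbounded_depth` feature; the 64 MiB stack size and the nesting depth of 10,000 are arbitrary example values, not recommendations.

```rust
use serde::Deserialize;
use serde_json::Value;

fn main() {
    let mut json = String::new();
    for _ in 0..10_000 {
        json = format!("[{}]", json);
    }

    let handle = std::thread::Builder::new()
        // Give the parsing thread a larger stack than the default.
        .stack_size(64 * 1024 * 1024)
        .spawn(move || {
            let mut deserializer = serde_json::Deserializer::from_str(&json);
            // Requires the "unbounded_depth" feature, as in the doc example above.
            deserializer.disable_recursion_limit();
            let value = Value::deserialize(&mut deserializer).unwrap();

            // Drop iteratively so the recursive Drop impl cannot overflow either.
            let mut stack = vec![value];
            while let Some(value) = stack.pop() {
                if let Value::Array(array) = value {
                    stack.extend(array);
                }
            }
        })
        .unwrap();

    handle.join().unwrap();
}
```
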
lovasoa commented 5 years ago

I added the following to the README to make it clearer that the limit is a feature implemented intentionally by the parsers with the lowest maximum nesting depths:

Some recursive parser libraries implement a safety check in order to avoid crashing the calling program: they artificially limit the maximum depth they accept (sometimes making that limit configurable), hoping that the size of the stack at the moment they are called plus the artificial limit will always be smaller than the total stack size. This limit is an arbitrary choice of the library implementer, and it explains all the lower values of the comparison you'll see below.

ssokolow commented 5 years ago

I still think that either the serde_json line should gain a note about disable_recursion_limit or C#, PHP, and Python 3 should lose their notes about the configurable limits.

Personally, my recommendation for making this as useful as possible would be:

  1. Clearly indicate whether each parser is limited by byte size or recursion depth.
  2. Clearly indicate whether the failure results in a safe error result (eg. throwing an exception) or some kind of irrecoverable terminal condition (eg. segfault).
  3. Test both with and without the parser's built-in recursion or length limit, if it's possible to disable it.
  4. Add some kind of indication (eg. *) to distinguish between parsers where the limit can be fully customized and parsers where it's just a choice between "off" and "on".
  5. If such parsers exist, add some kind of indication (eg. ) for parsers where the limit can be reconfigured but not turned off, so testing would be done by setting the limit high enough that something else interferes with parsing first.
  6. Document how difficult it is to increase the stack size in each language and whether things like Python's sys.setrecursionlimit have unlimited capability or are, themselves, safety limits that will run up against the size of the C stack if set too high.
  7. Note (eg. with ‡) if the parser supports a lower-level SAX-like API that allows streaming JSON parsing in a fixed amount of memory at the cost of you having to provide your own model.

That would complete the process of reinterpreting the original "bad JSON parsers" to mean whatever "bad choice for your use case" means to each reader. (eg. Are you worried about DoS in a server with high load? Then "bad" means things without a limit, which segfault when they hit their limit, or with a limit so high that they'll cause the program's memory use to balloon up until.)

Heck, if you really wanted to do a good job on that front, the ideal approach would be to run the test under something like Valgrind which can gather peak memory consumption stats, run another test that parses a minimal JSON input like [], and then subtract one from the other to provide a column showing how memory-efficient the parser and the parsed representation of the document are.

(I'd offer to at least get a head start on it, but I'm still working to find time to contribute some more patches to the dialog crate for Rust to add more of the dialog types supported by tools like Zenity and KDialog.)

lovasoa commented 5 years ago

> Clearly indicate whether each parser is limited by byte size or recursion depth.

> Are you worried about DoS in a server with high load? Then "bad" means things without a limit, which segfault when they hit their limit, or with a limit so high that they'll cause the program's memory use to balloon up until.)

> Note (eg. with ‡) if the parser supports a lower-level SAX-like API that allows streaming JSON parsing in a fixed amount of memory at the cost of you having to provide your own model.

I thought my addition would clear up the misunderstanding, but it didn't. Once again, memory limits are not the problem here. All the parsers presented here use an amount of memory that is proportional to the input. None of them causes the program's memory usage to balloon up on deeply nested inputs. If you are worried about denial of service to a server, parsers without a depth limit are not worse in any way. They don't cause abnormal memory usage, because memory usage is dominated by the size of the input, not its depth.
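
As a rough illustration of that point, here is a small sketch (the sizes and the depth of 100, which stays below the default recursion limit, are arbitrary) that counts the Value nodes produced for a flat input and a deeply nested input of roughly the same byte length; both node counts are on the order of the input size, not of its depth.

```rust
use serde_json::Value;

// Count every node in the parsed tree. The recursion depth here is bounded
// by the input depth, which is kept small in this example.
fn count_nodes(v: &Value) -> usize {
    match v {
        Value::Array(items) => 1 + items.iter().map(count_nodes).sum::<usize>(),
        Value::Object(map) => 1 + map.values().map(count_nodes).sum::<usize>(),
        _ => 1,
    }
}

fn main() {
    // ~200 bytes of flat input: [0,0,0,...]
    let flat = format!("[{}]", vec!["0"; 100].join(","));
    // ~200 bytes of nested input: [[[...]]] with a depth of 100
    let nested = "[".repeat(100) + &"]".repeat(100);

    let flat_value: Value = serde_json::from_str(&flat).unwrap();
    let nested_value: Value = serde_json::from_str(&nested).unwrap();

    println!("flat:   {} bytes, {} nodes", flat.len(), count_nodes(&flat_value));
    println!("nested: {} bytes, {} nodes", nested.len(), count_nodes(&nested_value));
}
```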

For the other points, I agree they would be useful additions!

ssokolow commented 5 years ago

> All the parsers presented here use an amount of memory that is proportional to the input. None of them causes the program's memory usage to balloon up on deeply nested inputs. If you are worried about denial of service to a server, parsers without a depth limit are not worse in any way. They don't cause abnormal memory usage, because memory usage is dominated by the size of the input, not its depth.

I understood you; I just wasn't being clear enough. I said that as an example of someone whose definition of "sorted from bad to good" would most likely be the inverse of the sort order originally used in the table.

alexxroche commented 4 years ago

For those who also end up here wanting to disable the recursion limit because of:

no method named `disable_recursion_limit` found for struct `serde_json::Deserializer<serde_json::de::StrRead<'_>>` in the current scope

Cargo.toml

```toml
[dependencies]
serde_json = { version = "1.0", features = ["unbounded_depth"] }
```

Adding that feature solved this for me. (Though N.B. you MUST pre-process any JSON that could have been tampered with, accidentally or deliberately!) [Solution found in: https://github.com/dtolnay/serde-stacker/issues/3]