jplatte / serde_html_form

Rust crate for (de-)serialization of `application/x-www-form-urlencoded` data.
MIT License
23 stars 3 forks source link

What to do with indexmap dependency #1

Open jplatte opened 2 years ago

jplatte commented 2 years ago

Currently, the internal buffering is done using an IndexMap, such that deserializing to a Vec of key-value pairs preserves the order of the input kv pairs. For many use cases, this is not required so I'd like to make it an optional dependency.

I'm rather hesitant change default behavior (as opposed to available features) through Cargo features, though I guess I already did this with making ryu optional (it formats floats in a nicer way).

Not quite sure what to do here. I guess if indexmap stays a mandatory dependency, ryu should be too because it's the way smaller one.

jplatte commented 2 years ago

There is another related concern in that I think serde_urlencoded supports deserializing foo=a&bar=b&foo=c to vec![("foo", "a"), ("bar", "b"), ("foo", "b")]. With the current internal buffering we basically try to deserialize from [("foo", ValOrVec::Vec(["a", "c"])), ("bar", ValOrVec::Val("b"))] and so deserialization should fail.

I guess I should add a test and think these things through once again. Plus (re-)investigate what serde_urlencoded does to the above query string when deserializing to a struct.

thomaseizinger commented 1 year ago

Not quite sure what to do here. I guess if indexmap stays a mandatory dependency, ryu should be too because it's the way smaller one.

You could add a Config parameter to deserialization and expose certain configuration options only when specific features are activated. Then you have a purely additive API. The default behaviour can't depend on the feature then though.

jplatte commented 1 year ago

I think I know how to handle this now: Instead of grouping entries eagerly, we should only scan and buffer the rest of the input and collect values for a given key when visit_seq is used. Done that way, serde_html_form should have no performance penalty over serde_urlencoded and the ignored test case I added in 6e548637338cb9c1c49db82e85528d32d2c7c70c should start working.

edit: The one issue with that is DOS potential. An large input with that's deserialized to something like BTreeMap<String, Vec<String>> would lead to intial buffering, then loads of removing at the front plus arbitrary other attacker-controller positions in the buffer, which can lead to excessive memory moves.

jplatte commented 1 year ago

So I just improved the situation a bunch in b671880807c062fdc238669c48dfc3a64c9ae4d8, and I'm now thinking if it's just for performance / fewer dependencies, it might not be worth removing indexmap. In any case, supporting the test case I showed above makes it less important to invest in this, for me. One thing that looks like it broke is sequences of tuples where the second element is a Vec, but I'm not entirely sure whether that should be supported.