pallets / werkzeug

The comprehensive WSGI web application library.
https://werkzeug.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
6.66k stars 1.74k forks source link

deprecate `OrderedMultiDict` #2968

Closed davidism closed 3 weeks ago

davidism commented 3 weeks ago

OrderedMultiDict is not used by Werkzeug. It has a complex implementation, and is documented to be slow. dict maintains key insertion order in all supported versions of Python. MultiDict already maintains the insertion order of values within a key. OrderedMultiDict maintains the insertion order of each item, so [(a, 1), (b, 2), (a, 3)] comes out the same instead of [(a, 1), (a, 3), (b, 2)]. It's not clear there's a use case for this, searching sourcegraph shows some use of OrderedMultiDict but it's not clear if the use is actually required vs using MultiDict. Other libraries such as boltons also provide implementations. Mark this as deprecated and see if anything comes up.

davidism commented 3 weeks ago

https://sourcegraph.com/search?q=context:global+lang:Python+werkzeug+OrderedMultiDict+-file:venv+-file:site-packages+-file:werkzeug+-repo:pallets/werkzeug+&patternType=keyword&sm=0 shows ~24 repos using OrderedMultiDict. Browsing through them, most uses do not seem necessary.

I did see a couple cases where knowing the exact order that request.args or request.form was parsed in was important. But in that case, it's more straightforward to compare the exact data sent rather than the parsed version:

davidism commented 3 weeks ago

@ThiefMaster Indico uses it in a few places: https://github.com/search?q=repo%3Aindico%2Findico%20orderedmultidict&type=code, thoughts? It's not clear that it's needed.

ThiefMaster commented 3 weeks ago

I think parameter_storage_class = ImmutableOrderedMultiDict (on the Request subclass) is the most interesting one, because I added that one due to how crappy the PayPal webhook validation is (they require you to send back data you received in the exact same order).

IIRC there was either something in the docs on on some SO post recommending this, so there may be more people using it when they have a paypal integration in their flask app.

request.get_data() called before accessing request.form returns the exact request body body

FWIW depending on the architecture of an application (especially when plugins are involved) it may be hard to ensure that gets called before anything else accesses the form data.


The other two places where it's used are most likely leftovers from before dict was guaranteed to be ordered by insertion order, so if the standard MultiDict already does that now as well it's enough.

davidism commented 3 weeks ago

PayPal IPN was one use case I saw, seems like a weird API design. The other use case I found seemed to be trying to check that a URL was canonical based not only on the query args but their exact order in the URL.

I know it's not great to have to change it, but doing something like the following would probably be enough for the PayPal case, if an app can't guarantee that the data is cached first. The query args case could probably just look at query_string or parse_qsl(query_string) instead.

class StoredImmutableMultiDict(ImmutableMultiDict):
    def __init__(self, mapping):
        assert isinstance(mapping, list)
        self.original_items: list[tuple[str, str]] = mapping
        super().__init__(mapping)

parameter_storage_class = StoredImmutableMultiDict

Then you can access form.original_items if needed. Even seems preferable to OrderedMultiDict, which is slower and uses even more memory.

I did see that Werkzeug's documentation for parameter_storage_class mentions ImmutableOrdereMultiDict as an option, so that's probably where the advice came from.

ThiefMaster commented 3 weeks ago

FWIW I don't think paypal has any multi-value keys, so probably a normal MultiDict is enough nowadays...

davidism commented 3 weeks ago

Oh yeah, that level of ordering only matters if there are actually multiple values per key, and they're submitted with keys interleaved. Seems exceedingly unlikely.