marshmallow-code / webargs

A friendly library for parsing HTTP request arguments, with built-in support for popular web frameworks, including Flask, Django, Bottle, Tornado, Pyramid, webapp2, Falcon, and aiohttp.
https://webargs.readthedocs.io/
MIT License
1.38k stars 158 forks source link

Are key values collected from headers case insensitive? #549

Open Josh-Marshall-FactSet opened 4 years ago

Josh-Marshall-FactSet commented 4 years ago

The documentation on passed values (https://flask-smorest.readthedocs.io/en/latest/arguments.html) for form values, json, headers, query, etc. do not have case sensitivity documented. I would expect locations like json would be case sensitive, but locations like headers to be case insensitive. Looking at https://github.com/marshmallow-code/flask-smorest/blob/e06325d7a201af6d07caa2e8e103fba15eb4b9b7/flask_smorest/arguments.py#L105 I'd guess everything is case sensitive.

lafrech commented 4 years ago

Indeed, case sensitive everywhere.

Transfering to apispec, as this is where arguments are parsed.

This might be challenging to address. It all happens in the marshmallow schema.

Perhaps when dealing with headers, one could

I don't see an easy and nice way to do that.

Josh-Marshall-FactSet commented 4 years ago

It is an important use case for me. Trying to tame some legacy stuff, and it could be going more smoothly. Another thing which I'm not sure is relevant is allowing args to come from different location, like query and json.

lafrech commented 4 years ago

My bad. Got apispec and webargs mixed up.

lafrech commented 4 years ago

My bad. Got apispec and webargs mixed up.

lafrech commented 4 years ago

Sorry about the transfer mess.

The point about query or json is addressed in the docs: https://webargs.readthedocs.io/en/latest/advanced.html#meta-locations.

sirosen commented 3 years ago

Because webargs is re-exposing framework data, this seems like a framework-specific detail. Some frameworks like Django expose a case-insensitive dict, and some do not.

Do we want to treat this as a doc issue, or do we actually want to wrap headers in our own case-insensitive dict implementation? I'm somewhat averse to wrapping the data just because I like keeping webargs small where we can.

@lafrech, are you considering such a wrapper in flask-smorest ? If you are, that might change my attitude here -- if you're going to write that code anyway, maybe it's easy enough to just bake it into webargs.

sirosen commented 3 years ago

@lafrech, as I look at this, might this be something we could solve with #583 ?

flask-smorest could subclass FlaskParser to provide a pre_load method which checks for location="header" and lowercases the keys by default? Then this becomes a feature of flask-smorest which is supported by webargs. What do you think?

lafrech commented 3 years ago

Indeed we could add a pre_load method to do that.

Since HTTP headers are case-insensitive everywhere, not just in OpenAPI spec, this might be better suited here than in flask-smorest.

I don't think anyone would mind having it here.

Besides if it is a parser method, someone who really doesn't want that can always override it with a noop. We can even add an opt-out config attribute.

sirosen commented 3 years ago

I haven't done the 8.0 release in case we want to include a change related to this; but I've been swamped and haven't had time to think about it much.

The fact that a header can be provided multiple times (so headers are a multidict) could make this messy. We might need our own multidict implementation if we wanted to support that.

I think the easiest thing to do would be to release 8.0 with the current pre_load support, and, like stripping whitespace from values, create a documentation example for this.

# in advanced docs...

# warning: this will not preserve headers which are provided multiple times!
def _lowercase_keys(value):
    return {k.lower(): v for k, v in value.items()}

class LowercaseHeadersParser(FlaskParser):
    def pre_load(self, location_data, *, schema, req, location):
        if location == "header":
            return _lowercase_keys(location_data)
        return location_data

Does that sound good?

I could contribute something like the above to flask-smorest if that impacts the decision making here. I'm more comfortable doing it there for a couple of reasons: there's only one framework to support and we haven't done v1.0 there yet.

lafrech commented 3 years ago

No rush for the release. I could do it as well, but like you, I didn't get the time.

If this reveals more complicated than I thought, we can postpone it.

It would be nice to come up with an implementation that doesn't lose the multiple values.

I thought we'd just mutate the dict, but it looks like the headers multidict is immutable, so we need to create a new one.

IIUC, creating a new multidict is trickier because it depends on the initial multidict class (which depends on the web framework).

I'm not familiar with MultiDicts and I only use Flask so my vision is rather limited.

Josh-Marshall-FactSet commented 3 years ago

Would this be relevant and helpful for a related group's solution to the problem? https://requests.readthedocs.io/en/master/_modules/requests/structures/

lafrech commented 3 years ago

Looks like it is not a MultiDict. It is case insensitive, but not multi.

Still, it is indeed a good idea to check how requests or other libs deal with this, if they do.

Also relevant : https://stackoverflow.com/a/4371395

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one "field-name: field-value" pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma. The order in which header fields with the same field-name are received is therefore significant to the interpretation of the combined field value, and thus a proxy MUST NOT change the order of these field values when a message is forwarded

So multiple & case insensitive should be supported. And order preserved.

sirosen commented 3 years ago

This thread is reminding me of a past issue in which a user had a schema pre_load hook which modified data. Because the underlying multi-dict was immutable, the user had to run data = dict(data) as the first step in the hook.

One direction we could go is to apply that more generally and "unwrap" MultiDictProxy into other types. dict(MultiDictProxy(...)) works. If we construct our own type, CaseInsensitiveDict, then we could do CaseInsensitiveDict(MultiDictProxy(...)) for headers. (And, logical next step: I'd replace MultiDictProxy with a function which returns a dict or CaseInsensitiveDict.)

The place that runs into trouble is how we detect multi-fields. Because it would put case insensitivity "around" MultiDictProxy, the inner workings of MultiDictProxy don't get case insensitivity.

But, following from that thought, maybe there's a way we can get all the way to providing a dict to user schemas always. Like dict(CaseInsensitiveMultiDictProxy(...))? I think that would be pretty nice if we can manage it.

sirosen commented 3 years ago

I've been thinking about this on and off for a few weeks, and I think we should ship 8.0 without doing anything on this front.

Today, if a framework provides headers as case-insensitive, then the proxy object works and behaves case-insensitively. If a framework provides headers as case sensitive, then the proxy object works and behaves case-sensitively. That seems pretty reasonable -- you get behavior based on what your framework provides.

I started looking at what the frameworks give us this morning, to see how much variation there is on this front. To save time I just looked at flask/werkzeug, pyramid/webob, and django. Django and Pyramid/webob both implement headers as case-insensitive non-multidict types. When I opened up werkzeug, it looks to me like flask/werkzeug is already case insensitive.

So the answer to the OP's question is twofold: (1) it depends on your framework and (2) if you're using flask/flask-smorest, then yes, headers are case-insensitive.

@lafrech, if this sounds good, let me know, and I'll do the 8.0 release. Since we've now let it sit for a bit, I want to re-confirm.

lafrech commented 3 years ago

Thanks for this research.

You may ship 8.0.

No need to wait for https://github.com/marshmallow-code/marshmallow/pull/1742. We should be able to support that change without dropping support for old names. It's only tests and docstring changes.