sanic-org / sanic-ext

Extended Sanic functionality
https://sanic.dev/en/plugins/sanic-ext/getting-started.html
MIT License
50 stars 36 forks source link

Query params validation is completely broken #241

Closed 0xlakshay closed 9 months ago

0xlakshay commented 9 months ago

Query params validation is completely broken. I'm surprised this hasn't been reported yet.

Sanic uses urllib.parse.parse_qs to parse query params into a dictionary. The parsed params can be accessed using Request.args This works fine for everything but validation. But why? Because urllib.parse.parse_qs has a very convenient behaviour where it converts value of a key into a list even if only one value is present.

$ curl localhost:8000/devilman?q=crybaby
# The parsed params dict would be
{
  "devilman" : ["crybay"]
}

Now it is very easy to see why pydantic or msgpec validation would fail!

To Reproduce

# app.py
from sanic import Sanic, Request, response
from sanic.log import logger
from sanic_ext import validate

app = Sanic("devilman")

from pydantic import BaseModel

class DevimanRequestQueryParams(BaseModel):
    q: str

@app.get("/devilman")
@validate(query=DevimanRequestQueryParams)
def crybaby(request: Request):
    logger.debug(request.args)
    logger.debug(request.query_args)
    return response.json(request.args)
$ curl localhost:8000/devilman?q=crybaby

Expected behavior The request should be validated just fine. But it will throw a validation error.

Proposed FIx Without breaking old behaviour of how sanic parses query params, we can either

  1. Post-parsing using urllib.parse.parse_qs, we can manually loop and convert one value lists back to single value.
  2. Use a custom function (essentially a slightly modified parse_qs) that gives the desired dictionary.

Fix 1 can use Request.query_args which is essentially result of urllib.parse.parse_qsl which is also used in urllib.parse.parse_qs.

Fix 2, parse_qs only needs a slight modification

def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
             encoding='utf-8', errors='replace', max_num_fields=None, separator='&'):
    parsed_result = {}
    pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
                      encoding=encoding, errors=errors,
                      max_num_fields=max_num_fields, separator=separator)
    for name, value in pairs:
        if name in parsed_result:
            parsed_result[name].append(value)
        else:
-            parsed_result[name] = [value]
+            parsed_result[name] = value
    return parsed_result

As you can see, Fix 1 & 2 achieve expected behaviour in similar manner.

Environment (please complete the following information):

0xlakshay commented 9 months ago

Other than this maybe we can modify the behaviour of sanic.request.parameters.RequestParameters but I am not completely sure about this. I think sanic.request.parameters.RequestParameters already tries to achieve something similar but it doesn't work here while validation the query dict with models. Also should we modify its behaviour validation usecase also needs discussion.

RequestParameters ```py class RequestParameters(dict): """Hosts a dict with lists as values where get returns the first value of the list and getlist returns the whole shebang""" def get(self, name: str, default: Optional[Any] = None) -> Optional[Any]: """Return the first value, either the default or actual ... """ return super().get(name, [default])[0] <---- talking about this def getlist( self, name: str, default: Optional[Any] = None ) -> Optional[Any]: """Return the entire list ... """ return super().get(name, default) ```

Can I raise a PR with FIx 1 or 2? For now we have to resort to a custom query validator decorator which is not fun.

0xlakshay commented 9 months ago

Oops! found this https://github.com/sanic-org/sanic-ext/pull/157, much cleaner solution Seems to be working fine in latest version! closing this