sanic-org / sanic-routing

Internal handler routing for Sanic beginning with v21.3.
https://sanicframework.org/
MIT License
13 stars 11 forks source link

path_to_parts parsing bug #39

Closed kserhii closed 3 years ago

kserhii commented 3 years ago

Describe the bug

In my project, I have an endpoint

@api.route(
    r'/image/iiif/'
    r'<image_id>/'
    r'<region:full|square|\d+,\d+,\d+,\d+>/'
    r'<size:max|\d+,|,\d+|\d+,\d+>/'
    r'<rotation:int>/'
    r'default.jpg')

It works for Sanic==20.12.1, but after migration to Sanic==21.3.4 it's not a valid router anymore.

parts = ('api', 'image', 'iiif', '<image_id>/<region:full|square|\\d+,\\d+,\\d+,\\d+>', '<size:max|\\d+,|,\\d+|\\d+,\\d+>', '<rotation:int>', ...)
delimiter = '/'

    def parts_to_path(parts, delimiter="/"):
        path = []
        for part in parts:
            if part.startswith("<"):
                try:
                    match = REGEX_PARAM_NAME.match(part)
                    param_type = ""
>                   if match.group(2):
E                   AttributeError: 'NoneType' object has no attribute 'group'

../venv/lib/python3.8/site-packages/sanic_routing/utils.py:76: AttributeError

How to reproduce

>>> from sanic_routing.utils import path_to_parts
>>> s = (
...     r'/image/iiif/'
...     r'<image_id>/'
...     r'<region:full|square|\d+,\d+,\d+,\d+>/'
...     r'<size:max|\d+,|,\d+|\d+,\d+>/'
...     r'<rotation:int>/'
...     r'default.jpg')
>> path_to_parts(s)
('image', 'iiif', '<image_id>/<region:full|square|\\d+,\\d+,\\d+,\\d+>', '<size:max|\\d+,|,\\d+|\\d+,\\d+>', '<rotation:int>', 'default.jpg')

Expected behavior

>> path_to_parts(s)
('image', 'iiif', '<image_id>', '<region:full|square|\\d+,\\d+,\\d+,\\d+>', '<size:max|\\d+,|,\\d+|\\d+,\\d+>', '<rotation:int>', 'default.jpg')

Environment

ahopkins commented 3 years ago

Hmm, I think it might be worth it to scrap the regex splitting in this function. I have tested it quite a bit, but it does seem like this is a use case that breaks it. Perhaps a solution will be to run a regular .split("/") operation, and then loop through the segments looking for instances where there is an open < but the close > is in the next segment, and merge.

ahopkins commented 3 years ago
def _normalize(parts, delimiter):
    part_iter = iter(parts)
    normalized = []
    for part in part_iter:
        if part.startswith("<") and not part.endswith(">"):
            ending = ""
            try:
                while not ending.endswith(">"):
                    ending += delimiter + next(part_iter)
            except StopIteration:
                raise Exception("invalid")

            part += ending
        normalized.append(part)
    return normalized

def path_to_parts(path, delimiter="/"):
    parts = _normalize(path.split(delimiter), delimiter)
    return tuple([part if part.startswith("<") else quote(part) for part in parts])

I just threw together this little snippet to prove my point. I think this is probably the way to go and should be more accurate in catching corner cases like this one.


/image/iiif/<image_id>/<region:full|square|\d+,\d+,\d+,\d+>/<size:max|\d+,|,\d+|\d+,\d+>/<rotation:int>/default.jpg
('', 'image', 'iiif', '<image_id>', '<region:full|square|\\d+,\\d+,\\d+,\\d+>', '<size:max|\\d+,|,\\d+|\\d+,\\d+>', '<rotation:int>', 'default.jpg')
kserhii commented 3 years ago

Can you maybe provide a path for which normal .split("/") does not work? Just to have a clear understanding of the problem.

ahopkins commented 3 years ago

Can you maybe provide a path for which normal .split("/") does not work? Just to have a clear understanding of the problem.

@app.get(r"/path/to/<deeply_nested:[a-z/]+>/thing")
async def handler(request, deeply_nested):
    return text(deeply_nested)
$ curl localhost:9999/path/to/location/with/slashes/thing
location/with/slashes

There are three use cases:

  1. Custom regex routes that have a / in them
  2. a <path:path> param (which uses item 1)
  3. static files (which uses item 2)

Given this, that is the reason for the complication with my proposed solution. Using .split() will unintentionally break apart some of these segments, so we will need to stitch them back. I thought the regex covered all scenarios, but I think a "dumber" solution might just be simpler.

kserhii commented 3 years ago

I've looked a bit and found the solution her: https://stackoverflow.com/questions/732029/how-to-split-string-by-unless-is-within-brackets-using-regex

Here is the pull request with extra tests

ahopkins commented 3 years ago

Fixed by #40