plotly / dash

Data Apps & Dashboards for Python. No JavaScript Required.
https://plotly.com/dash
MIT License
21.19k stars 2.04k forks source link

Path template variables not correctly parsed with dot #2437

Open jowlo opened 1 year ago

jowlo commented 1 year ago

Bug description

Path template variables are not parsed/populated correctly when using a dot (.) as seperator between placeholders as in /dot/<foo>.<bar>/.

When using /dot/<foo>.<bar>/ as path template, a request to /dot/abc.def should result in a call to layout(foo='abc', bar='def'). Instead there is a call to layout(foo='abc.de', part=None). (Note the missing e at the end)

Using /dash/<foo>-<bar>/ as path template, the request url is handled as expected and variables are parsed/populated correctly.

Expected behavior

Variables should be parsed as expected and correctly independent of characters between seperators.

Minimal example

import dash
from dash import Dash, html, dcc

app = Dash(__name__, use_pages=True, pages_folder="")
dash.register_page(
    "Dot",
    layout=lambda foo=None, bar=None: html.H1([html.B(foo), html.I(bar)]),
    path_template='/dot/<foo>.<bar>'
)

dash.register_page(
    "Dash",
    layout=lambda foo=None, bar=None: html.H1([html.B(foo), html.I(bar)]),
    path_template='/dash/<foo>-<bar>'
)

app.layout = html.Div(
    [
        html.H1("App"),
        html.Div([
            html.Div(dcc.Link(f"{page['name']} - {page['path']}", href=page["path"]))
            for page in dash.page_registry.values()
        ]),
        html.Hr(),
        dash.page_container,
    ]
)

if __name__ == "__main__":
    app.run_server(debug=True)

Environment

dash                2.8.1
jowlo commented 1 year ago

I looked at the code and found that path_template is used as regex pattern in _parse_path_variables after replacing the <variable> by regex groups. https://github.com/plotly/dash/blob/d20161d32e406cf4379c2f92b133702e1bff7bc6/dash/_pages.py#L96-L120

The documentation does not mention the fact that the path_template is used as regex. While users are probably unlikely to use templates that are translated to vulnerable patterns, using regex under the hood without escaping potentially opens up DoS vectors.

I found that the implementation was jumping more hoops than needed. The re module already provides named capture groups for matches. I pasted my adapted code using those. It replaces the <foo> syntax with a regex group (?P<foo>.*) after escaping all regex characters with re.escape.

Note the re.escape on the (dash-)user supplied path template.

def _parse_path_variables(pathname, path_template):
    """
    creates the dict of path variables passed to the layout
    e.g. path_template= "/asset/<asset_id>"
         if pathname provided by the browser is "/assets/a100"
         returns **{"asset_id": "a100"}
    """

    pattern = re.compile(
        re.sub(
            "<(?P<var>.*?)>", r"(?P<\g<var>>.*)",
            re.escape(path_template)
        )
    )
    match = pattern.match(pathname)
    return match.groupdict() if match else None

I am happy to make this into a PR if you agree in principle.

alexcjohnson commented 1 year ago

Thanks @jowlo - nice detective work, and your proposal looks good. We would welcome a PR! Try and add a test of the specific bug you encountered with dots in the path; so far the pages tests are all integration tests (mostly in tests/integration/multi_page) but this could also be covered just by unit tests (would just be a new file in tests/unit)