simonw / simonwillisonblog

The source code behind my blog
https://simonwillison.net/
Apache License 2.0
188 stars 17 forks source link

Get &= in URLs to work through middleware #335

Closed simonw closed 1 year ago

simonw commented 1 year ago

Frustrating bug in https://elk.zone/ - this post:

https://elk.zone/fedi.simonwillison.net/@simon/110171640705827017

Currently links to this:

https://simonwillison.net/search/?q=iphone&sort=date&year=2007

Which doesn't work properly, because Elk turned the & into &.

There's a bug report about that here:

In the meantime though I could add some custom middleware to my blog that spots & in URLs and redirects to the correct place.

simonw commented 1 year ago

Prompt for GPT-4:

Write Django middleware that spots if the incoming URL looks like /search/?q=iphone&sort=date&year=2007 and if it does redirects to /search/?q=iphone&sort=date&year=2007 instead

simonw commented 1 year ago

It wrote me this: https://chat.openai.com/chat/c20ccffb-a802-4cf7-8695-c429761d2828

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse
from django.shortcuts import redirect

class SearchRedirectMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.path.startswith('/search/') and '&' in request.get_full_path():
            parsed_url = urlparse(request.get_full_path())
            query_params = parse_qsl(parsed_url.query)

            # Replace & with & in the query parameters
            corrected_query = [(k, v.replace('&', '&')) for k, v in query_params]

            # Rebuild the URL with corrected query parameters
            corrected_url = urlunparse((
                parsed_url.scheme,
                parsed_url.netloc,
                parsed_url.path,
                parsed_url.params,
                urlencode(corrected_query),
                parsed_url.fragment,
            ))

            # Redirect the user to the corrected URL
            return redirect(corrected_url)

        response = self.get_response(request)
        return response

I'm going to remove the /search/ check and ship that.

simonw commented 1 year ago

That didn't quite work, but a few minutes of debugging got me to this instead:

from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse
from django.shortcuts import redirect

class AmpersandRedirectMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        full_path = request.get_full_path()
        if "&" in full_path or "&amp%3B" in full_path:
            parsed_url = urlparse(full_path)
            query_params = parse_qsl(parsed_url.query)

            # Replace & with & in the query parameters
            corrected_query = [(k.replace("amp;", ""), v) for k, v in query_params]

            # Rebuild the URL with corrected query parameters
            corrected_url = urlunparse(
                (
                    parsed_url.scheme,
                    parsed_url.netloc,
                    parsed_url.path,
                    parsed_url.params,
                    urlencode(corrected_query),
                    parsed_url.fragment,
                )
            )

            # Redirect the user to the corrected URL
            return redirect(corrected_url)

        response = self.get_response(request)
        return response
simonw commented 1 year ago

That worked: https://simonwillison.net/search/?q=iphone&sort=date&year=2007 now redirects to https://simonwillison.net/search/?q=iphone&sort=date&year=2007