readthedocs / readthedocs.org

The source code that powers readthedocs.org
https://readthedocs.org/
MIT License
8.05k stars 3.59k forks source link

Proxito: prefix redirect appends '=' to query strings #9799

Open adswa opened 1 year ago

adswa commented 1 year ago

Details

An automatic top-level redirection (/ -> /en/latest/) has started to append an equal sign to a query string in the redirected URL. I reproduced this on several readthedocs projects of mine. The issue occurs for projects with a Prefix redirection rule and URLs with a query string (somedomain.com/somepage.html?myquery).

The project we noticed this behavior is https://readthedocs.org/projects/datalad-handbook, which has a standard prefix redirection rule. As this project has quite a few configurations, I also reproduced this for a very simple project https://readthedocs.org/projects/multimatch, which did not have any configured redirects beforehand, by adding a prefix redirect.

Expected Result

For many years, handbook.datalad.org/r.html?install successfully redirected to handbook.datalad.org/en/latest/r.html?install. Likewise, a URL like https://multimatch.readthedocs.io/api.html?some should redirect to https://multimatch.readthedocs.io/en/latest/api.html?some

Actual Result

Currently, the above URLs redirect to handbook.datalad.org/en/latest/r.html?install=, and https://multimatch.readthedocs.io/en/latest/api.html?some=, respectively.

more background

The reason that we noticed this behavior is that links broke which we "hacked" to be persistent using query strings. For the project I linked above, http://handbook.datalad.org/r.html?someID is a page that performs redirection based on RST :ref:s. We have a JavaScript script on that page that extracts the URL query string (e.g.,install) and matches it to a list of defined IDs (one of them being install). Each ID has an associated :ref: pointing to the specific file's RST anchor - for install this would be :ref:install. This way, http://handbook.datalad.org/r.html?install would always link to our projects installation page, even if we renamed the file or restructured the projects file tree. This logic broke when there is an additional equal sign appended to the URL. I have worked around this in our project by stripping appended equal signs from the query string before matching, so this issue isn't pressing, and it wouldn't be a problem if it wasn't addressed. However, I wanted to leave a note in case this change was done inadvertently.

PS: Thanks for this wonderful, wonderful project! I love readthedocs, and can't thank you enough for the work you're doing. :heart:

humitos commented 1 year ago

Hi @adswa! Do you know approximately when this started happening?

As this project has quite a few configurations, I also reproduced this for a very simple project readthedocs.org/projects/multimatch, which did not have any configured redirects beforehand, by adding a prefix redirect.

What's that "prefix refirect"? That will help us to debug this issue.

adswa commented 1 year ago

What's that "prefix refirect"? That will help us to debug this issue.

I was referring to the redirect type "Prefix redirect". This is what I see in the admin panel under "Redirects". image image

Do you know approximately when this started happening?

I don't have a reliable estimate, but I received the first issue that a link broke this week, and a colleague remembers seeing the issue two weeks ago already. It might have been happening for longer, though, and we only got reports now.

humitos commented 1 year ago

I was able to reproduce this issue with:

Prefix Redirect
/ -> /en/latest/ 

I found the issue in this code: https://github.com/readthedocs/readthedocs.org/blob/f86b5376ffc70242bdd677cb02f016429ff6bd4b/readthedocs/proxito/views/mixins.py#L308-L318

(Pdb++) query_list
[('install', '')]
(Pdb++) urlencode(query_list)
'install='
(Pdb++)
humitos commented 1 year ago

It seems this change introduced the issue https://github.com/readthedocs/readthedocs.org/pull/9420/files#diff-8e0bdac308d9115eeb0c48b93459474a47d65e16e91a85db4cf3342e5f39a39dR289 since we weren't parsing the querystring before and now we are. Because the install querystring does not have a value, Python automatically adds =