mkdocs / mkdocs-redirects

Open source plugin for Mkdocs page redirects
MIT License

some utf-8 characters break redirects #40

Closed vmiko closed 2 years ago

vmiko commented 2 years ago

Hi everyone

I'm facing an issue with your amazing plugin. Sadly, it's only reproducible in a Windows environment.

When we configure a redirect whose target contains non-ASCII characters, as below, a request for the corresponding page returns a 404.

    - redirects:
        redirect_maps:
            'index.md': 'présentation/index.md'

If we check the HTML generated for the redirect, we see the following output:

$ curl http://127.0.0.1:8000

<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Redirecting...</title>
    <link rel="canonical" href="pr�sentation/">
    <meta name="robots" content="noindex">
    <script>var anchor=window.location.hash.substr(1);location.href="pr�sentation/"+(anchor?"#"+anchor:"")</script>
    <meta http-equiv="refresh" content="0; url=pr�sentation/">
</head>
<body>
Redirecting...
</body>
</html>

For comparison, here is the output in a Linux environment (WSL2):

$ curl http://127.0.0.1:8000

<!doctype html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <title>Redirecting...</title>
    <link rel="canonical" href="présentation/">
    <meta name="robots" content="noindex">
    <script>var anchor=window.location.hash.substr(1);location.href="présentation/"+(anchor?"#"+anchor:"")</script>
    <meta http-equiv="refresh" content="0; url=présentation/">
</head>
<body>
Redirecting...
</body>
</html>

As we can see, the href is not properly encoded on Windows, so the redirection is broken.

Do you have any idea how to fix this issue? I'm not very familiar with Python, so it will be difficult for me to investigate.

Please note that I use these versions:

Thanks in advance

Lunik commented 2 years ago

The issue seems to be related to the default encoding on Windows.

In a Windows environment, when generating a site with this plugin, the generated redirect file has the following format:

$ file site/index.html 
site/index.html: HTML document, ISO-8859 text, with CRLF line terminators

However, every other file is UTF-8:

$ file site/path/to/other/index.html 
site/path/to/other/index.html : HTML document, UTF-8 Unicode text, with very long lines
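
To illustrate why a file written in a legacy single-byte encoding shows up as "pr�sentation/" once it is read as UTF-8, here is a minimal Python sketch. It assumes the Windows default encoding is cp1252, which is common but not confirmed for the reporter's machine; the variable names are only illustrative.

```python
# Assumption: the Windows locale encoding is cp1252 (not confirmed in this thread).
target = "présentation/"

# What ends up on disk when the file is written with the locale default.
legacy_bytes = target.encode("cp1252")   # "é" becomes the single byte 0xE9

# What ends up on disk when UTF-8 is requested explicitly.
utf8_bytes = target.encode("utf-8")      # "é" becomes the two bytes 0xC3 0xA9

# The page declares <meta charset="utf-8">, so a client decodes the bytes as UTF-8.
# A lone 0xE9 is not valid UTF-8, so it is replaced by U+FFFD.
as_seen_by_client = legacy_bytes.decode("utf-8", errors="replace")

print(legacy_bytes)
print(utf8_bytes)
print(as_seen_by_client)
```

The replacement character in the last value matches the broken href in the curl output from Windows.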

mkdocs itself explicitly forces the output encoding of its files to UTF-8; see: https://github.com/mkdocs/mkdocs/blob/master/mkdocs/commands/build.py

The fix would be to update the corresponding line in the plugin so that the file encoding is set explicitly. Something like:

-with open(old_path_abs, 'w') as f:
+with open(old_path_abs, 'w', encoding='utf-8') as f:
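
As a rough way to check the effect of that change outside the plugin, the sketch below writes a small redirect page with an explicit encoding and inspects the raw bytes. The helper `write_redirect_page` and the HTML it writes are hypothetical stand-ins, not the plugin's actual code.

```python
import locale
import os
import tempfile


def write_redirect_page(path, target_url):
    """Hypothetical stand-in for the plugin's file-writing step."""
    html = f'<meta http-equiv="refresh" content="0; url={target_url}">\n'
    # Passing encoding="utf-8" makes the result independent of the OS locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


# The encoding that open() would fall back to without the encoding argument.
print("locale default:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "index.html")
    write_redirect_page(page, "présentation/")
    # Read the file back as bytes to see what was actually stored.
    with open(page, "rb") as f:
        raw = f.read()

print(raw)
```

With the explicit encoding, the stored bytes for "é" should be 0xC3 0xA9 on any platform, whatever the locale default is.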

If this simple fix resolves the issue, a PR shouldn't be too hard.

@vmiko could you please test the proposed fix? And if it works, submit a PR?

oprypin commented 2 years ago

@Lunik Yes that's certainly the correct fix. Why don't you propose a PR, then? :)

oprypin commented 2 years ago

This is now released.