webplatform / mediawiki-conversion

Convert MediaWiki XML backup into structured raw text file tree
https://github.com/webplatform/docs

Ensure original URLs are redirected (301 Moved Permanently) to normalized names, including pages that have redirects #6

Closed renoirb closed 8 years ago

renoirb commented 9 years ago

Once #3 is addressed, we’ll have to configure the web server to issue proper HTTP redirects for funny URL names and others that may create conflicts (e.g. inconsistent casing, #2).

Handle cases such as:

Objectives

  1. Keep original URLs in place
  2. Redirect to a valid URL that removes potential confusion (ref: URL "code points", and RFC 3986, section 3.3, Path)
  3. Make the mapping from a URL to a file name in the source repository obvious
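
To illustrate objectives 2 and 3, here is a minimal sketch, not the converter's actual rules. It assumes that every run of characters outside the RFC 3986 "unreserved" set is collapsed into a single underscore, and that each page lands in an `index.md` file; both the replacement rule and the names `normalize_segment`, `url_to_file` and `index.md` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
_NOT_UNRESERVED = re.compile(r"[^A-Za-z0-9\-._~]+")


def normalize_segment(segment: str) -> str:
    """Collapse each run of other characters into one underscore (assumed rule)."""
    return _NOT_UNRESERVED.sub("_", segment).strip("_")


def url_to_file(path: str, filename: str = "index.md") -> PurePosixPath:
    """Map a URL path to a file path in the repository (hypothetical layout)."""
    segments = [normalize_segment(s) for s in path.split("/")]
    # Drop empty segments left by leading, trailing or doubled slashes.
    return PurePosixPath(*[s for s in segments if s], filename)
```

With these assumptions, `url_to_file("/tutorials/What is CSS?")` gives `tutorials/What_is_CSS/index.md`, so the file name can be guessed from the normalized URL alone.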

Expected deliverables

renoirb commented 9 years ago

URLs to test out.

These are the quirky URLs that MediaWiki allowed and that we must redirect.

To consider

Questions

Should redirect URLs with invalid characters in their components to a valid location

Should handle redirects left behind by deleted content

Based on a few entries from reports/summary.yml that have a "redirect_to" flag.

In the following links, does the page content really contain links with a space (e.g. "[[canvas/tutorial/Canvas tutorial/Drawing shapes]]"), or only with an underscore (e.g. "[[canvas/tutorial/Canvas_tutorial/Drawing_shapes]]")?
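
Whichever form the page content uses, MediaWiki treats spaces and underscores in titles as equivalent, so both spellings can be reduced to one key before looking up a redirect. A minimal sketch of that idea, where the function name `link_key` is hypothetical:

```python
def link_key(target: str) -> str:
    """Return one canonical key for a wiki link target.

    Spaces and underscores are treated as the same character, and runs
    of them are collapsed into a single underscore.
    """
    return "_".join(target.replace("_", " ").split())
```

Both example links above then map to the same key, `canvas/tutorial/Canvas_tutorial/Drawing_shapes`.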

Should show a message for disabled features

Tests:

Redirect WPD pages to migrated content

Should handle translation and open the appropriate content

Should redirect anything under /wiki/foo to /foo, but only as the last rule

Should sanitize URLs

Should be case InSeNsiTive

... and redirect to the same location

http://67.205.56.184/wiki/apis/webrtc/objects/MediaStream/properties/videoTracks

http://67.205.56.184/wiki/Main_Page :
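
A rough sketch of how the case-insensitive lookup and the final /wiki/ fallback listed above could fit together. This is only an illustration in Python, not the NGINX configuration itself; the function name `resolve_redirect` and the sample entry used in the usage note are hypothetical.

```python
from typing import Mapping, Optional


def resolve_redirect(path: str, redirects: Mapping[str, str]) -> Optional[str]:
    """Return the 301 target for a request path, or None if no redirect applies.

    `redirects` is assumed to be keyed by lowercased request paths.
    """
    # 1. Explicit entries win, whatever casing the request used.
    target = redirects.get(path.lower())
    if target is not None:
        return target
    # 2. Only at the end: drop the leading /wiki/ and keep the rest as is.
    prefix = "/wiki/"
    if path.lower().startswith(prefix):
        return "/" + path[len(prefix):]
    # 3. Nothing to redirect.
    return None
```

Given a hypothetical entry `{"/wiki/main_page": "/"}`, both `/wiki/Main_Page` and `/WIKI/MAIN_PAGE` resolve to `/`, while `/wiki/apis/webrtc/objects/MediaStream/properties/videoTracks` falls through to the prefix rule and keeps its original casing.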

renoirb commented 9 years ago

The redirect map is generated by app/console mediawiki:summary, which creates reports/nginx_redirects.map; we’ll load that file from the NGINX configuration.

renoirb commented 9 years ago

Some more URLs to try

Main content

WPD namespace

renoirb commented 9 years ago

Let’s convert some MediaWiki ASK query results to HTML.

Migrated tasks originally noted here into webplatform/docs#2

Converted:

To review

renoirb commented 8 years ago

Most use cases should work now!