webplatform / mediawiki-conversion

Convert MediaWiki XML backup into structured raw text file tree
https://github.com/webplatform/docs

Get a list of links (internal and external) on every page so we can extract code samples #18

Open renoirb opened 9 years ago

renoirb commented 9 years ago

Time for some link cleanup in the content.

Some contributors created code samples somewhere other than code.webplatform.org, and there was also an issue with the user account under which code.webplatform.org (Dabblet) was saving gists. We should gather them all.

Solution path

While we still have the content in MediaWiki, use the API's parse action to extract the links.

E.g.:

https://docs.webplatform.org/w/api.php?action=parse&prop=links%7Cimages%7Cexternallinks%7Ciwlinks%7Cproperties&disabletoc=true&disablepp=true&page=css/properties/border-radius
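
A minimal sketch of building that request for any page, using only the Python standard library. The function names are hypothetical, and `format=json` is an assumption (it is not in the URL above, but the JSON response quoted later in this thread suggests it). `fetch_parse` only works while the wiki is still reachable.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://docs.webplatform.org/w/api.php"


def build_parse_url(page, api_url=API_URL):
    """Build the action=parse URL for one wiki page (hypothetical helper)."""
    params = {
        "action": "parse",
        "prop": "links|images|externallinks|iwlinks|properties",
        "disabletoc": "true",
        "disablepp": "true",
        "page": page,
        "format": "json",  # assumption: ask for JSON output explicitly
    }
    # safe="/" keeps the slashes in the page name readable, as in the URL above
    return api_url + "?" + urlencode(params, safe="/")


def fetch_parse(page):
    """Fetch and decode the parse result for one page (needs network access)."""
    with urlopen(build_parse_url(page)) as response:
        return json.load(response)


print(build_parse_url("css/properties/border-radius"))
```

Calling `fetch_parse` once per page title would give us one response per page to collect links from.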

renoirb commented 9 years ago

After digging, we can also get information on links that are part of the same wiki and learn whether or not MediaWiki has a page at the location the link points to.

Each member of the links array has an exists member only when the target page exists; otherwise the key is absent.

{
    "parse": {
        "title": "css/properties/border-radius",
        "links": [
            {
                "ns": 0,
                "*": "css/concepts/computed value",
                "exists": ""
            },
            {
                "ns": 0,
                "*": "css/bar"
            }
        ]
    }
}

In the example above, the page [[css/properties/border-radius]] contains two links:

  1. The first points to an existing page, [[css/concepts/computed value]].
  2. The second points to a non-existent page, [[css/bar]].
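
The check described above can be sketched in a few lines of Python. `split_links` is a hypothetical helper name, and the sample data is the response quoted earlier in this comment; the only rule applied is the presence or absence of the `exists` key.

```python
def split_links(parse_response):
    """Split internal links into (existing, missing) page titles.

    A link is treated as existing when it carries an "exists" key,
    whatever the value of that key is.
    """
    links = parse_response.get("parse", {}).get("links", [])
    existing = [link["*"] for link in links if "exists" in link]
    missing = [link["*"] for link in links if "exists" not in link]
    return existing, missing


sample = {
    "parse": {
        "title": "css/properties/border-radius",
        "links": [
            {"ns": 0, "*": "css/concepts/computed value", "exists": ""},
            {"ns": 0, "*": "css/bar"},
        ],
    }
}

existing, missing = split_links(sample)
print(existing)  # ['css/concepts/computed value']
print(missing)   # ['css/bar']
```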

renoirb commented 9 years ago

Some code samples are also not on code.webplatform.org.
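
A rough sketch for spotting those, assuming the `externallinks` member of the parse response is a flat list of URL strings. `links_hosted_elsewhere` is a hypothetical helper, and the URLs in the example are made up for illustration. It returns every external link hosted elsewhere, so the result is only a list of candidates to review by hand, not a list of confirmed code samples.

```python
from urllib.parse import urlparse

SAMPLE_HOST = "code.webplatform.org"


def links_hosted_elsewhere(external_links, sample_host=SAMPLE_HOST):
    """Return the external links whose host is not sample_host."""
    elsewhere = []
    for url in external_links:
        host = urlparse(url).hostname
        # Skip entries without a host (e.g. malformed or relative values)
        if host is not None and host != sample_host:
            elsewhere.append(url)
    return elsewhere


# Hypothetical URLs, for illustration only
candidates = links_hosted_elsewhere([
    "http://code.webplatform.org/gist/123",
    "https://jsfiddle.net/abc/",
])
print(candidates)  # ['https://jsfiddle.net/abc/']
```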

Some other links were found (list them here):