xdvom03 / klaus

Bayesian text classification of websites in a nested class system
Creative Commons Zero v1.0 Universal

Special case for downloading Wikipedia content #28

Open xdvom03 opened 3 years ago

xdvom03 commented 3 years ago

Wikipedia offers easier ways to get article content than scraping the rendered page. We don't particularly want all the boilerplate around the article to influence classing anyway.

Requires solving #11 first, since the link format is different.

https://en.wikipedia.org/wiki/Wikipedia:Database_download#Please_do_not_use_a_web_crawler
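
A minimal sketch of how the special case could hook into URL handling, assuming article links of the form https://en.wikipedia.org/wiki/<title>; both helper names are hypothetical, not project code, and the target format is settled below:

(defun wikipedia-article-p (url)
  "True if URL points at a Wikipedia article page."
  (search "wikipedia.org/wiki/" url))

(defun wikipedia-export-url (url)
  "Rewrite .../wiki/<title> into .../wiki/Special:Export/<title>."
  (let ((pos (+ (search "/wiki/" url) (length "/wiki/"))))
    (concatenate 'string
                 (subseq url 0 pos)
                 "Special:Export/"
                 (subseq url pos))))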

xdvom03 commented 3 years ago

Wikipedia offers full database dumps, but those aren't too useful to us, since we want one page at a time. It can also serve boilerplate-free article HTML, for example: https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering?action=render. And it lets us export an article as XML (https://en.wikipedia.org/wiki/Special:Export/Naive_Bayes_spam_filtering), which we can handle with:

(cxml:parse-octets (drakma:http-request "https://en.wikipedia.org/wiki/Special:Export/Naive_Bayes_spam_filtering"
                                        :force-binary t) ; Drakma would otherwise decode text/xml into a string, but PARSE-OCTETS needs octets
                   (cxml-xmls:make-xmls-builder))
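
The XMLS builder returns nested lists of the form (name attributes . children), where a name is either a plain string or a (local-name . namespace-uri) cons depending on the builder options. A minimal sketch of digging the raw wikitext out of that tree (FIND-WIKITEXT is a hypothetical helper, not project code):

(defun node-local-name (name)
  "Return the local part of an XMLS node name."
  (if (consp name) (car name) name))

(defun find-wikitext (node)
  "Depth-first search for the <text> element; return its string contents."
  (when (consp node)
    (if (equal (node-local-name (first node)) "text")
        ;; the element's children are the wikitext, possibly split into chunks
        (format nil "~{~a~}" (remove-if-not #'stringp (cddr node)))
        (some #'find-wikitext (cddr node)))))

Calling FIND-WIKITEXT on the parse result should hand back the article body as wikitext, still full of markup.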

This will require additional parsing to get the relevant text out in the general case (see the sketch below). Once we solve this, we must be careful not to let Wikipedia overwhelm certain classes, as it still has a house style of its own. But it might also allow for some interesting Wikipedia-only test cases.
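
As a rough sketch of that additional parsing, assuming CL-PPCRE is available: strip the most common wiki markup before feeding the text to the classifier. Real wikitext is nested (templates inside templates, and so on), so this is deliberately crude and would eventually need a proper parser.

(defun strip-wiki-markup (wikitext)
  "Crudely remove the most common wikitext markup."
  (let ((text wikitext))
    ;; [[target|label]] -> label, [[target]] -> target
    (setf text (cl-ppcre:regex-replace-all "\\[\\[(?:[^\\]|]*\\|)?([^\\]]*)\\]\\]" text "\\1"))
    ;; drop non-nested templates such as {{citation needed}}
    (setf text (cl-ppcre:regex-replace-all "\\{\\{[^}]*\\}\\}" text ""))
    ;; drop bold/italic quote runs ('' and ''')
    (setf text (cl-ppcre:regex-replace-all "''+" text ""))
    text))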