idio / json-wikipedia

Json Wikipedia contains code to convert the Wikipedia XML dump into a JSON dump. Questions? https://gitter.im/idio-opensource/Lobby

Annotations containing `:` #7

Closed dav009 closed 9 years ago

dav009 commented 9 years ago

https://en.wikipedia.org/wiki/Hayami contains some links in lists whose DBpedia IDs contain `:`, i.e.:

Those links are being skipped. Probably a problem with JWPL.

dav009 commented 9 years ago

`:` is used to define namespaces within JWPL, e.g. `image: url_to.jpg`. URIs containing `:` get an UNKNOWN link.type (ideally they should be INTERNAL).
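
A minimal sketch of the distinction, not JWPL's actual logic: treat the text before the first `:` as a namespace only if it matches a known prefix, and otherwise keep the link as INTERNAL. The function name `classify_link`, the `NAMESPACED` label, and the prefix set are assumptions made for illustration; the set is only a small subset of the MediaWiki namespaces.

```python
# Illustrative subset of MediaWiki namespace prefixes (assumption, not JWPL's list).
KNOWN_NAMESPACES = {
    "file", "image", "category", "template",
    "user", "help", "talk", "special", "media",
}


def classify_link(target: str) -> str:
    """Return a hypothetical link type for a wiki link target."""
    prefix, sep, _rest = target.partition(":")
    if not sep:
        # No colon at all: a plain article link.
        return "INTERNAL"
    if prefix.strip().lower() in KNOWN_NAMESPACES:
        # The text before the first colon is a reserved namespace.
        return "NAMESPACED"
    # The colon is part of the article title, so the link is still internal.
    return "INTERNAL"
```

With this approach a target such as `Star Wars: Episode I` stays INTERNAL, while `image: url_to.jpg` is recognised as a namespace link.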

dav009 commented 9 years ago

known reserved namespaces:

worth taking a look at: https://www.mediawiki.org/wiki/Help:Namespaces

dav009 commented 9 years ago

https://www.mediawiki.org/wiki/Extension_default_namespaces

dav009 commented 9 years ago

Reopening this because there are surface forms starting with ":". Many of them seem to belong to a language-related namespace, e.g. ":en:XX". There are also many surface forms like ":User:XX".
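
A rough sketch of how such surface forms could be split apart, assuming a leading ":" is followed by a prefix and a second ":". The helper names `split_leading_colon` and `is_language_prefix` are hypothetical, and the language-code pattern (2 to 3 lowercase letters) is only an approximation, since real interwiki prefixes can be longer.

```python
import re

# Approximation (assumption): a language code is 2 to 3 lowercase letters.
LANG_CODE = re.compile(r"^[a-z]{2,3}$")


def split_leading_colon(text: str):
    """Split ':en:XX' into ('en', 'XX'); return (None, text) if there is no prefix."""
    if not text.startswith(":"):
        return None, text
    prefix, sep, rest = text[1:].partition(":")
    if not sep:
        # Only a leading colon, no namespace-like prefix after it.
        return None, text[1:]
    return prefix, rest


def is_language_prefix(prefix) -> bool:
    """Guess whether a prefix looks like a language code rather than a namespace."""
    return prefix is not None and bool(LANG_CODE.match(prefix))
```

Under these assumptions ":en:XX" yields the prefix "en", which looks like a language code, while ":User:XX" yields "User", which does not.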

dav009 commented 9 years ago

Linking to an article in a different language: