molybdenum-99 / reality

Comprehensive data proxy to knowledge about real world
MIT License

Using Reality in other languages #41

Closed zverok closed 8 years ago

zverok commented 8 years ago

Q: Can I use the reality gem with the German Wikipedia too? (@wintermeyer on Twitter)

A: It is one of our long-term goals, but not an easy one. Reality currently extracts its core data from the (English) Wikipedia and Wikidata.

Wikidata is multilingual, and uniformly so. Reality currently ignores this fact: there are hard-coded "en" values here and there, but I feel it should be easy to generalize the behavior.
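To illustrate what "generalizing" could look like, here is a minimal sketch in plain Ruby. It assumes Wikidata's entity layout, where labels are keyed by language code. The `label_for` helper and the trimmed-down `ENTITY` hash are hypothetical and are not part of Reality's code.

```ruby
# Hypothetical, trimmed-down Wikidata entity (Q183 is Germany).
# Labels are keyed by language code, the same way for every language.
ENTITY = {
  'id' => 'Q183',
  'labels' => {
    'en' => { 'language' => 'en', 'value' => 'Germany' },
    'de' => { 'language' => 'de', 'value' => 'Deutschland' }
  }
}.freeze

# Hypothetical helper: return the label in the requested language,
# falling back to another language (English by default) when it is missing.
def label_for(entity, lang: 'en', fallback: 'en')
  labels = entity.fetch('labels', {})
  found = labels[lang] || labels[fallback]
  found && found['value']
end

puts label_for(ENTITY, lang: 'de') # German label
puts label_for(ENTITY, lang: 'uk') # no Ukrainian label here, falls back to English
```

The point is only that the language becomes a parameter instead of a literal "en" scattered through the code.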

With Wikipedia, things are harder. The infoboxer gem, which we develop for structured access to Wikipedia, is designed to be multilingual:

require 'infoboxer'

page = Infoboxer.wikipedia(:de).get('Deutschland')
# => #<Page(title: "Deutschland", url: "https://de.wikipedia.org/wiki/Deutschland")
puts page.paragraphs.first
# (Vollform: Bundesrepublik Deutschland) ist ein föderal verfasster Staat in Mitteleuropa....

But access to the most interesting data requires Wikipedia templates, which are specific to each language version (they are identified by name). For example,

page.infobox
# => nil

...even though we can plainly see an infobox on the rendered page.

That's because each wiki needs its own template definitions, and so far they are defined only for the English Wikipedia: https://github.com/molybdenum-99/infoboxer/blob/master/lib/infoboxer/definitions/en.wikipedia.org.rb
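The effect can be sketched in plain Ruby. This is an illustration only, not Infoboxer's real definitions DSL: the `INFOBOX_PATTERNS` registry, the `find_infobox` helper and the template name lists are all hypothetical.

```ruby
# Hypothetical per-wiki registry: which template names count as infoboxes.
# Only the English wiki has an entry; there is none for de.wikipedia.org.
INFOBOX_PATTERNS = {
  'en.wikipedia.org' => [/\AInfobox\b/i]
}.freeze

# Return the first template name recognised as an infobox on the given wiki,
# or nil when the wiki has no definitions (or none of them match).
def find_infobox(wiki, template_names)
  patterns = INFOBOX_PATTERNS.fetch(wiki, [])
  template_names.find { |name| patterns.any? { |re| re.match?(name) } }
end

p find_infobox('en.wikipedia.org', ['Coord', 'Infobox country'])
p find_infobox('de.wikipedia.org', ['Coordinate', 'Infobox Staat'])
```

The second call returns nil even though the page clearly carries an infobox-like template, simply because nothing has been registered for that wiki.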

So, if you want Reality in your language, you'll need to start by getting a grasp of that wiki's templates and defining the most popular of them. Like this :-\

zverok commented 8 years ago

Closing the issue; the information has moved to the wiki instead.