molybdenum-99 / infoboxer

Wikipedia information extraction library
MIT License

Templates: extract array of hashes #45

Open zverok opened 8 years ago

zverok commented 8 years ago

For example, country infoboxes commonly use this pattern of numbered variables:

...
|leader_title1      = [[President of Israel|President]]
|leader_name1       = [[Reuven Rivlin]]
|leader_title2      = [[Prime Minister of Israel|Prime Minister]]
|leader_name2       = [[Benjamin Netanyahu]]
...
|established_event1       = [[Israeli Declaration of Independence|Declared]]
|established_date1        = 14 May 1948
|established_event2       = [[Israel, Palestine, and the United Nations|Recognition]]
|established_date2        = 1 May 1949
...

The best solution would look something like this:

page.infobox.array_of_hashes('established')
# => [
#  {
#    event: Wikilink(Declared, link: Israeli Declaration of Independence),
#    date: '14 May 1948'
#  },
#  {
#    event: Wikilink(Recognition, link: Israel, Palestine, and the United Nations),
#    date: '1 May 1949'
#  }
#]

page.infobox.array_of_hashes('leader')
# => [
#  {
#    title: Wikilink(President, link: President of Israel),
#    name: Wikilink(Reuven Rivlin)
#  },
#  {
#    title: Wikilink(Prime Minister, link: Prime Minister of Israel),
#    name: Wikilink(Benjamin Netanyahu)
#  }
#]
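
To make the proposal more concrete, here is a minimal sketch of the grouping logic. It is not Infoboxer code: `array_of_hashes` is written as a hypothetical standalone function, and the template variables are assumed to be available as a plain Hash of name => value (plain strings stand in for the parsed `Wikilink` nodes).

```ruby
# Hypothetical helper, not part of Infoboxer: groups numbered template
# variables such as leader_title1 / leader_name1 into an array of hashes.
# `variables` is assumed to be a plain Hash of variable name => value.
def array_of_hashes(variables, prefix)
  # "<prefix>_<key><index>", e.g. "leader_title1" -> key "title", index "1".
  pattern = /\A#{Regexp.escape(prefix)}_(?<key>.+?)(?<index>\d+)\z/
  groups = Hash.new { |hash, index| hash[index] = {} }

  variables.each do |name, value|
    match = pattern.match(name.to_s)
    next unless match

    groups[match[:index].to_i][match[:key].to_sym] = value
  end

  # Order the rows by their numeric suffix (1, 2, ...).
  groups.sort_by(&:first).map(&:last)
end

# Illustrative input taken from the infobox excerpt above.
variables = {
  'leader_title1' => 'President',
  'leader_name1' => 'Reuven Rivlin',
  'leader_title2' => 'Prime Minister',
  'leader_name2' => 'Benjamin Netanyahu',
  'established_event1' => 'Declared',
  'established_date1' => '14 May 1948'
}

leaders = array_of_hashes(variables, 'leader')
```

One caveat with this sketch: it treats any trailing digits as the row index, so a variable like `area_km2` would be read as key `km` with index 2. A real implementation would probably need a way to tell numbered groups from names that merely end in a digit.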

zverok commented 8 years ago

Similar (or the same) task:

# source:
# |area_rank = 120th
# |area_magnitude = 1 E10
# |area_km2 = 69,420
# |area_sq_mi = 26,911

# code:
page.infobox.fetch_all('area')
# => {
#   rank: "120th",
#   magnitude: "1 E10",
#   km2: "69,420",
#   sq_mi: "26,911"
# }
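
A rough sketch of how this could behave, under the same assumptions as before: `fetch_all` here is a hypothetical standalone function (not an existing Infoboxer method), and the variables are assumed to be a plain Hash of name => value.

```ruby
# Hypothetical helper, not part of Infoboxer: collects every variable that
# starts with "<prefix>_" into one hash keyed by the remaining suffix.
# `variables` is assumed to be a plain Hash of variable name => value.
def fetch_all(variables, prefix)
  marker = "#{prefix}_"

  variables.each_with_object({}) do |(name, value), result|
    name = name.to_s
    next unless name.start_with?(marker)

    # "area_sq_mi" -> :sq_mi (String#delete_prefix needs Ruby 2.5+).
    result[name.delete_prefix(marker).to_sym] = value
  end
end

# Illustrative input taken from the source excerpt above, plus one
# unrelated variable to show that other prefixes are skipped.
variables = {
  'area_rank' => '120th',
  'area_magnitude' => '1 E10',
  'area_km2' => '69,420',
  'area_sq_mi' => '26,911',
  'leader_name1' => 'Reuven Rivlin'
}

area = fetch_all(variables, 'area')
```

Because the suffix is kept whole, names such as `km2` survive intact here, which is the main difference from the numbered-group case.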