molybdenum-99 / infoboxer

Wikipedia information extraction library
MIT License

Some hash data not showing #65

Closed: ninjabunny-dev closed this issue 8 years ago

ninjabunny-dev commented 8 years ago

My program is not reading the entire hash: https://github.com/xxninjabunnyxx/bulbaripper-tcg

This is the command: ruby bulbaripper-tcg.rb -v -s 'Base Set' -u 'http://bulbapedia.bulbagarden.net/wiki/Base_Set_(TCG)'

The cost is blank and the effect has parts missing. I don't know if this is a problem with my code or with Infoboxer itself.

zverok commented 8 years ago

Sorry, the whole thing is just too complicated for me to grasp. Could you please provide a short, self-contained test case to reproduce the bug?

ninjabunny-dev commented 8 years ago

This is one of the outputs: "type"=>"Grass", "cost"=>"", "name"=>"Thrash", "jname"=>"あばれる", "jtrans"=>"Violent Struggle", "damage"=>"30+", "effect"=>"Flip a coin. If heads, this attack does 30 damage plus 10 more damage; if tails, this attack does 30 damage and Nidoking does 10 damage to itself."

zverok commented 8 years ago

How can I quickly reproduce it? Running ruby bulbaripper-tcg.rb -v -s 'Base Set' -u 'http://bulbapedia.bulbagarden.net/wiki/Base_Set_(TCG)' gave me this output:

Ripping set list table from Bulbapedia

1,102,Alakazam,Alakazam_(Base_Set_1)
2,102,Blastoise,Blastoise_(Base_Set_2)
3,102,Chansey,Chansey_(Base_Set_3)
4,102,Charizard,Charizard_(Base_Set_4)
5,102,Clefairy,Clefairy_(Base_Set_5)
6,102,Gyarados,Gyarados_(Base_Set_6)
7,102,Hitmonchan,Hitmonchan_(Base_Set_7)
8,102,Machamp,Machamp_(Base_Set_8)
9,102,Magneton,Magneton_(Base_Set_9)
10,102,Mewtwo,Mewtwo_(Base_Set_10)
11,102,Nidoking,Nidoking_(Base_Set_11)
12,102,Ninetales,Ninetales_(Base_Set_12)
13,102,Poliwrath,Poliwrath_(Base_Set_13)
14,102,Raichu,Raichu_(Base_Set_14)
15,102,Venusaur,Venusaur_(Base_Set_15)
16,102,Zapdos,Zapdos_(Base_Set_16)
17,102,Beedrill,Beedrill_(Base_Set_17)
18,102,Dragonair,Dragonair_(Base_Set_18)
19,102,Dugtrio,Dugtrio_(Base_Set_19)
20,102,Electabuzz,Electabuzz_(Base_Set_20)
21,102,Electrode,Electrode_(Base_Set_21)
22,102,Pidgeotto,Pidgeotto_(Base_Set_22)
23,102,Arcanine,Arcanine_(Base_Set_23)
24,102,Charmeleon,Charmeleon_(Base_Set_24)
25,102,Dewgong,Dewgong_(Base_Set_25)
26,102,Dratini,Dratini_(Base_Set_26)
27,102,Farfetch'd,Farfetch%27d_(Base_Set_27)
28,102,Growlithe,Growlithe_(Base_Set_28)
29,102,Haunter,Haunter_(Base_Set_29)
30,102,Ivysaur,Ivysaur_(Base_Set_30)
31,102,Jynx,Jynx_(Base_Set_31)
32,102,Kadabra,Kadabra_(Base_Set_32)
33,102,Kakuna,Kakuna_(Base_Set_33)
34,102,Machoke,Machoke_(Base_Set_34)
35,102,Magikarp,Magikarp_(Base_Set_35)
36,102,Magmar,Magmar_(Base_Set_36)
37,102,Nidorino,Nidorino_(Base_Set_37)
38,102,Poliwhirl,Poliwhirl_(Base_Set_38)
39,102,Porygon,Porygon_(Base_Set_39)
40,102,Raticate,Raticate_(Base_Set_40)
41,102,Seel,Seel_(Base_Set_41)
42,102,Wartortle,Wartortle_(Base_Set_42)
43,102,Abra,Abra_(Base_Set_43)
44,102,Bulbasaur,Bulbasaur_(Base_Set_44)
45,102,Caterpie,Caterpie_(Base_Set_45)
46,102,Charmander,Charmander_(Base_Set_46)
47,102,Diglett,Diglett_(Base_Set_47)
48,102,Doduo,Doduo_(Base_Set_48)
49,102,Drowzee,Drowzee_(Base_Set_49)
50,102,Gastly,Gastly_(Base_Set_50)
51,102,Koffing,Koffing_(Base_Set_51)
52,102,Machop,Machop_(Base_Set_52)
53,102,Magnemite,Magnemite_(Base_Set_53)
54,102,Metapod,Metapod_(Base_Set_54)
55,102,Nidoran♂,Nidoran%E2%99%82_(Base_Set_55)
56,102,Onix,Onix_(Base_Set_56)
57,102,Pidgey,Pidgey_(Base_Set_57)
58,102,Pikachu,Pikachu_(Base_Set_58)
59,102,Poliwag,Poliwag_(Base_Set_59)
60,102,Ponyta,Ponyta_(Base_Set_60)
61,102,Rattata,Rattata_(Base_Set_61)
62,102,Sandshrew,Sandshrew_(Base_Set_62)
63,102,Squirtle,Squirtle_(Base_Set_63)
64,102,Starmie,Starmie_(Base_Set_64)
65,102,Staryu,Staryu_(Base_Set_65)
66,102,Tangela,Tangela_(Base_Set_66)
67,102,Voltorb,Voltorb_(Base_Set_67)
68,102,Vulpix,Vulpix_(Base_Set_68)
69,102,Weedle,Weedle_(Base_Set_69)
70,102,Clefairy Doll,Clefairy_Doll_(Base_Set_70)
71,102,Computer Search,Computer_Search_(Base_Set_71)
72,102,Devolution Spray,Devolution_Spray_(Base_Set_72)
73,102,Impostor Professor Oak,Impostor_Professor_Oak_(Base_Set_73)
74,102,Item Finder,Item_Finder_(Base_Set_74)
75,102,Lass,Lass_(Base_Set_75)
76,102,Pokémon Breeder,Pok%C3%A9mon_Breeder_(Base_Set_76)
77,102,Pokémon Trader,Pok%C3%A9mon_Trader_(Base_Set_77)
78,102,Scoop Up,Scoop_Up_(Base_Set_78)
79,102,Super Energy Removal,Super_Energy_Removal_(Base_Set_79)
80,102,Defender,Defender_(Base_Set_80)
81,102,Energy Retrieval,Energy_Retrieval_(Base_Set_81)
82,102,Full Heal,Full_Heal_(Base_Set_82)
83,102,Maintenance,Maintenance_(Base_Set_83)
84,102,PlusPower,PlusPower_(Base_Set_84)
85,102,Pokémon Center,Pok%C3%A9mon_Center_(Base_Set_85)
86,102,Pokémon Flute,Pok%C3%A9mon_Flute_(Base_Set_86)
87,102,Pokédex,Pok%C3%A9dex_(Base_Set_87)
88,102,Professor Oak,Professor_Oak_(Base_Set_88)
89,102,Revive,Revive_(Base_Set_89)
90,102,Super Potion,Super_Potion_(Base_Set_90)
91,102,Bill,Bill_(Base_Set_91)
92,102,Energy Removal,Energy_Removal_(Base_Set_92)
93,102,Gust of Wind,Gust_of_Wind_(Base_Set_93)
94,102,Potion,Potion_(Base_Set_94)
95,102,Switch,Switch_(Base_Set_95)
96,102,Double Colorless Energy,Double_Colorless_Energy_(Base_Set_96)
97,102,Fighting Energy,Fighting_Energy_(Base_Set_97)
98,102,Fire Energy,Fire_Energy_(Base_Set_98)
99,102,Grass Energy,Grass_Energy_(Base_Set_99)
100,102,Lightning Energy,Lightning_Energy_(Base_Set_100)
101,102,Psychic Energy,Psychic_Energy_(Base_Set_101)
102,102,Water Energy,Water_Energy_(Base_Set_102)

Checking for new templates

No new templates found

Checking for new keys

41 Seel new pokemon keys
1

Found new keys

Please update the key and/or template list and upgrade the data extractor

ninjabunny-dev commented 8 years ago

Try pulling it again; I had to update the key list.

zverok commented 8 years ago

Still too complicated for me, sorry :( Please try to isolate the test case, like "when I extract this specific page with these template definitions, I expect A, but receive B".

ninjabunny-dev commented 8 years ago

I expect Card info for Alakazam extracted {"type"=>"Psychic", "cost"=>"[Psychic] [Psychic] [Psychic]", "name"=>"Confuse Ray", "jname"=>"あやしいひかり", "jtrans"=>"Eerie Light", "damage"=>"30", "effect"=>"Flip a coin. If heads, the Defending Pokémon is now Confused."}

but receive Card info for Alakazam extracted {"type"=>"Psychic", "cost"=>"", "name"=>"Confuse Ray", "jname"=>"あやしいひかり", "jtrans"=>"Eerie Light", "damage"=>"30", "effect"=>"Flip a coin. If heads, the Defending Pokémon is now ."}

In extraSettings.rb I set {{e|Psychic}} to be [Psychic], and {{TCG|Confused}} to be Confused. I commented out extraSettings.rb but I still get the same result.

zverok commented 8 years ago

There are two main problems in your extraRules.rb: First:

Infoboxer::MediaWiki::Traits.for($wiki) # $wiki is nil. should be 'bulbapedia.bulbagarden.net'

Second: traits should be defined before you create your wiki instance. In your code, you create the Wiki instance in lib/settings.rb and only then define the templates. When I fixed this, I received:

{"type"=>"Psychic", "cost"=>"[Psychic][Psychic][Psychic]", "name"=>"Confuse Ray", "jname"=>"あやしいひかり", "jtrans"=>"Eerie Light", "damage"=>"30", "effect"=>"Flip a coin. If heads, the Defending Pokémon is now Confused."}

(As a side note, your code is unnecessarily complicated and suboptimal. For example, in dataExtractor.rb, instead of calling Wiki.get(URI.unescape(CardTable[cardNumber][2])) each time you need to do something with the page, you can do page = Wiki.get(...) once at the start of the method and use the page variable from then on. Done your way, the same page is fetched multiple times for no reason.)
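A minimal sketch of the fetch-once pattern from the side note above. CountingWiki, CARD_TABLE, and extract_card are hypothetical stand-ins, not Infoboxer's API or the project's real code; the stub only counts how often a page is requested. URI.decode_www_form_component is used in place of URI.unescape, which was removed in Ruby 3.0.

```ruby
require 'uri'

# Hypothetical stand-in for the wiki client: it returns a stub page
# and counts how many times a fetch was requested.
class CountingWiki
  attr_reader :fetches

  def initialize
    @fetches = 0
  end

  def get(title)
    @fetches += 1
    { title: title, text: "stub page for #{title}" }
  end
end

# Hypothetical card table: number => [number, name, percent-encoded page title].
CARD_TABLE = {
  27 => ['27', "Farfetch'd", 'Farfetch%27d_(Base_Set_27)']
}.freeze

def extract_card(wiki, card_number)
  title = URI.decode_www_form_component(CARD_TABLE[card_number][2])
  page = wiki.get(title) # fetch once at the start of the method

  # Every later step reuses the local variable instead of calling wiki.get again.
  { name: page[:title], text_length: page[:text].length }
end

wiki = CountingWiki.new
card = extract_card(wiki, 27)
puts card[:name]   # Farfetch'd_(Base_Set_27)
puts wiki.fetches  # 1
```

With a real client, each avoided call is a saved network round trip, which would be consistent with the speedup reported at the end of the thread.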

ninjabunny-dev commented 8 years ago

How did you fix it?

zverok commented 8 years ago

1) Change the Traits.for($wiki) line to

  Infoboxer::MediaWiki::Traits.for('bulbapedia.bulbagarden.net')

2) require 'extraRules' AND call the extraRules() method before requiring 'settings' (which creates the wiki instance)
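Why both steps matter can be sketched with a toy model. This is not Infoboxer's real implementation: ToyTraits and ToyWiki are hypothetical classes. It also assumes the client looks up its traits by domain once, at construction, which is what the advice above implies.

```ruby
# Toy model only: a per-domain registry of template replacements.
class ToyTraits
  REGISTRY = {}

  # Register template replacements under a domain key.
  def self.for(domain, templates)
    (REGISTRY[domain] ||= {}).merge!(templates)
  end

  # Return a snapshot of whatever is registered for the domain right now.
  def self.get(domain)
    REGISTRY.fetch(domain, {}).dup
  end
end

# Toy client: it reads its traits once, when it is created.
class ToyWiki
  def initialize(domain)
    @templates = ToyTraits.get(domain)
  end

  # Unknown templates render as an empty string, like the blank "cost" field.
  def render(template)
    @templates.fetch(template, '')
  end
end

DOMAIN = 'bulbapedia.bulbagarden.net'

early = ToyWiki.new(DOMAIN) # created before any traits are defined

# Registering under nil (like Traits.for($wiki) with $wiki unset) never matches DOMAIN.
ToyTraits.for(nil, 'TCG|Confused' => 'Confused')

# Registering under the real domain works, but only for clients created afterwards.
ToyTraits.for(DOMAIN, 'e|Psychic' => '[Psychic]')
late = ToyWiki.new(DOMAIN)

puts early.render('e|Psychic').inspect    # ""
puts late.render('e|Psychic').inspect     # "[Psychic]"
puts late.render('TCG|Confused').inspect  # ""
```

In the toy model, the early client misses every definition and the nil-keyed definition is never found. Those are the same two symptoms as in the reported output: an empty cost and a missing "Confused".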

ninjabunny-dev commented 8 years ago

Thanks for the help, I also refactored my code and now it runs way faster.