tecoholic / dykapi

An API for DYK articles from Wikipedia

Handle (pictured) hooks #5

Open srikanthlogic opened 13 years ago

srikanthlogic commented 13 years ago

Store the picture URL for hooks that contain "(pictured)".
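A minimal sketch of the storage rule being requested. It assumes a hook is stored as a small record with an optional picture URL that is kept only when the hook text contains the "(pictured)" marker. The names `Hook`, `make_hook` and `PICTURED_MARKER` are hypothetical and do not come from the dykapi code.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker constant: DYK hooks flag the illustrated article with "(pictured)".
PICTURED_MARKER = "(pictured)"


@dataclass
class Hook:
    text: str
    picture_url: Optional[str] = None  # set only for "(pictured)" hooks


def make_hook(text: str, picture_url: Optional[str]) -> Hook:
    """Build a hook record, keeping the picture URL only if the text mentions (pictured)."""
    if PICTURED_MARKER in text:
        return Hook(text, picture_url)
    return Hook(text)
```

For example, `make_hook("... that X (pictured) ...", "a.jpg")` keeps the URL, while a hook without the marker gets `picture_url=None`.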

tecoholic commented 13 years ago

Ran a test scraper. The 2011 archives are consistent, but the 2007 ones are too inconsistent: the scraper matched only one image and failed on the next. :( Might need to write multiple scrapers, or ?

srikanthlogic commented 13 years ago

Maybe different methods for different periods in the same scraper?

tecoholic commented 13 years ago

Found the issue: a missing semicolon in the <div style="blah blah;">. Modified the scraper to use both strings, with and without the ;, as selectors. Confirmed it is backward compatible as far back as June 2005 (99% sure; there might be rare cases where the image is not associated with the first hook <li>).
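A sketch of the fix described above, using only Python's standard `html.parser`. The actual style string is elided in the thread ("blah blah"), so `IMAGE_DIV_STYLE` below is a hypothetical placeholder, and `DYKImageParser` and `parse_dyk` are illustrative names, not the dykapi implementation. It accepts the image `<div>` whether or not its style ends in `;`, and, following the assumption in the comment, attaches the image to the first hook `<li>`.

```python
from html.parser import HTMLParser

# Hypothetical placeholder: the real style value is elided ("blah blah") in the thread.
IMAGE_DIV_STYLE = "float:right;margin-left:0.5em"

# Match the style string both with and without the trailing semicolon.
ACCEPTED_STYLES = {IMAGE_DIV_STYLE, IMAGE_DIV_STYLE + ";"}


class DYKImageParser(HTMLParser):
    """Collect the first <img> src inside the image <div> and the text of each <li> hook."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # > 0 while inside the matching image <div>
        self.image_src = None
        self.hooks = []
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth:
                self.div_depth += 1  # nested <div> inside the image <div>
            elif attrs.get("style") in ACCEPTED_STYLES:
                self.div_depth = 1
        elif tag == "img" and self.div_depth and self.image_src is None:
            self.image_src = attrs.get("src")
        elif tag == "li":
            self._in_li = True
            self.hooks.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self.hooks[-1] += data  # includes text inside <a>, <i>, <b>, ...


def parse_dyk(html):
    """Return hook dicts; the image (if found) is assumed to belong to the first hook."""
    parser = DYKImageParser()
    parser.feed(html)
    parser.close()
    hooks = [{"text": " ".join(text.split())} for text in parser.hooks]
    if parser.image_src and hooks:
        hooks[0]["picture"] = parser.image_src
    return hooks
```

Matching a small set of exact strings keeps a single scraper working across both archive variants, instead of maintaining one scraper per period. A looser approach would normalize the attribute (for example, strip a trailing `;`) before comparing.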