tecoholic / dykapi

An API for DYK articles from Wikipedia

Titles have urlencoded characters #6

Open tecoholic opened 13 years ago

tecoholic commented 13 years ago

The titles produced contain URL-encoded (percent-encoded) characters, which should be decoded.

tecoholic commented 13 years ago

The urllib.quote() call in the scraper script produced the encoded characters. I think it was overkill. This might force us to re-scrape and regenerate the entire datastore :(

tecoholic commented 13 years ago

Found a way to solve the mystery: urllib.unquote(unicode(line).encode('ascii')).decode('utf-8')

But the need to reupload is still there ;)
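
For anyone on Python 3, a rough equivalent of the one-liner above might look like the sketch below. It assumes the stored titles are UTF-8 text that was percent-encoded; `decode_title` is a hypothetical helper name, not something in the dykapi code.

```python
from urllib.parse import unquote


def decode_title(line: str) -> str:
    """Turn a percent-encoded title back into readable text.

    Assumes the original bytes were UTF-8 (hypothetical helper,
    not part of the dykapi scraper).
    """
    # unquote() replaces each %XX escape and decodes the resulting
    # bytes as UTF-8; unescaped characters pass through unchanged.
    return unquote(line, encoding="utf-8")


if __name__ == "__main__":
    print(decode_title("Caf%C3%A9_M%C3%BCller"))
```

In Python 3, `urllib.parse.unquote` does the byte decoding itself, so the separate `.encode('ascii')` and `.decode('utf-8')` steps from the Python 2 version are not needed.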

tecoholic commented 13 years ago

@srikanthlogic: if you could run the new hook scraper and upload the data under a suitable datastore table name, we could close this issue :)