theanti9 / PyCrawler

A python web crawler

Exception if there are UTF-8 characters in URL #13

Closed miklagard closed 11 years ago

miklagard commented 11 years ago

I tried to crawl http://gezinomi.com, and some of its URLs contain Turkish letters such as ı. While crawling, the script raised an exception and stopped.

To fix this, I made two changes in query.py:

On line #59, I changed args = [{'address':unicode(u)} for u in urls] to args = [{'address':u.decode("utf8")} for u in urls]

On line #84, I changed s = select([self.crawl_table]).where(self.crawl_table.c.address == unicode(url)) to s = select([self.crawl_table]).where(self.crawl_table.c.address == url.decode("utf8"))

Now it works without any problems.
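For anyone wondering why the change helps: in Python 2, calling unicode() on a byte string without naming an encoding decodes it with the default codec, which is normally ASCII, so any UTF-8 encoded letter such as ı raises UnicodeDecodeError. The sketch below reproduces that behaviour in Python 3, where bytes stands in for the Python 2 str type. The URL, the ensure_text helper, and the variable names are made up for illustration and are not part of PyCrawler.

```python
# Python 3 sketch; bytes plays the role of Python 2's byte-oriented str.

def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return text, decoding bytes explicitly."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, leave untouched

# Hypothetical URL containing the Turkish dotless i (U+0131).
raw_url = "http://example.com/yaz\u0131".encode("utf-8")

# Equivalent of the original unicode(u): an implicit ASCII decode.
try:
    raw_url.decode("ascii")
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True

# Equivalent of the fix: an explicit UTF-8 decode.
decoded_url = ensure_text(raw_url)

# Shape of the parameter list built on line #59, with explicit decoding.
args = [{"address": ensure_text(u)} for u in [raw_url, "http://example.com/"]]
```

One caveat, as far as I can tell: in Python 2, calling .decode("utf8") on a value that is already a unicode object containing non-ASCII characters can itself fail, because it is first encoded back to ASCII. A type check like the one in ensure_text is one way to guard against that if the crawler ever passes in mixed types.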

theanti9 commented 11 years ago

If you want to submit a pull request for this fix, I will gladly merge it :)

miklagard commented 11 years ago

Done.