jculvey / roboto

A web crawler/scraper/spider for nodejs

Unable to crawl a page with encoding other than utf8 #17

Open yogiatpozen opened 8 years ago

yogiatpozen commented 8 years ago

Hi,

I'm trying to crawl a page (like http://021-online.com) that declares charset=gb2312. When I read the head/title field using the embedded cheerio, I get garbled text instead of the proper characters.

Am I missing some configuration, or is this a bug that prevents non-UTF-8 pages from being crawled properly? It may also be a problem with cheerio rather than with roboto itself.
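
To illustrate what I think is happening, here is a minimal sketch of decoding the raw bytes with the declared charset before the HTML gets parsed. It assumes the response body is available as a Buffer, which I have not confirmed roboto exposes, and that Node was built with full ICU so that `TextDecoder` accepts the `gb2312` label. `detectCharset` and `decodeHtml` are hypothetical helper names, not part of roboto or cheerio.

```javascript
// Sketch only: detectCharset and decodeHtml are hypothetical helpers,
// not functions provided by roboto or cheerio.
const { TextDecoder } = require('util');

// Look for a charset declaration in the first bytes of the raw body.
function detectCharset(buffer, fallback = 'utf-8') {
  // latin1 maps each byte to one character, so the ASCII markup stays readable.
  const head = buffer.subarray(0, 1024).toString('latin1');
  const match = /charset\s*=\s*["']?([\w-]+)/i.exec(head);
  return match ? match[1].toLowerCase() : fallback;
}

// Decode the bytes with the declared charset instead of assuming UTF-8.
// TextDecoder throws a RangeError if the label is not supported.
function decodeHtml(buffer) {
  return new TextDecoder(detectCharset(buffer)).decode(buffer);
}

// The two characters "中文" as GB2312 bytes, wrapped in minimal markup.
const raw = Buffer.concat([
  Buffer.from('<meta charset="gb2312"><title>', 'ascii'),
  Buffer.from([0xd6, 0xd0, 0xce, 0xc4]),
  Buffer.from('</title>', 'ascii'),
]);

const html = decodeHtml(raw);
console.log(html);
```

The decoded string could then be passed to `cheerio.load`. Calling `raw.toString('utf8')` on the same bytes yields replacement characters in the title, which looks like the garbled output I am seeing.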