stewartmckee / cobweb

Web crawler with very flexible crawling options. Can be used either standalone or with resque to perform clustered crawls.
MIT License

Inbound links are not normalized when stored #29

Closed: gh2k closed this issue 9 years ago

gh2k commented 10 years ago

If I call Stats.inbound_links_for(my_url) during parse, I sometimes don't see the correct results. The URI being processed during parse has already been normalized before the page data is fetched, but links are not normalized before their digests are calculated for use as redis keys. The key written when a link is stored can therefore differ from the key used when looking up the same page.
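To illustrate the mismatch, here is a minimal sketch using only Ruby's standard library. It is not cobweb's actual code: the `inbound_links_key` helper, the `inbound_links_` key prefix, the use of MD5, and the use of `URI#normalize` are all assumptions chosen for illustration. The point is only that a digest taken from the raw link differs from a digest taken from the normalized URL.

```ruby
require "uri"
require "digest"

# Hypothetical helper (not part of cobweb): builds a redis-style key
# from the digest of a URL, optionally normalizing the URL first.
def inbound_links_key(url, normalize: true)
  url = URI.parse(url).normalize.to_s if normalize
  "inbound_links_#{Digest::MD5.hexdigest(url)}"
end

# A link as it might appear in a page, before normalization.
raw_link = "HTTP://Example.COM"

# The URL as it would look once normalized prior to fetching.
fetched_url = URI.parse(raw_link).normalize.to_s

# Key written when the link is stored without normalization (the reported behaviour).
stored_key_without_fix = inbound_links_key(raw_link, normalize: false)

# Key written when the link is normalized before hashing (the suggested behaviour).
stored_key_with_fix = inbound_links_key(raw_link)

# Key used when looking up inbound links for the page being parsed.
lookup_key = inbound_links_key(fetched_url)

puts "lookup matches unnormalized store: #{lookup_key == stored_key_without_fix}"
puts "lookup matches normalized store:   #{lookup_key == stored_key_with_fix}"
```

Normalizing at the point where the digest is computed, rather than at each call site, keeps the store and lookup paths from drifting apart.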