WebURLs and WebCrawler now support following individual URLs right away even if they have already been visited. Such URLs are not scheduled; they are processed immediately.
I needed to implement this because I found a site that used a common URL for redirections and served content based on its internal session, or something else I couldn't figure out.
Even when I managed to schedule already-visited URLs again, after scheduling they all showed the same content: the one referenced by the last "previous page" visited. Once the crawler was allowed to visit those URLs immediately, the problem was solved.
Since this can produce unwanted infinite redirection loops, there is a maximum automatic redirection depth that can be configured on WebURLs: maxInmediateRedirects.
By default, this behaviour is disabled. The creator of the WebURL is responsible for enabling it on a per-URL basis.
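As an illustration only, here is a minimal sketch of how a per-URL immediate-follow flag bounded by maxInmediateRedirects could behave. The class layout, field names, and methods below (followImmediately, handle, process, schedule) are hypothetical and do not reflect the project's actual API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the behaviour described above.
// Real class and method names in the project may differ.
class WebURL {
    final String url;
    boolean followImmediately = false;   // disabled by default
    int maxInmediateRedirects = 0;       // maximum automatic redirection depth

    WebURL(String url) { this.url = url; }
}

class WebCrawler {
    private final Set<String> visited = new HashSet<>();

    void handle(WebURL target, int redirectDepth) {
        if (visited.contains(target.url)) {
            // Already visited: only process again if the URL explicitly
            // allows immediate follow-up and the depth limit is not exceeded.
            if (target.followImmediately && redirectDepth < target.maxInmediateRedirects) {
                process(target, redirectDepth);
            }
            return;
        }
        visited.add(target.url);
        schedule(target);   // normal path: enqueue for later processing
    }

    void process(WebURL target, int redirectDepth) {
        System.out.println("Processing immediately: " + target.url);
        // If processing yields a redirect to an already-visited URL,
        // handle() would be called again with redirectDepth + 1.
    }

    void schedule(WebURL target) {
        System.out.println("Scheduled: " + target.url);
    }

    public static void main(String[] args) {
        WebCrawler crawler = new WebCrawler();
        WebURL u = new WebURL("https://example.com/redirector");
        u.followImmediately = true;
        u.maxInmediateRedirects = 3;
        crawler.handle(u, 0);   // first visit: scheduled as usual
        crawler.handle(u, 0);   // revisit: processed immediately (depth 0 < 3)
    }
}
```

With followImmediately left at its default of false, a revisited URL is simply skipped; enabling it on a specific WebURL and setting maxInmediateRedirects above zero bounds how deep an automatic redirect chain can go.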