Closed GoogleCodeExporter closed 8 years ago
I also see the same issue.
Original comment by jss.a...@gmail.com
on 9 Jun 2011 at 7:58
I added this code to PageFetcher, and it works with relative URLs in redirects:
if (statusCode == HttpStatus.SC_MOVED_PERMANENTLY
        || statusCode == HttpStatus.SC_MOVED_TEMPORARILY) {
    Header header = response.getFirstHeader("Location");
    if (header != null) {
        String movedToUrl = header.getValue();
        if (!movedToUrl.contains("http://")) {
            movedToUrl = get.getURI().getScheme() + "://" + get.getURI().getHost()
                    + movedToUrl;
        }
        page.getWebURL().setURL(movedToUrl);
    } else {
        page.getWebURL().setURL(null);
    }
    return PageFetchStatus.Moved;
}
Original comment by DLopezGo...@gmail.com
on 15 Jun 2011 at 12:57
I think it should be the following:
if(!movedToUrl.startsWith("http://") || !movedToUrl.startsWith("https://"))
Original comment by Sunshine...@sohu.com
on 19 Aug 2011 at 4:00
Hi,
In the last suggestion, get.getURI().getPath() is missing as a connector
between the host and the redirect target. This patch should solve that.
Original comment by u...@taykey.com
on 30 Aug 2011 at 3:55
Attachments:
I think it should be the following instead:
if(!movedToUrl.startsWith("http://") && !movedToUrl.startsWith("https://"))
because a URL cannot start with both "http://" and "https://" at the same
time, so the || version of the check is always true.
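
As an alternative to checking prefixes by hand, the redirect target can be resolved against the request URL with the JDK's java.net.URI, which handles absolute URLs, absolute paths, and relative paths under the same rules. The following is only a minimal sketch: `RedirectResolver` and `resolveRedirect` are hypothetical names, not part of crawler4j, and the example.com URLs are placeholders.

```java
import java.net.URI;

class RedirectResolver {
    // Hypothetical helper (not crawler4j API): resolves a Location header value
    // against the URL of the request that produced the redirect.
    // URI.create throws IllegalArgumentException if either string is not a valid URI.
    static String resolveRedirect(String requestUrl, String location) {
        return URI.create(requestUrl).resolve(location.trim()).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/signup/index.html";
        // Absolute path: keeps scheme and host, replaces the path.
        System.out.println(resolveRedirect(base, "/login"));
        // Relative path: resolved against the directory of the request URL.
        System.out.println(resolveRedirect(base, "next.html"));
        // Absolute URL (http or https): returned unchanged.
        System.out.println(resolveRedirect(base, "https://login.example.com/config"));
    }
}
```

With this approach no scheme check is needed, because an absolute Location value simply replaces the base URL.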
Original comment by lance.ch...@gmail.com
on 18 Sep 2011 at 2:55
u...@taykey.com :
I tried your patch, but I can't make sense of the error I'm getting. There
seemed to be an extraneous 'else{}' in it, which I removed, but toFetchURL now
appears to be concatenating several different URLs into one, since I get this
error message:
INFO [Crawler 1] Failed: HTTP/1.1 502 Connection reset by peer, while fetching
http://www.flickr.com/signup/https://login.yahoo.com/config/login/photos/signup/https://login.yahoo.com/config/login/photos/signup/https://login.yahoo.com/config/login/photos/signup/https://login.yahoo.com/config/login/photos/signup/https://login.yahoo.com/config/login/photos/signup/https://login.yahoo.com/config/login
etc.
Any thoughts? Sorry to bother you. I guess it could be a missing space
somewhere, but I inserted your patch at the proper place in PageFetcher.
Perhaps you could shed some light? Thanks in advance.
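
One possible explanation, sketched below, is that the redirect target is an absolute https:// URL while the check only looks for "http://", so the scheme, host, and path of the request get prepended to it. The patch itself is not shown in this thread, so this is an assumption about how it builds the URL; `NaiveRedirectDemo`, `naive`, and `guarded` are hypothetical names used only for illustration.

```java
class NaiveRedirectDemo {
    // Assumed reconstruction of the string-based approach discussed above:
    // only "http://" is checked, so an https:// target is treated as relative.
    static String naive(String scheme, String host, String path, String location) {
        if (!location.contains("http://")) {
            return scheme + "://" + host + path + location;
        }
        return location;
    }

    // Same concatenation, but guarded by the corrected && prefix check.
    static String guarded(String scheme, String host, String path, String location) {
        if (!location.startsWith("http://") && !location.startsWith("https://")) {
            return scheme + "://" + host + path + location;
        }
        return location;
    }

    public static void main(String[] args) {
        String location = "https://login.yahoo.com/config/login/photos/signup/";
        // The https:// target gets glued onto the request URL.
        System.out.println(naive("http", "www.flickr.com", "/signup/", location));
        // The absolute target is left unchanged.
        System.out.println(guarded("http", "www.flickr.com", "/signup/", location));
    }
}
```

If each following redirect is handled the same way, the URL would keep growing by one copy of the target per hop, which would be consistent with the repeated segments in the log above.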
Original comment by Geoffrey...@gmail.com
on 17 Oct 2011 at 6:03
This issue is resolved in version 3.0
-Yasser
Original comment by ganjisaffar@gmail.com
on 2 Jan 2012 at 5:31
Original issue reported on code.google.com by
robertop...@gmail.com
on 23 May 2011 at 10:23