And also the siteURL. I will be supplying a lot of seed URLs, but in the visit()
method I need to know the siteURL and the parentURL as well.
Original comment by w3engine...@gmail.com
on 23 Jan 2012 at 9:30
Good features.
Original comment by mansur.u...@gmail.com
on 24 Jan 2012 at 9:53
Would be a very nice feature. +1
Original comment by milkdata...@gmail.com
on 24 Jan 2012 at 10:32
Hi Yasser, is there any solution for this?
Original comment by w3engine...@gmail.com
on 31 Jan 2012 at 11:31
Hi,
I just changed the WebURL class:
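class WebURL {
    // ...
    private boolean isBaseUrlSet;
    private String baseURL; // the site URL: the first URL assigned to this object
    // ...
    public void setURL(String url) {
        this.url = url; // current URL; after a redirect this is the redirected URL
        // Set baseURL only once, so a later redirect does not overwrite it
        if (!isBaseUrlSet) {
            baseURL = url;
            isBaseUrlSet = true;
        }
    }
    // ...
}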
The parent URL can only be set in the WebCrawler.processPage(..) method, so
WebURL has to be changed to support this for the time being.
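The set-once logic in the WebURL change above can be tried in isolation. The following is a minimal sketch, not crawler4j's actual class: `BaseUrlTracker` is a hypothetical stand-in for WebURL, and it assumes setURL is called first with the original URL and again with any redirect target.

```java
// Hypothetical stand-in for crawler4j's WebURL, reduced to the fields discussed above.
class BaseUrlTracker {
    private String url;            // current URL (the redirect target after a redirect)
    private String baseURL;        // first URL ever assigned to this object
    private boolean isBaseUrlSet;  // guards baseURL so it is written only once

    public void setURL(String url) {
        this.url = url;
        if (!isBaseUrlSet) {
            baseURL = url;
            isBaseUrlSet = true;
        }
    }

    public String getURL() { return url; }

    public String getBaseURL() { return baseURL; }

    public static void main(String[] args) {
        BaseUrlTracker u = new BaseUrlTracker();
        u.setURL("http://example.com/old");   // original URL
        u.setURL("http://example.com/new");   // redirect target
        System.out.println(u.getURL());       // prints http://example.com/new
        System.out.println(u.getBaseURL());   // prints http://example.com/old
    }
}
```

Note that this records the first URL assigned to each object. A WebURL created for a newly discovered link would record that link itself, not the seed it was reached from.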
Regards
Original comment by mansur.u...@gmail.com
on 31 Jan 2012 at 1:03
I am not sure this solution works, because the parent URL would also have to be
persisted to be available in the visit method. Anyway, I will try to add this
feature over the weekend.
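The persistence concern can be illustrated with a minimal sketch, assuming scheduled URLs are serialized into a persistent frontier before they are visited (crawler4j keeps its frontier in Berkeley DB). It uses plain java.io streams rather than crawler4j's storage code; `FrontierEntry`, `EntryCodec`, and the `persistParent` flag are hypothetical, the flag simulating a serializer that has or has not been updated for the new field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified frontier record; not crawler4j's real WebURL.
class FrontierEntry {
    String url;
    int docid;
    String parentURL; // newly added field
}

// Hypothetical codec standing in for whatever serializes frontier entries.
class EntryCodec {
    // persistParent = false drops parentURL, as a serializer never updated for it would.
    static byte[] write(FrontierEntry e, boolean persistParent) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(e.url);
            out.writeInt(e.docid);
            boolean hasParent = persistParent && e.parentURL != null;
            out.writeBoolean(hasParent); // writeUTF cannot take null, so record presence first
            if (hasParent) {
                out.writeUTF(e.parentURL);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    static FrontierEntry read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            FrontierEntry e = new FrontierEntry();
            e.url = in.readUTF();
            e.docid = in.readInt();
            if (in.readBoolean()) {
                e.parentURL = in.readUTF();
            }
            return e;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Round-tripping an entry with persistParent set to false returns it with parentURL equal to null, which is what the visit method would then see even though the field was set before scheduling.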
-Yasser
Original comment by ganjisaffar@gmail.com
on 31 Jan 2012 at 8:57
I have done it as well and it works fine. I created a getter and a setter for
parentURL in WebURL:
private String parentURL; // URL of the page on which this link was found
public String getParentURL() {
    return parentURL;
}
public void setParentURL(String parentURL) {
    this.parentURL = parentURL;
}
and added this line in WebCrawler.java (in two places, lines 249 and 284):
webURL.setParentURL(curURL.getURL());
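The idea behind that one-line change is that every outgoing link found on the page being processed inherits the current page's URL as its parent before it is scheduled. A minimal breadth-first sketch, with a hard-coded link map standing in for fetching and parsing; `Link`, `ParentTrackingCrawl`, and the link map are hypothetical, and this is not crawler4j's processPage code.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical, minimal WebURL-like type with a parentURL getter/setter pair.
class Link {
    private final String url;
    private String parentURL;

    Link(String url) { this.url = url; }

    String getURL() { return url; }
    String getParentURL() { return parentURL; }
    void setParentURL(String parentURL) { this.parentURL = parentURL; }
}

class ParentTrackingCrawl {
    // Crawls an in-memory link graph from the seed and maps each visited URL to its
    // parent (null for the seed).
    static Map<String, String> crawl(String seed, Map<String, List<String>> outlinks) {
        Map<String, String> parentOf = new LinkedHashMap<>();
        Set<String> scheduled = new HashSet<>();
        Queue<Link> frontier = new ArrayDeque<>();
        frontier.add(new Link(seed));
        scheduled.add(seed);
        while (!frontier.isEmpty()) {
            Link curURL = frontier.poll();
            parentOf.put(curURL.getURL(), curURL.getParentURL()); // the "visit" step
            for (String href : outlinks.getOrDefault(curURL.getURL(), Collections.emptyList())) {
                if (scheduled.add(href)) { // schedule each URL only once
                    Link webURL = new Link(href);
                    webURL.setParentURL(curURL.getURL()); // the parent is the page being processed
                    frontier.add(webURL);
                }
            }
        }
        return parentOf;
    }
}
```

Because each URL is scheduled only once, a page linked from several places keeps only the parent of the first page that scheduled it.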
But the main issue was how to get the parentURL for broken links in this
method:
handlePageStatusCode(WebURL webUrl, int statusCode, String statusDescription)
Any suggestions for that?
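One possible approach, assuming parentURL is set when a link is scheduled and survives in the frontier, and assuming handlePageStatusCode receives the same WebURL object that was taken from the frontier: the handler can then read the parent straight from that object. A minimal sketch with hypothetical `TrackedURL` and `BrokenLinkReporter` classes; only the handlePageStatusCode parameter list comes from the comment above, and treating status codes of 400 and above as broken is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical WebURL-like type carrying the parent URL set at scheduling time.
class TrackedURL {
    private final String url;
    private final String parentURL;

    TrackedURL(String url, String parentURL) {
        this.url = url;
        this.parentURL = parentURL;
    }

    String getURL() { return url; }
    String getParentURL() { return parentURL; }
}

// Hypothetical crawler hook: collects broken links together with the page that linked to them.
class BrokenLinkReporter {
    final List<String> brokenLinks = new ArrayList<>();

    void handlePageStatusCode(TrackedURL webUrl, int statusCode, String statusDescription) {
        if (statusCode >= 400) { // assumption: 4xx and 5xx responses count as broken
            brokenLinks.add(webUrl.getURL() + " (" + statusCode + " " + statusDescription
                    + ") linked from " + webUrl.getParentURL());
        }
    }
}
```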
Original comment by w3engine...@gmail.com
on 1 Feb 2012 at 2:44
This is currently implemented in the source code:
http://code.google.com/p/crawler4j/source/detail?r=2f6a89cfd07bf6e87f92f361359d0fbca81b634d
Will be included in the next release.
-Yasser
Original comment by ganjisaffar@gmail.com
on 4 Feb 2012 at 10:10
Original issue reported on code.google.com by
w3engine...@gmail.com
on 23 Jan 2012 at 3:07