Closed GoogleCodeExporter closed 8 years ago
Original comment by avrah...@gmail.com
on 18 Aug 2014 at 3:47
I have just crawled IMDb, and the crawl completed.
I don't get any NullPointerException.
Please try again and report back so we can work on this problem together.
Original comment by avrah...@gmail.com
on 20 Aug 2014 at 12:45
Closed due to inactivity and the lack of a reproducible scenario.
Original comment by avrah...@gmail.com
on 23 Sep 2014 at 2:11
Hi,
I'm getting the same error, and only this error, while crawling a website:
java.lang.NullPointerException
at java.lang.String.<init>(String.java:556)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.fetchDirectives(RobotstxtServer.java:98)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.allows(RobotstxtServer.java:73)
at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:341)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:220)
at java.lang.Thread.run(Thread.java:745)
Original comment by dkkashya...@gmail.com
on 29 Sep 2014 at 12:53
This is interesting, as this is exactly the same stack trace.
Which version of the crawler are you using? (v3.5? Latest from trunk?)
Which site are you trying to crawl?
Original comment by avrah...@gmail.com
on 29 Sep 2014 at 1:23
I'm using 3.5 in a Maven project, and I'm trying to crawl songspk.name.
Original comment by dkkashya...@gmail.com
on 29 Sep 2014 at 5:36
Hi,
Eventually my crawler stops crawling the links, and I see only this:
java.lang.NullPointerException
java.lang.NullPointerException
java.lang.NullPointerException
java.lang.NullPointerException
java.lang.NullPointerException
java.lang.NullPointerException
java.lang.NullPointerException
java.lang.NullPointerException
java.lang.NullPointerException
Nothing else.
Original comment by dkkashya...@gmail.com
on 30 Sep 2014 at 6:42
I have checked it.
It works for me.
I have changed that code over the last few months, so I have probably fixed that bug.
You will need the latest code, though, so please use the latest from trunk
instead of the Maven jar.
I believe we will have a release in a month or two at most.
Avi.
Original comment by avrah...@gmail.com
on 2 Oct 2014 at 12:41
Hi,
I used version 3.5 from trunk, but I'm still getting the same output:
java.lang.NullPointerException
at java.lang.String.<init>(String.java:556)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.fetchDirectives(RobotstxtServer.java:98)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.allows(RobotstxtServer.java:73)
at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:341)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:220)
at java.lang.Thread.run(Thread.java:745)
java.lang.NullPointerException
at java.lang.String.<init>(String.java:481)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.fetchDirectives(RobotstxtServer.java:100)
at edu.uci.ics.crawler4j.robotstxt.RobotstxtServer.allows(RobotstxtServer.java:73)
at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:341)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:220)
at java.lang.Thread.run(Thread.java:745)
Original comment by dkkashya...@gmail.com
on 5 Oct 2014 at 8:21
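Both traces above end inside a `String` constructor called from `RobotstxtServer.fetchDirectives`. That is consistent with a null byte array (for example, a robots.txt response with no body) being passed to `new String(...)`, although the thread does not confirm the root cause. The following is only a minimal sketch of a null-safe decode under that assumption; `RobotsContentDecoder` and `decodeOrEmpty` are hypothetical names and are not part of crawler4j.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of crawler4j.
final class RobotsContentDecoder {

    private RobotsContentDecoder() {
    }

    // Decodes fetched bytes to text, treating a missing body as empty
    // instead of letting new String(null, ...) throw a NullPointerException.
    static String decodeOrEmpty(byte[] content, String charsetName) {
        if (content == null) {
            return "";
        }
        Charset charset = StandardCharsets.UTF_8; // assumed default
        try {
            charset = Charset.forName(charsetName);
        } catch (IllegalArgumentException e) {
            // Null, illegal, or unsupported charset name: keep the default.
        }
        return new String(content, charset);
    }

    public static void main(String[] args) {
        byte[] body = "User-agent: *\nDisallow: /private/".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeOrEmpty(body, "UTF-8"));
        System.out.println("Missing body length: " + decodeOrEmpty(null, "UTF-8").length());
    }
}
```

With a guard like this, a host that returns no robots.txt content would be handled as an empty rule set rather than aborting the page with an exception.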
Which version exactly did you use?
Did you take a fresh checkout from the repository (v3.6 SNAPSHOT) this week?
Did you use something else?
Original comment by avrah...@gmail.com
on 5 Oct 2014 at 8:28
I'm using 3.5 from here: https://code.google.com/p/crawler4j/downloads/list
I did not take a fresh checkout. Can you please give me a link for that?
Thanks
Original comment by dkkashya...@gmail.com
on 5 Oct 2014 at 8:40
Until we have a new release, the way to get the latest code is to clone our trunk:
https://code.google.com/p/crawler4j/source/checkout
It is a bit more complicated, but I have implemented many fixes, so I think
it is well worth it.
Original comment by avrah...@gmail.com
on 5 Oct 2014 at 8:55
Hi,
I used the code from trunk, but now the crawler is really slow.
Original comment by dkkashya...@gmail.com
on 6 Oct 2014 at 2:43
Try commenting out the following lines from parser/Parser.java:
LanguageIdentifier languageIdentifier = new LanguageIdentifier(parseData.getText());
page.setLanguage(languageIdentifier.getLanguage());
Original comment by avrah...@gmail.com
on 6 Oct 2014 at 2:57
I did as you said, but it is still slow and I see only this:
Oct 07, 2014 9:20:45 AM org.apache.http.client.protocol.ResponseProcessCookies processCookies
WARNING: Cookie rejected [SETTINGS.LOCALE="en%5Fus", version:0, domain:.adobe.com, path:/cfusion/, expiry:Thu Sep 29 09:20:45 CEST 2044] Illegal path attribute "/cfusion/". Path of origin: "/robots.txt"
Oct 07, 2014 9:22:59 AM org.apache.http.impl.execchain.RetryExec execute
INFO: I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}->http://247wallst.com:80: The target server failed to respond
Oct 07, 2014 9:22:59 AM org.apache.http.impl.execchain.RetryExec execute
INFO: Retrying request to {}->http://247wallst.com:80
Original comment by dkkashya...@gmail.com
on 7 Oct 2014 at 11:33
I resolved that, but the crawler is still slow. What can I do now?
Original comment by dkkashya...@gmail.com
on 8 Oct 2014 at 7:55
Hmm, I need to profile the crawler to find which of my changes made it slower,
and then fix it.
It will take a couple of days, though...
Original comment by avrah...@gmail.com
on 10 Oct 2014 at 10:21
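Before a full profiler run, a rough wall-clock comparison can already show whether a single step dominates the crawl time. The sketch below relies only on `System.nanoTime()`; `StepTimer`, `timeNanos`, and the stand-in workload are hypothetical and are not taken from crawler4j. The numbers it prints are indicative only, since JIT warm-up and garbage collection affect them.

```java
// Hypothetical timing helper for illustration only; not part of crawler4j.
final class StepTimer {

    private StepTimer() {
    }

    // Runs the step the given number of times and returns the elapsed nanoseconds.
    static long timeNanos(Runnable step, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            step.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        StringBuilder sink = new StringBuilder();
        // Stand-in workload; replace it with the step under suspicion.
        Runnable standIn = () -> {
            sink.setLength(0);
            for (int i = 0; i < 1000; i++) {
                sink.append(i);
            }
        };

        timeNanos(standIn, 1000); // warm-up pass so the JIT has compiled the loop
        long elapsed = timeNanos(standIn, 1000);
        System.out.printf("1000 iterations took %.2f ms%n", elapsed / 1_000_000.0);
    }
}
```

Timing the same step in two builds (for example, with and without a suspected change) gives a first hint of where to point a real profiler.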
I have released v4.0.
I profiled v3.5 against v4.0, and v4.0 is faster!
Original comment by avrah...@gmail.com
on 22 Jan 2015 at 11:45
Original comment by avrah...@gmail.com
on 22 Jan 2015 at 2:59
Original issue reported on code.google.com by
av...@shevo.co.il
on 22 Dec 2013 at 9:35