Closed by GoogleCodeExporter 8 years ago
I see now that it does end: if you add a print statement after
controller.start(MyCrawler.class, numberOfCrawlers); it will fire after the 10
seconds. Maybe adding a log line saying "Final clean up complete" would help
make this clearer?
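A minimal sketch of what that looks like (assuming the usual crawler4j controller setup; the storage folder, seed URL, and MyCrawler class are placeholders, not taken from this issue):

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class Controller {
    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4j"); // placeholder path
        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer = new RobotstxtServer(new RobotstxtConfig(), pageFetcher);

        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
        controller.addSeed("http://www.example.com/"); // placeholder seed

        int numberOfCrawlers = 10;
        // start() blocks until all crawler threads finish and the
        // ~10-second shutdown waits seen in the log have elapsed.
        controller.start(MyCrawler.class, numberOfCrawlers);

        // Only reached after the final clean up, which is the point made above.
        System.out.println("Final clean up complete");
    }
}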
Original comment by chrstah...@gmail.com
on 6 Mar 2012 at 3:51
Hi, good afternoon! My name is Edwaldo, I am Brazilian, and I am starting a
project where I will use crawler4j. I studied all your documentation and
implemented the code; however, I am unable to get the crawl to work. It always
returns the following output:
Deleting content of:
D:\eclipse\EclipsePortableJava\Data\workspace\WebCrawler\intermediario\frontier
INFO [main] Crawler 1 started.
INFO [main] Crawler 2 started.
INFO [main] Crawler 3 started.
INFO [main] Crawler 4 started.
INFO [main] Crawler 5 started.
INFO [main] Crawler 6 started.
INFO [main] Crawler 7 started.
INFO [main] Crawler 8 started.
INFO [main] Crawler 9 started.
INFO [main] Crawler 10 started.
Docid: 1
URL: http://www.submarino.com.br/
Domain: 'submarino.com.br'
Sub-domain: 'www'
Path: '/'
Parent page: null
Anchor text: null
Text length: 43621
Html length: 235817
Number of outgoing links: 613
Response headers:
X-Powered-By: Servlet/2.5 JSP/2.1
X-Powered-By: JSF/1.2
Content-Encoding: gzip
Content-Type: text/html; charset=UTF-8
Expires: Fri, 09 May 2014 19:49:38 GMT
Cache-Control: max-age=0, no-cache, no-store
Pragma: no-cache
Date: Fri, 09 May 2014 19:49:38 GMT
Content-Length: 33259
Connection: keep-alive
Vary: Accept-Encoding
Set-Cookie: acomChannel=INTERNET; path=/; domain=submarino.com.br
Set-Cookie: b2wChannel=INTERNET; path=/; domain=submarino.com.br
Set-Cookie: akaau=1399665278~id=3010416469baa56f7d459fb7d3d19525; path=/
=============
INFO [Thread-1] It looks like no thread is working, waiting for 10 seconds to make sure...
INFO [Thread-1] No thread is working and no more URLs are in queue waiting for another 10 seconds to make sure...
INFO [Thread-1] All of the crawlers are stopped. Finishing the process...
INFO [Thread-1] Waiting for 10 seconds before final clean up...
As you can see, crawler4j does run, and you can verify the number of links
found from the seed; however, it then shuts down as if the threads were
not working.
Could anyone help me? I need this urgently.
Thank you!
Original comment by edwaldos...@gmail.com
on 10 May 2014 at 7:40
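The log above is crawler4j's normal shutdown sequence rather than an error: the frontier is empty, so the threads have nothing left to do. Whether more than the seed page gets queued depends on the shouldVisit filter in the crawler class. A minimal sketch (assuming a crawler4j 3.x-era API where shouldVisit takes only a WebURL; the filter pattern and domain check are illustrative, not from this issue):

import java.util.regex.Pattern;

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    private static final Pattern FILTERS =
            Pattern.compile(".*(\\.(css|js|gif|jpe?g|png|pdf|zip))$");

    @Override
    public boolean shouldVisit(WebURL url) {
        String href = url.getURL().toLowerCase();
        // If this check rejects all 613 outgoing links (for example, a domain
        // prefix that never matches), the frontier empties after the first
        // page and the crawler shuts down exactly as in the log above.
        return !FILTERS.matcher(href).matches()
                && href.startsWith("http://www.submarino.com.br/");
    }

    @Override
    public void visit(Page page) {
        System.out.println("Visited: " + page.getWebURL().getURL());
    }
}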
Not a bug or feature request
Original comment by avrah...@gmail.com
on 11 Aug 2014 at 1:10
Original issue reported on code.google.com by
chrstah...@gmail.com
on 14 Feb 2012 at 4:05