mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j

Reading different base URLs to use as seeds, to crawl in a loop #77

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

1. I have the following code, which uses a SQL result set containing a series of 
web addresses of interest and loops over them, passing each to the crawler. The 
first record works fine, but when the second record is used by the crawler I get 
the following DB error (shown after the code).

// Get SQL result set of URLs we are interested in.

for (String key : search.keySet()) {
    System.out.println("Key: " + key + ", Value: " + search.get(key));
    // Generate keys
    this.propstorage.put("Name", key);
    this.propstorage.put("URL", search.get(key));
    this.controller.addSeed(search.get(key));
    this.controller.start(ClsHttpCrawler.class, 1);
    this.propstorage.clear();
}

ERROR 

java.lang.IllegalStateException: Can't call Database.get: Database was closed.
    at com.sleepycat.je.Database.checkOpen(Database.java:1745)
    at com.sleepycat.je.Database.get(Database.java:876)
    at edu.uci.ics.crawler4j.frontier.DocIDServer.getDocID(DocIDServer.java:53)
    at edu.uci.ics.crawler4j.crawler.CrawlController.addSeed(CrawlController.java:183)
    at iecomps.ClsHttpController.startCrawler(ClsHttpController.java:82)
    at iecomps.Main.main(Main.java:52)
25-Aug-2011 15:32:08 iecomps.Main main

Am I missing something?

Original issue reported on code.google.com by Andy01Da...@gmail.com on 25 Aug 2011 at 2:57

GoogleCodeExporter commented 9 years ago
You need to add all of the seeds first and then start the crawler. Crawling is 
a process where you start with a set of seeds, download their content, then 
download the pages they refer to, and so on. Once start() returns, the 
controller has shut down its internal databases, which is why the second call 
to addSeed fails with "Database was closed".
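As a minimal sketch of what that looks like against your snippet (assuming the 
same search map and controller field from the question), you would hoist 
start() out of the loop:

// Add every seed URL first ...
for (String key : search.keySet()) {
    System.out.println("Key: " + key + ", Value: " + search.get(key));
    this.controller.addSeed(search.get(key));
}

// ... and only then start the crawl, once, with all seeds in place.
this.controller.start(ClsHttpCrawler.class, 1);

Note that per-seed state like the propstorage entries in your original loop 
does not fit this model, since all seeds are crawled in a single run; any 
per-URL bookkeeping would need to be keyed by the URL inside the crawler 
instead.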

-Yasser

Original comment by ganjisaffar@gmail.com on 25 Aug 2011 at 3:03