liseryang / openbotlist

Automatically exported from code.google.com/p/openbotlist

Remote crawling for botlist #12

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The remote crawl system will operate in the following stages:

1. Extract URLs
2. Perform a Nutch crawl of the URLs
3. Hook up Solr to the index
4. Query Solr for search terms through the Haskell interface
5. Store the terms in a database
6. Send them to the botlist host system
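
As a rough illustration of stages 1 and 4 only, here is a minimal Python sketch. It is not the project's code: the issue plans a Haskell interface for the Solr step, and the function names, the URL regex, and the `http://localhost:8983/solr/botlist` base URL used in the usage note are all assumptions made for the example. The only external detail relied on is Solr's standard `/select` handler with its `q`, `rows`, and `wt` parameters. The Nutch crawl, database storage, and hand-off to the botlist host are not shown.

```python
import re
from urllib.parse import urlencode

# Assumed pattern: any http(s) URL up to whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Stage 1 (hypothetical helper): collect unique URLs in order of first appearance."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop trailing punctuation that usually belongs to the sentence, not the URL.
        url = match.rstrip(".,;)")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def solr_select_url(base_url, term, rows=10):
    """Stage 4 (hypothetical helper): build a query URL for Solr's /select handler."""
    params = urlencode({"q": term, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/select?{params}"
```

For example, `extract_urls` could be fed the text of a feed or page to produce the seed list for the crawl, and `solr_select_url("http://localhost:8983/solr/botlist", "remote crawl")` would give a URL that a client can fetch once the index is hooked up to Solr.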

Original issue reported on code.google.com by berlin.b...@gmail.com on 23 Nov 2007 at 7:06

GoogleCodeExporter commented 9 years ago

Original comment by berlin.b...@gmail.com on 2 Apr 2008 at 9:11