scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

Crawling strategy for a topic-focused crawler #125

Open sibiryakov opened 8 years ago

sibiryakov commented 8 years ago

It would be nice to add an optional crawling strategy for topical crawling to Frontera. It could take a dictionary of words describing a topic as input and crawl from the seed URLs, searching for documents relevant to that topic, until some finishing condition is met.
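
A minimal sketch of the idea, in plain Python rather than Frontera's actual strategy API. All names here (`topic_score`, `topical_crawl`, `fetch`, `max_pages`, `min_score`) are hypothetical, and the relevance measure (share of page tokens found in the topic dictionary) and the finishing condition (a page budget) are assumptions chosen only for illustration:

```python
import heapq
import re
from typing import Callable, Dict, Iterable, List, Set, Tuple


def topic_score(text: str, topic_words: Set[str]) -> float:
    """Return the fraction of tokens in `text` that appear in the topic dictionary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in topic_words)
    return hits / len(tokens)


def topical_crawl(
    seeds: Iterable[str],
    topic_words: Set[str],
    fetch: Callable[[str], Tuple[str, List[str]]],
    max_pages: int = 100,
    min_score: float = 0.1,
) -> Dict[str, float]:
    """Best-first crawl: links found on more relevant pages are visited first.

    `fetch` is a hypothetical callable returning (page_text, outgoing_links).
    The crawl stops when the frontier is empty or `max_pages` pages were fetched.
    """
    seed_list = list(seeds)
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seed_list]
    heapq.heapify(frontier)
    seen = set(seed_list)
    relevant: Dict[str, float] = {}
    fetched = 0

    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        fetched += 1

        score = topic_score(text, topic_words)
        if score >= min_score:
            relevant[url] = score

        # Each new link inherits the score of the page it was found on.
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))

    return relevant
```

In a real strategy the parent page's score would probably be combined with other signals (anchor text, link depth), and the finishing condition could just as well be a target number of relevant documents instead of a page budget.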

ghost commented 5 years ago

Hey, I am interested in and willing to contribute to this topical crawling strategy. Can you give me some guidance on how to achieve this?