scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

Writing custom crawling strategy tutorial #222

Closed by sibiryakov 5 years ago

sibiryakov commented 8 years ago

short plan:

ArlasJ commented 7 years ago

Hi. Is there a tutorial yet? We have a semester project at university to build a crawler, and we would like to write a custom strategy.

sibiryakov commented 7 years ago

No, there isn't one yet. But you could look at an existing crawling strategy: https://github.com/scrapinghub/frontera/blob/master/examples/cluster/bc/broadcrawl/__init__.py#L48 and at the crawling strategy base class: https://github.com/scrapinghub/frontera/blob/master/frontera/worker/strategies/__init__.py#L11

ghost commented 5 years ago

@sibiryakov This link does not work. If possible, can you provide it again?

sibiryakov commented 5 years ago

Hi, have a look at this: https://frontera.readthedocs.io/en/latest/topics/custom_crawling_strategy.html