scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

Update cluster-setup docs #351

Closed by guillermoap 5 years ago

guillermoap commented 5 years ago

Hello, I've been using frontera for the last couple of months and have found that the docs are out of date in some places, in this case the cluster-setup docs.

If I try to run the dbworker as specified in the setup cluster doc on line 130, by running: python -m frontera.worker.db --config src.config.db_worker --no-scoring --no-incoming --partitions 0,1

I get the following output:

dbworker_batch_1        |              [--partitions [PARTITIONS [PARTITIONS ...]]] --config CONFIG
dbworker_batch_1        |              [--log-level LOG_LEVEL] [--port PORT]
dbworker_batch_1        | db.py: error: argument --partitions: invalid int value: '0,1'

By trial and error I found that the correct way to start the dbworker with specific partitions is to separate the partition ids with spaces instead of commas: python -m frontera.worker.db --config src.config.db_worker --no-scoring --no-incoming --partitions 0 1
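The error message and the usage line above look like what Python's argparse produces for an integer option that accepts several values. The following is a minimal sketch of that behaviour, not frontera's actual code: build_parser and parse_partitions are hypothetical helpers, and the option definition (type=int, nargs="*") is an assumption inferred from the output.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the worker's CLI; only --partitions is modeled.
    parser = argparse.ArgumentParser(prog="db.py")
    # Assumed definition: zero or more integers, separated by spaces.
    parser.add_argument("--partitions", type=int, nargs="*",
                        help="partition ids, separated by spaces")
    return parser


def parse_partitions(argv):
    """Return the list of partition ids, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).partitions
    except SystemExit:
        # argparse prints the usage and error message to stderr, then exits.
        return None


if __name__ == "__main__":
    # Space-separated ids are each converted with int().
    print(parse_partitions(["--partitions", "0", "1"]))
    # "0,1" is a single token, and int("0,1") fails, so argparse rejects it.
    print(parse_partitions(["--partitions", "0,1"]))
```

With a definition like this, each space-separated token is converted on its own, which would explain why "0 1" is accepted while "0,1" is reported as an invalid int value.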

There is also a problem with the CRAWLING_STRATEGY config var specified on line 91 of the doc: if you set that var, frontera does not take the specified crawling strategy into account. I looked at line 77 of the default_settings file to see how to set it correctly, and there the var is named STRATEGY. When I made that change, the strategy started working as expected.
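As a rough illustration of the rename, a settings module might look like the fragment below. The dotted path is a hypothetical placeholder rather than a class shipped with frontera, and the file location is only assumed from the --config argument used earlier.

```python
# Assumed location: src/config/db_worker.py (from --config src.config.db_worker).

# Name read by frontera according to default_settings.
# The dotted path is a hypothetical placeholder; substitute your own strategy class.
STRATEGY = "myproject.strategy.MyCrawlingStrategy"

# Name given in the cluster-setup doc; reported above as having no effect.
# CRAWLING_STRATEGY = "myproject.strategy.MyCrawlingStrategy"
```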

So, to sum up, I've updated the docs to reflect these changes.

jpbalarini commented 5 years ago

👍

sibiryakov commented 5 years ago

thank you very much!