scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License
1.29k stars, 215 forks

"Can't identify DB worker type" running DB worker #397

Closed: gesof closed this issue 4 years ago

gesof commented 4 years ago

I think it is an issue with the documentation. According to https://frontera.readthedocs.io/en/latest/topics/cluster-setup.html#starting-the-cluster

    # Optionally, start next one dedicated to spider log processing.
    $ python -m frontera.worker.db --no-batches --config [db worker config module]

That logs the following warning: [db-worker] Can't identify DB worker type (no-scoring False, no-batches True, no-incoming False)

Looking into the source code:

    if no_batches and no_scoring:
        db_worker_type = 'linksdb'
    elif no_batches and no_incoming:
        db_worker_type = 'scoring'
    elif no_incoming and no_scoring:
        db_worker_type = 'batchgen'
    else:
        logger.warning("Can't identify DB worker type "
                       "(no-scoring {}, no-batches {}, no-incoming {})"
                       .format(no_scoring, no_batches, no_incoming))
        db_worker_type = 'none'

So I believe that besides --no-batches one should also provide --no-scoring or --no-incoming, as per the code above. Still, I am not sure which one, or why.
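The fall-through can be reproduced in isolation. The following is a minimal sketch that copies the quoted branch into a standalone function; the name identify_db_worker_type and the logger setup are assumptions made for illustration and are not part of Frontera.

```python
import logging

logger = logging.getLogger("db-worker")


def identify_db_worker_type(no_batches, no_scoring, no_incoming):
    """Standalone copy of the quoted branch (hypothetical function name)."""
    if no_batches and no_scoring:
        return "linksdb"
    if no_batches and no_incoming:
        return "scoring"
    if no_incoming and no_scoring:
        return "batchgen"
    # No pair of flags matched, so the type cannot be identified.
    logger.warning(
        "Can't identify DB worker type "
        "(no-scoring {}, no-batches {}, no-incoming {})".format(
            no_scoring, no_batches, no_incoming))
    return "none"


# Flags from the documented command: only --no-batches is set.
docs_command = identify_db_worker_type(
    no_batches=True, no_scoring=False, no_incoming=False)

# Adding --no-scoring satisfies the first branch.
with_no_scoring = identify_db_worker_type(
    no_batches=True, no_scoring=True, no_incoming=False)
```

Every recognized type needs two of the three flags, so a single flag always ends up in the else branch.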

Prometheus3375 commented 4 years ago
  1. --no-scoring and --no-incoming mean that the DB worker only generates batches for spiders (the spider feed).
  2. --no-scoring and --no-batches mean that the DB worker reads the spider log and saves metadata.
  3. --no-batches and --no-incoming mean that the DB worker reads the scoring log from the strategy worker and updates the queue.

In the tutorial, the second configuration is not necessary and only one DB worker is started, which is why only the --no-incoming flag is specified. A DB worker started with that flag generates the spider feed and processes the scoring log.
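One way to read the combinations above is that each flag switches off one job and the worker does whatever is left. Here is a small sketch of that reading; the DISABLES table and the remaining_work helper are hypothetical names built from the descriptions in this thread, not taken from the Frontera source.

```python
# Job that each flag switches off, as described in the comment above.
# This mapping is an assumption drawn from the thread, not from Frontera.
DISABLES = {
    "--no-incoming": "spider log processing",
    "--no-scoring": "scoring log processing",
    "--no-batches": "batch generation (spider feed)",
}


def remaining_work(flags):
    """Return the jobs still enabled for the given set of flags."""
    return [work for flag, work in DISABLES.items() if flag not in flags]


# Tutorial setup: a single DB worker started with --no-incoming only.
tutorial = remaining_work({"--no-incoming"})

# Dedicated spider log worker: --no-scoring together with --no-batches.
spider_log_only = remaining_work({"--no-scoring", "--no-batches"})
```

Under this reading, the tutorial worker keeps scoring log processing and batch generation, while the dedicated worker keeps only spider log processing.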

gesof commented 4 years ago

Good. Then why not update the docs with this description? As it stands, a warning is logged that leaves users confused.

sibiryakov commented 4 years ago

Sure, guys. PRs are always welcome.

A.

On 22 June 2020, at 17:35, Gesof notifications@github.com wrote:

Good. Then why not update the docs with this description? As it stands, a warning is logged that leaves users confused.


gesof commented 4 years ago

Added the options in a PR