rohit-dua / BUB

BUB : Book Uploader Bot
http://tools.wmflabs.org/bub/

Define multiple workers in config and spawn one job per item #35

Closed by nemobis 8 years ago

nemobis commented 9 years ago

Currently we have

tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ls *py
mass_worker_1.py  mass_worker_2.py  mass_worker_3.py  mass_worker.py  upload_checker.py  worker.py

This duplicates code, which is ugly. The bigger risk is that each worker runs indefinitely. It would be better to have a single "permanent" worker that reads its settings (concurrency etc.) from a configuration file and then spawns a labs grid job for each item, or at least for each download, so that the jobs always use different IPs and are less likely to be blocked.
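
To make the proposal concrete, here is a minimal sketch of what such a dispatcher could look like. Everything in it is an assumption for illustration: the config keys, the use of a bare `jsub` as the grid submit command, and the idea that `worker.py` accepts an item ID as its only argument are not taken from the BUB code.

```python
import json
import subprocess

# Hypothetical configuration: none of these keys exist in BUB today.
EXAMPLE_CONFIG = """
{
  "max_concurrent_jobs": 3,
  "submit_command": ["jsub"],
  "worker_script": "./worker.py"
}
"""


def load_config(text):
    """Parse the JSON config and check the concurrency limit."""
    config = json.loads(text)
    if config["max_concurrent_jobs"] < 1:
        raise ValueError("max_concurrent_jobs must be at least 1")
    return config


def build_job_command(config, item_id):
    """Build the command line that submits one grid job for one item."""
    # Assumption: worker.py takes the item ID as its only argument.
    return list(config["submit_command"]) + [config["worker_script"], str(item_id)]


def next_batch(config, pending, running_count):
    """Return the pending items that can be submitted without exceeding the limit."""
    free_slots = max(config["max_concurrent_jobs"] - running_count, 0)
    return pending[:free_slots]


def submit(config, item_id, run=subprocess.call):
    """Submit one job; `run` is injectable so the sketch can be tried without a grid."""
    return run(build_job_command(config, item_id))
```

The permanent worker would then loop: look at the queue, call `next_batch` with the number of jobs still running, and `submit` each returned item. How to count running jobs is left open here.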

rohit-dua commented 8 years ago

I don't think grid jobs can spawn new grid jobs. So mass_worker.py now takes the worker number as an argument, and each worker is started like this:

tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ./mass_worker.py 1
tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ./mass_worker.py 2

This avoids duplicate code.

tools.bub@tools-bastion-01:~/public_html/BUB/bot$ ls *.py
mass_worker.py  upload_checker.py  worker.py
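
For readers who want to see the pattern, the sketch below shows one way a single script can take the worker number on the command line. It is not the actual `mass_worker.py`; the function names and the log-prefix use of the number are assumptions, since the thread does not show how the number is used inside the script.

```python
import argparse


def parse_worker_number(argv):
    """Read the worker number from the command-line arguments."""
    parser = argparse.ArgumentParser(description="Mass worker (illustrative sketch)")
    parser.add_argument("worker_number", type=int,
                        help="number identifying this worker, e.g. 1 or 2")
    args = parser.parse_args(argv)
    if args.worker_number < 1:
        parser.error("worker_number must be a positive integer")
    return args.worker_number


def worker_label(worker_number):
    """Hypothetical helper: a prefix that tells the workers apart in logs."""
    return "mass_worker[%d]" % worker_number


def main(argv):
    """Entry point: identical code for every worker, only the number differs."""
    number = parse_worker_number(argv)
    label = worker_label(number)
    print("%s starting" % label)
    return label
```

A real script would end by calling `main(sys.argv[1:])`, so that `./mass_worker.py 1` and `./mass_worker.py 2` run the same file with different numbers.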