prajnak / img_preprocess


How will you convert this script to work with multiple worker instances? #2

Open prats226 opened 6 years ago

prajnak commented 6 years ago

There are a few steps here:

  1. Have a master process that gets all file metadata from the bucket and stores it as JSON.
  2. The master process divides the stored JSON into batches and sends them via RPC/HTTP to a gateway that can spawn workers.
  3. The gateway can be a Flask app served behind gevent, run as several processes to make use of multiple processors.
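As a rough sketch of steps 1 and 2, the batching could look like the following. The metadata records, the `batch_id` field, and the batch size are all made up for illustration; the real listing would come from the bucket.

```python
import json
from typing import Iterator, List


def make_batches(records: List[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Hypothetical metadata, standing in for what a bucket listing might return.
metadata = [{"key": f"images/{i:04d}.jpg", "size": 1000 + i} for i in range(10)]

# Serialize each batch to a JSON string, ready to be sent to the gateway.
payloads = [
    json.dumps({"batch_id": n, "files": batch})
    for n, batch in enumerate(make_batches(metadata, batch_size=4))
]
```

With 10 records and a batch size of 4, the last batch is simply shorter than the others; nothing is padded or dropped.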

The current approach, concurrent.futures.ThreadPoolExecutor, is limited by CPU availability. Memory is no longer an issue, because the pool runs each submitted task only when resources are available. The intuitive way forward is to use a master process that gets all file metadata, breaks it into chunks, and submits the jobs to workers via an HTTP API.
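A minimal sketch of the master's submission side, assuming the gateway accepts a JSON POST. The URL, the `{"files": ...}` payload shape, and the function names are assumptions, not something the script has today. The `send` parameter is injectable so the logic can be tried without a running gateway.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

GATEWAY_URL = "http://localhost:5000/process"  # hypothetical endpoint


def post_batch(batch: List[dict], url: str = GATEWAY_URL) -> int:
    """POST one batch as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"files": batch}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def submit_all(
    batches: Iterable[List[dict]],
    send: Callable[[List[dict]], int] = post_batch,
    max_in_flight: int = 4,
) -> List[int]:
    """Send every batch, keeping at most max_in_flight requests open at once."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map returns results in the same order as the input batches.
        return list(pool.map(send, batches))
```

Threads are fine on the master side here because it only waits on network I/O; the CPU-heavy image work would happen in the workers behind the gateway.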

prajnak commented 6 years ago

If and when needed, the JSON chunks may be submitted to a job queue such as RabbitMQ, or even Celery, which workers can then poll when they're free. (This is an alternative.)
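To illustrate the pull model, here is a small in-process sketch that uses the standard library's `queue.Queue` and threads as a stand-in for a real broker. It is not RabbitMQ or Celery code; the payload shape and the "count the files" placeholder are assumptions.

```python
import json
import queue
import threading
from typing import List, Tuple

SENTINEL = None  # tells a worker there is no more work


def worker(jobs: queue.Queue, results: queue.Queue) -> None:
    """Pull JSON chunks off the queue until the sentinel arrives."""
    while True:
        payload = jobs.get()
        if payload is SENTINEL:
            break
        chunk = json.loads(payload)
        # Placeholder for the real preprocessing: just count the files.
        results.put((chunk["batch_id"], len(chunk["files"])))


def run(payloads: List[str], n_workers: int = 2) -> List[Tuple[int, int]]:
    """Feed payloads to n_workers workers and collect (batch_id, count) pairs."""
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for payload in payloads:
        jobs.put(payload)
    for _ in threads:
        jobs.put(SENTINEL)  # one sentinel per worker
    for thread in threads:
        thread.join()
    # Completion order is not deterministic, so sort by batch_id.
    return sorted(results.get() for _ in range(len(payloads)))
```

The point of the design is that workers take a new chunk only when they are free, so a slow worker never has a backlog pushed onto it.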