At the moment we're relying on the queue limits to constrain the number of containers that can run at the same time. That approach is fundamentally flawed. Here's why:
It's perfectly possible (and expected by design) for the worker on the queue to time out because the scraper hasn't written anything to its log for 5 minutes or so. In that case the job goes into the retry queue, and meanwhile another job gets picked up, which will likely start another container, taking us over the limit on running containers (currently supposed to be 20).
So, it would make sense to add an extra check in RunWorker that counts how many containers associated with runs are currently up and running. If the count is already at the limit, it should raise an exception so the job goes back into the retry queue.
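Roughly, the guard could look something like this. This is only a minimal sketch assuming a Sidekiq-style RunWorker; the error class, the `MAX_RUNNING_CONTAINERS` constant and the `Morph::DockerUtils.running_container_count_for_runs` helper are hypothetical placeholders, not existing code:

```ruby
# Hypothetical error used purely to trigger Sidekiq's retry mechanism.
class TooManyContainersError < StandardError; end

class RunWorker
  include Sidekiq::Worker

  # Assumed limit on simultaneously running scraper containers.
  MAX_RUNNING_CONTAINERS = 20

  def perform(run_id)
    # Hypothetical helper that counts containers belonging to scraper runs.
    running = Morph::DockerUtils.running_container_count_for_runs

    if running >= MAX_RUNNING_CONTAINERS
      # Raising puts the job back on the retry queue, so it gets attempted
      # again later once some containers have finished.
      raise TooManyContainersError,
            "#{running} containers already running (limit #{MAX_RUNNING_CONTAINERS})"
    end

    # ... proceed with starting the run as before ...
  end
end
```

The key point is that the check happens before a new container is started, and that exceeding the limit is signalled by raising rather than by silently dropping the job, so the existing retry behaviour takes care of rescheduling it.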