Currently, when a job fails, the s3 job keeps moving forward.
The failed job shows up in bull board, which has a retry-all button. If we retry a supervisor job, it is queued again and runs.
In what order will these retried jobs be inserted into the queue, though? Likely at the end.
However, these manually retried jobs are no longer managed by the s3 job: the s3 job's Promise.all does not wait for them. They can still run, but only until the original s3 job finalizes and starts scaling down.
Yes: when s3 scales down the selenium stack, these jobs will fail because their java scrapers are destroyed. They'll probably stay up for a while, waiting for the scraper to report progress, and then die out by timeout.
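For reference, a minimal sketch of the pattern described above, assuming Bull. The queue name and the scaleDownSeleniumStack() finalizer are hypothetical stand-ins, not our actual code; the point is that the s3 job only awaits the job instances it enqueued itself, so a run re-queued later via retry-all is invisible to it.

```ts
import Queue from 'bull';

const supervisorQueue = new Queue('supervisor'); // hypothetical queue name

// Hypothetical stand-in for the real node-pool teardown call.
async function scaleDownSeleniumStack(): Promise<void> { /* ... */ }

// Inside the s3 job: spawn supervisor jobs and wait only for those instances.
async function runS3Job(targets: string[]): Promise<void> {
  const jobs = await Promise.all(
    targets.map((t) => supervisorQueue.add({ target: t })),
  );

  // job.finished() settles when THIS run completes or fails; swallowing the
  // rejection is why the s3 job "keeps moving forward" on failure. A job
  // retried later from bull board is a new run this Promise.all never sees.
  await Promise.all(jobs.map((j) => j.finished().catch(() => undefined)));

  // The finalizer runs even though retried supervisor jobs may still be active.
  await scaleDownSeleniumStack();
}
```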
Ideas
There is no way for the s3 job to keep nodes alive for these orphaned retry jobs. One option is to let the express server manage node scale-up and scale-down.
[ ] For example, s3 still does its own auto scale-up - that's fine. But when the entire s3 session finishes, don't scale down.
[ ] The express server can monitor the queues: when there is no s3 job && no supervisor job, scale down. Check this periodically (see the sketch after this list).
[ ] We may also want to consider the case where we manually order a single supervisor job, e.g. by issuing the rrr command from the slack channel. Perhaps we can scale up a node pool of size 1 for this kind of work, but we need to find a way to make this play well with s3's large-scale provisioning.
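A rough sketch of that periodic idle check on the express server, again assuming Bull; the queue names, the scale-down helper, and the interval are assumptions, only the count getters are real Bull API:

```ts
import Queue from 'bull';

const s3Queue = new Queue('s3');                 // hypothetical queue names
const supervisorQueue = new Queue('supervisor');

// Hypothetical stand-in for the real node-pool teardown call.
async function scaleDownSeleniumStack(): Promise<void> { /* ... */ }

// On the express server: scale down only when both queues are fully idle.
setInterval(async () => {
  const counts = await Promise.all([
    s3Queue.getActiveCount(),
    s3Queue.getWaitingCount(),
    supervisorQueue.getActiveCount(),
    supervisorQueue.getWaitingCount(),
  ]);
  if (counts.every((c) => c === 0)) {
    await scaleDownSeleniumStack();
  }
}, 60_000); // hypothetical interval: check once a minute
```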
Solved by waiting and polling for any supervisor job still in progress: once concurrency falls back to 0, the s3 finalizer scales down all nodes. Meanwhile, issue #76 seems to no longer occur either.
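A sketch of what that finalizer wait could look like: poll the supervisor queue until nothing is active or waiting, then tear the nodes down in one place. Same assumptions as above (Bull, hypothetical scale-down helper, hypothetical 10s poll interval).

```ts
import Queue from 'bull';

// Hypothetical stand-in for the real node-pool teardown call.
async function scaleDownSeleniumStack(): Promise<void> { /* ... */ }

// In the s3 finalizer: block until supervisor concurrency falls back to 0,
// then scale down all nodes.
async function waitForSupervisors(supervisorQueue: Queue.Queue): Promise<void> {
  for (;;) {
    const [active, waiting] = await Promise.all([
      supervisorQueue.getActiveCount(),
      supervisorQueue.getWaitingCount(),
    ]);
    if (active === 0 && waiting === 0) break;
    await new Promise((r) => setTimeout(r, 10_000)); // poll every 10s
  }
  await scaleDownSeleniumStack();
}
```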