This is a suggestion I found in the TODO: we can reuse AWS instances instead of shutting them down after each job. At least during a single benchmark, we could limit #instances = #parallel jobs.
I am not sure how this would actually work, but in principle I see there is a speed-up:

- we only have to create the instance once
- we only have to download a dataset once (if subsequent jobs use the same dataset)
- we only have to install the AutoML framework once (if subsequent jobs use the same framework)
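
To make the potential speed-up concrete, here is a minimal sketch (hypothetical cache layout and helper names, not the benchmark's actual code) of how each setup step could be guarded by an existence check, so a reused instance skips work it has already done:

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve

# Hypothetical cache layout, purely for illustration; the real benchmark
# manages its own directories. The point is that every setup step is guarded
# by an existence check, so a reused instance skips it on subsequent jobs.
CACHE = Path.home() / ".cache" / "benchmark"

def ensure_dataset(name: str, url: str) -> Path:
    """Download a dataset only if this instance has not cached it yet."""
    path = CACHE / "datasets" / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, path)
    return path

def ensure_framework(name: str, install: Callable[[], None]) -> None:
    """Run a framework's install step only once per instance."""
    marker = CACHE / "frameworks" / name / ".installed"
    if not marker.exists():
        marker.parent.mkdir(parents=True, exist_ok=True)
        install()        # e.g. pip install or the framework's setup script
        marker.touch()   # record that the install completed
```

The marker file is just one way to record that an install finished; a real implementation would also need to handle partial installs and version changes.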
I also see some pitfalls:
- When re-using an instance for multiple frameworks, the second framework might be affected by leftovers from the first framework's installation.
- To provide the same disk space for every job, does this mean we have to make sure to transfer/delete all (cache) files of the previous run?
- Does this require additional communication? The most naive way I can imagine doing this is to just run all folds of a (framework, task) tuple per instance, which should not require additional communication (though some extra clean-up, see above). This wouldn't allow perfect re-use, but it's probably pretty good (a sketch of this grouping follows this list).
- (for the future:) Is there an increased risk of interruption when using spot instances? And is the effect of an interruption greater? I assume the answer to both is no (afaik interruption is just based on bid price, and when transferring results between each part of the job (e.g. each fold), no more than one part is lost), but I am not sure.
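
For reference, a sketch of that naive grouping in Python; `start_instance`, `setup`, `run_fold`, `clean_scratch`, and `terminate` are hypothetical stand-ins for the benchmark's AWS layer (real code would use boto3 and the framework setup scripts), and the framework/task names are just example values:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical stand-ins for the benchmark's AWS layer; each just prints what
# the real call would do.
def start_instance():
    print("starting instance")
    return "i-0example"

def setup(instance, framework, task):
    print(f"{instance}: install {framework}, download {task} (once per instance)")

def run_fold(instance, framework, task, fold):
    print(f"{instance}: run {framework} on {task}, fold {fold}")

def clean_scratch(instance):
    print(f"{instance}: delete temp/cache files left by the previous fold")

def terminate(instance):
    print(f"{instance}: shutting down")

def run_benchmark(jobs):
    """Run all folds of each (framework, task) pair on a single reused instance."""
    key = itemgetter(0, 1)  # group jobs by (framework, task)
    for (framework, task), folds in groupby(sorted(jobs, key=key), key=key):
        instance = start_instance()        # one instance per (framework, task)
        setup(instance, framework, task)   # install/download happen only once
        for _, _, fold in folds:
            run_fold(instance, framework, task, fold)
            clean_scratch(instance)        # avoid leftovers leaking into the next fold
        terminate(instance)                # never reuse across framework boundaries

run_benchmark([("autosklearn", "adult", f) for f in range(3)]
              + [("h2o", "adult", f) for f in range(3)])
```

Because an instance never outlives its (framework, task) group, the cross-framework contamination pitfall above disappears by construction, at the cost of less-than-perfect reuse.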
@PGijsbers yes, this is an old suggestion, and I agree with your pitfalls, so I would not consider this a priority.
Good to keep this for reference though.