We're seeing some issues with QCG-PilotJob on very large machines. More generally, given the heterogeneity of the HPC landscape, it's probably good to have more than one way of starting and monitoring processes, to maximise our chances of success.
We've been working with the authors of RADICAL-Pilot lately, and they are adding some features to it to make it more suitable for use as an instantiator for MUSCLE3. Let's add it as an optional second backend.
[x] Add an integration test that uses a simulated SLURM cluster of Docker containers
[ ] Add an RPInstantiator
[x] Scan the environment and determine what resources we have
[ ] Get an RP Pilot using those
[ ] Start instances using the pilot
[ ] Monitor execution and shut down correctly at simulation end or crash
[ ] Add a command line option to the manager to select the backend
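
For the last item, here is a rough sketch of what the backend selection could look like. It uses `argparse` only to stay self-contained, and it is not how the manager currently parses its arguments. The option name `--instantiator`, the backend identifiers `qcgpj` and `rp`, and the `ymmsl_files` positional are all assumptions made for illustration.

```python
import argparse
from typing import List, Optional

# Hypothetical backend identifiers; the names actually used may differ.
BACKENDS = ('qcgpj', 'rp')


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse manager arguments, including the (assumed) backend option."""
    parser = argparse.ArgumentParser(prog='muscle_manager')
    parser.add_argument(
        'ymmsl_files', nargs='+',
        help='yMMSL files describing the simulation')
    parser.add_argument(
        '--instantiator', choices=BACKENDS, default='qcgpj',
        help='backend used to start and monitor instances')
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_args()
    print(f'Selected backend: {args.instantiator}')
```

Defaulting to the existing backend would keep current runs unchanged, with RADICAL-Pilot being strictly opt-in.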