Support quotas when multiple groups use the same OpenStack project & opportunistic use

prominence-eosc / prominence

PROMINENCE server

Apache License 2.0


Currently we have two types of allocations in IRIS:

- a single OpenStack project per group
- a single OpenStack project for CCFE, which needs to be allocated to multiple groups

In both cases, we need to support the following:

- when a single OpenStack project is used for multiple groups, it should be possible for each group to be limited to its quota
- when a single OpenStack project is used for multiple groups, it should (optionally) be possible for one group to make opportunistic use of another group's unused resources
- when a single OpenStack project is used per group, it should (optionally) be possible for one group to make opportunistic use of another group's unused resources
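
A minimal sketch of the per-group quota check for the shared-project case, assuming quotas are expressed as CPU cores per group and that each group's current usage is already known (how usage is collected is not covered). `SharedProject` and `classify` are hypothetical names, not part of PROMINENCE. A request is either covered by the group's own quota, allowed opportunistically from other groups' unused cores if that is enabled, or refused:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SharedProject:
    """Hypothetical model of one OpenStack project shared by several groups."""
    name: str
    quotas: Dict[str, int]                                # group -> core quota in this project
    usage: Dict[str, int] = field(default_factory=dict)  # group -> cores currently in use
    allow_opportunistic: bool = False                     # may groups borrow unused cores?

    def used(self, group: str) -> int:
        return self.usage.get(group, 0)

    def free_total(self) -> int:
        # Cores not in use by any group, across the whole project
        return sum(self.quotas.values()) - sum(self.usage.values())

    def classify(self, group: str, cores: int) -> Optional[str]:
        """Return "quota" if the request fits in the group's own quota,
        "opportunistic" if it only fits by also using other groups' unused
        cores (and that is enabled), otherwise None."""
        if group not in self.quotas:
            return None
        if self.used(group) + cores <= self.quotas[group]:
            return "quota"
        if self.allow_opportunistic and cores <= self.free_total():
            return "opportunistic"
        return None
```

For example, with quotas of 8 cores for groups `a` and `b` and group `a` already using 6, a 2-core request from `a` fits within its quota, while a 4-core request only fits opportunistically. The one-project-per-group case could reuse the same check by treating several per-group projects as one pool, at the cost of needing usage figures from every project.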

For the last two points above, the hard part is deciding what to do when the group that owns the resources needs them back. One option is to allow opportunistic usage only for preemptable jobs, which would let the owning group get its resources back quickly.
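
If opportunistic usage is limited to preemptable jobs, reclaiming resources reduces to choosing which borrowed jobs to evict. A rough sketch, assuming each running job records its group, core count and whether it is running opportunistically; `Job` and `select_jobs_to_preempt` are hypothetical. It greedily evicts the largest opportunistic jobs first to disrupt as few jobs as possible, which is a heuristic rather than an optimal choice:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    job_id: str
    group: str
    cores: int
    opportunistic: bool  # running on another group's unused resources
    preemptable: bool


def select_jobs_to_preempt(running: List[Job], owner: str,
                           needed_cores: int) -> Tuple[List[Job], int]:
    """Greedily pick opportunistic, preemptable jobs from groups other than
    `owner`, largest first, until at least `needed_cores` are freed.
    Returns the jobs to evict and the cores that would be freed; the caller
    must check whether that is enough."""
    candidates = [j for j in running
                  if j.opportunistic and j.preemptable and j.group != owner]
    victims: List[Job] = []
    freed = 0
    for job in sorted(candidates, key=lambda j: j.cores, reverse=True):
        if freed >= needed_cores:
            break
        victims.append(job)
        freed += job.cores
    return victims, freed
```

Non-preemptable jobs are never selected, so if the returned core count is below what the owning group needs, the remainder can only come from jobs finishing normally.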

Currently, when there is a single OpenStack project per group, only that group can use the resources unless the JSON config in IMC is updated to include other groups, in which case access to the resources is on a first-come, first-served basis. Similarly, when a single OpenStack project is shared between multiple groups, access to resources is on a first-come, first-served basis.

Two options:

1. Since cloud_hook_translate_job.py is what triggers the provisioning of resources, it would need to know about each group's current usage and quota and decide whether or not to provision resources. This may be difficult, because either jobs would need to be routed multiple times, or routing would need to be prevented completely once a group has used its allocation.
2. Stop using the job router completely, and instead have a service which monitors the status of jobs. It can then deploy infrastructure for idle jobs as necessary, and delete it when it sees that the jobs have completed. The job router hooks could still be used to generate events, to avoid polling. This new service would be an intermediate layer between the job router hooks and IMC. It would actually be quite straightforward, and would be useful at larger scales.
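
A rough sketch of how the monitoring service in the second option might look, assuming the job router hooks push simple events (job id, group, cores, status) onto a queue. `Event`, `Provisioner`, `run`, and the `deploy`/`destroy` callables are hypothetical; in practice the callables would wrap whatever IMC requests are used to create and delete infrastructure. Idle jobs get infrastructure only if the group is within quota; otherwise they stay idle and are retried when capacity is freed:

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Event:
    """Hypothetical event emitted by a job router hook."""
    job_id: str
    group: str
    cores: int
    status: str  # "idle" or "completed"


@dataclass
class Provisioner:
    quotas: dict                           # group -> core quota
    deploy: Callable[[str, int], str]      # hypothetical IMC wrapper: returns an infrastructure id
    destroy: Callable[[str], None]         # hypothetical IMC wrapper: deletes infrastructure
    infra: dict = field(default_factory=dict)     # job_id -> (group, cores, infra_id)
    deferred: list = field(default_factory=list)  # idle events waiting for quota

    def used(self, group: str) -> int:
        return sum(cores for g, cores, _ in self.infra.values() if g == group)

    def handle(self, ev: Event) -> None:
        if ev.status == "idle":
            if ev.job_id in self.infra or any(d.job_id == ev.job_id for d in self.deferred):
                return  # already provisioned or already waiting
            if self.used(ev.group) + ev.cores <= self.quotas.get(ev.group, 0):
                self.infra[ev.job_id] = (ev.group, ev.cores, self.deploy(ev.group, ev.cores))
            else:
                self.deferred.append(ev)  # over quota: leave the job idle for now
        elif ev.status == "completed":
            self.deferred = [d for d in self.deferred if d.job_id != ev.job_id]
            if ev.job_id in self.infra:
                _, _, infra_id = self.infra.pop(ev.job_id)
                self.destroy(infra_id)
                # Capacity was freed, so retry jobs that were waiting on quota
                pending, self.deferred = self.deferred, []
                for d in pending:
                    self.handle(d)


def run(events: "queue.Queue[Optional[Event]]", prov: Provisioner) -> None:
    # Block on events pushed by the hooks instead of polling; None stops the loop
    while True:
        ev = events.get()
        if ev is None:
            break
        prov.handle(ev)
```

Because the service, not the router, decides when to provision, a job over quota simply waits rather than needing to be routed again, which is the difficulty with the first option.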

The hard part is what to do in IMC. One option is to enable all groups on all clouds, but use requirements and preferences to direct jobs to the appropriate resources. However, we need to be careful not to drift too far from the philosophy of PROMINENCE, in which deployment is tried across multiple clouds until one is found that actually works.
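
A minimal sketch of the requirements and preferences idea, assuming each cloud is described by a small dict. `rank_clouds` and `deploy_with_fallback` are hypothetical helpers, not IMC's actual interface. Requirements filter out unsuitable clouds, preferences order the rest (for example, a group's own project before other groups' projects), and deployment still falls back through the ranked list until one cloud works, so the existing multi-cloud behaviour is kept:

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Cloud = Dict[str, Any]  # hypothetical cloud description, e.g. {"name": ..., "owners": [...]}


def rank_clouds(clouds: List[Cloud],
                requirements: List[Callable[[Cloud], bool]],
                preferences: List[Callable[[Cloud], float]]) -> List[Cloud]:
    """Drop clouds failing any requirement, then order the rest by total
    preference score, highest first. sorted() is stable (also with
    reverse=True), so ties keep their original order."""
    usable = [c for c in clouds if all(req(c) for req in requirements)]
    return sorted(usable, key=lambda c: sum(p(c) for p in preferences), reverse=True)


def deploy_with_fallback(ranked: List[Cloud],
                         try_deploy: Callable[[Cloud], bool]) -> Tuple[Optional[str], List[str]]:
    """Try each cloud in ranked order until a deployment succeeds.
    Returns the name of the cloud used (or None) and the names that failed."""
    failures: List[str] = []
    for cloud in ranked:
        if try_deploy(cloud):
            return cloud["name"], failures
        failures.append(cloud["name"])
    return None, failures
```

A preference such as `lambda c: 10 if group in c["owners"] else 0` would send a group's jobs to its own project first, while other clouds remain available as fallbacks, which is where opportunistic use would come in.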

Support quotas when multiple groups use the same OpenStack project & opportunistic use #118