prominence-eosc / prominence

PROMINENCE server
Apache License 2.0

Support quotas when multiple groups use the same OpenStack project & opportunistic use #118

Open alahiff opened 4 years ago

alahiff commented 4 years ago

Currently we have two types of allocations in IRIS:

In both cases, we need to support the following:

For the last 2 points above, the hard part is deciding what to do if the real group needs its resources back. One option is to allow opportunistic usage only for preemptible jobs. This would let the real group regain access to its resources quickly.
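A minimal sketch of the preemptible-only idea, purely for illustration. The `Job` fields, `may_use` and `jobs_to_preempt` are hypothetical names, not part of PROMINENCE or IMC, and "owner group" here stands in for the "real group" that holds the allocation. It assumes CPU count is the only resource that matters.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    # All fields are assumptions made for this sketch.
    job_id: int
    group: str
    cpus: int
    preemptible: bool


def may_use(job: Job, owner_group: str) -> bool:
    """The owner's jobs always qualify; other groups only with preemptible jobs."""
    return job.group == owner_group or job.preemptible


def jobs_to_preempt(running: list[Job], owner_group: str, cpus_needed: int) -> list[Job]:
    """Pick opportunistic jobs to evict until enough CPUs are freed for the owner."""
    victims: list[Job] = []
    freed = 0
    for job in running:
        if freed >= cpus_needed:
            break
        # Never evict the owner's own jobs or anything not marked preemptible.
        if job.group != owner_group and job.preemptible:
            victims.append(job)
            freed += job.cpus
    return victims
```

Jobs are evicted in the order given, so a real policy would probably sort `running` first (for example, most recently started first) to limit wasted work.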

Currently, when there is a single OpenStack project per group, only that group can use the resources unless the JSON config in IMC is updated to include other groups, in which case access to resources is on a first-come, first-served basis. Similarly, when a single OpenStack project is shared between multiple groups, access to resources is on a first-come, first-served basis.

alahiff commented 4 years ago

Two options:

  1. Since cloud_hook_translate_job.py is what triggers the provisioning of resources, it will need to know about the group's current usage and quota and decide whether or not to provision resources. This may be difficult because either jobs may need to be routed multiple times, or we need to prevent routing completely once a group has used its allocation.

  2. We could stop using the job router completely and instead have a service which monitors the status of jobs. It could then deploy infrastructure for idle jobs as necessary, and delete that infrastructure when it sees the jobs have completed. We could use the job router hooks to generate events to avoid polling. This new service would be an intermediate layer between the job router hooks and IMC. It would be quite straightforward, and would be useful at larger scales.
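The second option, combined with the quota check from the first, might look roughly like the sketch below. Everything here is hypothetical: `ProvisioningService`, its method names and the CPU-only quota model are assumptions for illustration, and the actual calls to IMC are left as comments because their interface is not described in this issue.

```python
from __future__ import annotations

from collections import defaultdict


class ProvisioningService:
    """Hypothetical layer between the job router hooks and IMC."""

    def __init__(self, quotas: dict[str, int]):
        self.quotas = quotas                               # CPU quota per group (assumed unit)
        self.usage: defaultdict[str, int] = defaultdict(int)  # CPUs currently deployed per group
        self.deployed: dict[int, tuple[str, int]] = {}     # job_id -> (group, cpus)

    def on_job_idle(self, job_id: int, group: str, cpus: int) -> bool:
        """Event from a hook: an idle job needs infrastructure. Returns True if deployed."""
        if job_id in self.deployed:
            return True  # already handled, so repeated events are harmless
        if self.usage[group] + cpus > self.quotas.get(group, 0):
            return False  # over quota: the job stays idle until a later event
        # A real service would ask IMC to deploy infrastructure here.
        self.deployed[job_id] = (group, cpus)
        self.usage[group] += cpus
        return True

    def on_job_completed(self, job_id: int) -> None:
        """Event from a hook: the job finished, so release its resources."""
        entry = self.deployed.pop(job_id, None)
        if entry is None:
            return  # unknown job, nothing to clean up
        group, cpus = entry
        # A real service would ask IMC to delete the infrastructure here.
        self.usage[group] -= cpus
```

Because the quota decision lives in the service rather than in the hook, a job that is refused simply stays idle and can be reconsidered on the next event, which avoids routing it multiple times.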

The hard bit is what to do in IMC. One option is to enable all groups on all clouds, but use requirements & preferences to direct jobs to appropriate resources. However, we need to be careful not to drift too far from the philosophy of PROMINENCE, where deployment is tried across multiple clouds until one is found which actually works.
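One way to picture "requirements & preferences" while keeping the try-until-one-works behaviour is sketched below. The cloud dictionaries, their `attrs` and `preferred_groups` keys, and both function names are assumptions made for this example and do not reflect the actual IMC configuration format.

```python
from __future__ import annotations

from typing import Callable, Optional


def candidate_clouds(clouds: list[dict], group: str, requirements: dict) -> list[str]:
    """Filter clouds by hard requirements, then order them by the group's preference."""
    eligible = [
        c for c in clouds
        if all(c["attrs"].get(key) == value for key, value in requirements.items())
    ]
    # Clouds that list this group as preferred sort first; name breaks ties.
    eligible.sort(key=lambda c: (group not in c["preferred_groups"], c["name"]))
    return [c["name"] for c in eligible]


def deploy_with_fallback(names: list[str], try_deploy: Callable[[str], bool]) -> Optional[str]:
    """Try each cloud in order and return the first one where deployment succeeds."""
    for name in names:
        if try_deploy(name):
            return name
    return None
```

Preferences only change the order in which clouds are tried, so a job still falls through to other clouds if its preferred one fails.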