prominence-eosc / prominence

PROMINENCE server
Apache License 2.0

Allow range requests in resources #156

Closed fcasson closed 2 years ago

fcasson commented 2 years ago

In the context of multi-node MPI jobs and dynamic and transparent hardware provisioning, users should be able to provide ranges for

This should allow resources to be allocated more flexibly when users are unaware of the properties of available resources.

The first preference for number of processes would be the maximum of the range. The preference for procs per node is less important, but if one is needed, this could also be the maximum.

The number of nodes required can then be inferred by the PROMINENCE server based on available resources. The memory-per-node requirement would also need to be inferred, or instead be provided in terms of memory per process.

alahiff commented 2 years ago

I think adding an optional memoryPerCpu as an alternative to memory will definitely be needed.

The hard part is working out how to extend the existing schema in a sensible way, still working on possibilities for this...

alahiff commented 2 years ago

In my first version of this I've added memoryPerCpu and cpusRange to resources. With this either cpus or cpusRange must be specified. With cpusRange = [16, 32] the job will run using anywhere between 16 and 32 CPU cores, with the largest possible value preferred.
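To illustrate the cpusRange behaviour described above, here is a minimal sketch of how "largest possible value preferred" could be resolved. It is not the server's actual code: the function name `pick_cpus_from_range` and the `free_cpus` argument are hypothetical, and the memoryPerCpu value is purely illustrative (assumed to be in the same units as memory).

```python
from typing import Optional, Sequence


def pick_cpus_from_range(cpus_range: Sequence[int], free_cpus: int) -> Optional[int]:
    """Return the largest CPU count within [low, high] that fits, or None.

    Hypothetical helper: `free_cpus` stands in for whatever the
    scheduler knows about the cores currently available.
    """
    low, high = cpus_range
    if low > high:
        raise ValueError("cpusRange must be [min, max] with min <= max")
    if free_cpus < low:
        return None  # not even the minimum of the range fits
    return min(high, free_cpus)  # prefer the largest value that fits


# Illustrative resources section using the new fields.
resources = {"cpusRange": [16, 32], "memoryPerCpu": 2}

cpus = pick_cpus_from_range(resources["cpusRange"], free_cpus=24)
# Total memory follows from the per-CPU figure once the CPU count is known.
memory = cpus * resources["memoryPerCpu"]
```

With 24 free cores, the job in this sketch would get 24 CPUs, and the memory request scales accordingly.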

A related change is that the JSON description for a running or complete job now contains provisionedResources in the execution section so users can find out externally what resources the job has, e.g.

      "provisionedResources": {
        "cpus": 2,
        "memory": 4,
        "nodes": 1
      },

Within a job as usual the environment variables PROMINENCE_CPUS and PROMINENCE_MEMORY will correctly report the available resources.
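For anyone sizing a thread pool or similar from inside a job, reading those two variables might look like the sketch below. The helper name `provisioned_from_env` is hypothetical, and it assumes both variables hold integer strings.

```python
import os


def provisioned_from_env(environ=os.environ):
    """Return (cpus, memory) as reported inside the job.

    Each value is None if the corresponding variable is unset.
    Assumption: both variables contain plain integers.
    """
    cpus = environ.get("PROMINENCE_CPUS")
    memory = environ.get("PROMINENCE_MEMORY")
    return (
        int(cpus) if cpus is not None else None,
        int(memory) if memory is not None else None,
    )
```

Outside a PROMINENCE job, where the variables are not set, this returns `(None, None)`, so callers can fall back to their own defaults.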

None of this is available in the production API yet, but it has been tested successfully. The next step is to deal with multi-node jobs.

alahiff commented 2 years ago

Also added a third option for CPUs: cpusOptions, for example with cpusOptions = [14, 28]. With this, either 14 or 28 CPUs will be used, with 28 preferred. One user was basically doing this manually if a job requesting 28 CPUs was idle for too long.
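The difference from cpusRange is that only the listed values are allowed. A minimal sketch of that choice, using a hypothetical `pick_cpus_from_options` helper and a hypothetical `free_cpus` figure rather than anything from the server itself:

```python
from typing import Optional, Sequence


def pick_cpus_from_options(cpus_options: Sequence[int], free_cpus: int) -> Optional[int]:
    """Return the largest listed option that fits in `free_cpus`, or None."""
    fitting = [c for c in cpus_options if c <= free_cpus]
    return max(fitting) if fitting else None
```

So with cpusOptions = [14, 28] and only 20 free cores, this sketch picks 14 and never an in-between value such as 20.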

alahiff commented 2 years ago

Added an optional totalCpusRange to resources. This is an alternative to nodes, allowing users to specify a total number of CPUs but not the number of nodes. Initially it will try to use the maximum number of CPUs with minimum number of nodes, which I think would be the preferred option for most use cases.
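The "maximum CPUs, minimum nodes" preference can be sketched as follows. This is only an illustration under simplifying assumptions: all nodes are identical, and the names `plan_total_cpus`, `cpus_per_node` and `free_nodes` are hypothetical, not part of the API.

```python
import math
from typing import Optional, Sequence, Tuple


def plan_total_cpus(
    total_cpus_range: Sequence[int], cpus_per_node: int, free_nodes: int
) -> Optional[Tuple[int, int]]:
    """Return (total_cpus, nodes), or None if the minimum cannot be met.

    Assumes identical nodes with `cpus_per_node` cores each.
    """
    if cpus_per_node <= 0:
        raise ValueError("cpus_per_node must be positive")
    low, high = total_cpus_range
    # Take as many CPUs as the range and the free capacity allow.
    total = min(high, cpus_per_node * free_nodes)
    if total < low:
        return None
    # Pack those CPUs onto as few nodes as possible.
    nodes = math.ceil(total / cpus_per_node)
    return total, nodes
```

For example, a range of [32, 128] on 16-core nodes with 4 nodes free gives 64 CPUs on 4 nodes. The sketch ignores how an uneven total (say 40 CPUs over 3 nodes) would be split between nodes.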

To summarize, the resources part of the JSON job description contains:

alahiff commented 2 years ago

The options as mentioned above are available. Late closure.

fcasson commented 2 years ago

Okay thanks. I assume this is currently available only from the API not from CLI (that's fine for us) - some examples / docs on the JSON layout would help.

alahiff commented 2 years ago

It's available from the latest version of the CLI, which is documented. But good point - I'll try to improve the API documentation and make sure there are examples, sometime today...

alahiff commented 2 years ago

Some examples of different resources possibilities are now in the second half of: https://prominence-eosc.github.io/docs/job-description-files#

fcasson commented 2 years ago

Thanks!