iterative / cml

♾️ CML - Continuous Machine Learning | CI/CD for ML
http://cml.dev
Apache License 2.0

`cml runner`: Request spot instances from requirements #1101

Open courentin opened 2 years ago

courentin commented 2 years ago

What?

Would it be possible to add the ability to request spot instances from a list of requirements rather than an instance type or a GPU type?

For example, I would like to tell cml runner, I want an instance at the lowest price that:

(more context: discord#cml/1000042237830373406)

Why?

Spot instances are not available 100% of the time and, as explained in the AWS best practices guide, the fewer constraints we set, the better our chances of getting the spot instance request fulfilled.

Possible solutions

I think there are multiple ways of implementing it.

The first, low-cost solution would be to allow multiple values for the --cloud-type option:

cml runner \
  --cloud-spot \
  --cloud-type=g3.4xlarge,g4dn.xlarge,g5.8xlarge

The conversion from requirements to instance types would need to be done beforehand, but instance types don't change often, so the resulting list should stay valid for a long time.
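
A minimal sketch of what that up-front conversion could look like, assuming a small hand-maintained catalog. The `InstanceSpec` class, the `CATALOG` entries and both helper functions are hypothetical names for illustration, and a multi-valued --cloud-type is only the proposal above, not an existing option:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Hypothetical description of one instance type."""
    name: str
    gpu_count: int
    gpu_vendor: str  # empty string when the type has no accelerator


# Hand-written, illustrative catalog; verify the values against the
# provider's documentation before relying on them.
CATALOG = [
    InstanceSpec("m5.large", 0, ""),
    InstanceSpec("g3.4xlarge", 1, "nvidia"),
    InstanceSpec("g4dn.xlarge", 1, "nvidia"),
    InstanceSpec("g5.8xlarge", 1, "nvidia"),
]


def matching_types(catalog, min_gpus, vendor):
    """Return the names of instance types that meet both requirements."""
    wanted = vendor.lower()
    return [
        spec.name
        for spec in catalog
        if spec.gpu_count >= min_gpus and spec.gpu_vendor == wanted
    ]


def cloud_type_argument(names):
    """Join the names into the comma-separated form proposed above."""
    return "--cloud-type=" + ",".join(names)


if __name__ == "__main__":
    print(cloud_type_argument(matching_types(CATALOG, 1, "NVIDIA")))
```

The catalog only has to be refreshed when the provider adds or retires instance types, which is the trade-off this first solution accepts.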


The second solution would be to implement all the requirement logic in cml runner. I'm not sure what the API could look like, but something like this could be useful:

cml runner \
  --cloud-spot \
  --cloud-spot-requirement="AcceleratorCount>=1" \
  --cloud-spot-requirement="AcceleratorManufacturers=NVIDIA" \
  ...
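
To make the idea more concrete, here is a rough sketch of how such requirement strings might be parsed. Nothing here exists in cml: the `Requirement` tuple, `parse_requirement` and `to_requirements_dict` are hypothetical, and the supported operators (`>=`, `<=`, `=`) are an assumption based on the two examples above:

```python
import re
from typing import NamedTuple


class Requirement(NamedTuple):
    """One parsed requirement, e.g. ("AcceleratorCount", ">=", "1")."""
    key: str
    op: str
    value: str


# key, then one of >=, <=, =, then the value (surrounding spaces ignored)
_PATTERN = re.compile(r"^\s*([A-Za-z]\w*)\s*(>=|<=|=)\s*(.+?)\s*$")


def parse_requirement(text):
    """Parse a single 'Key<op>Value' string; raise ValueError if malformed."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse requirement: {text!r}")
    return Requirement(*match.groups())


def to_requirements_dict(requirements):
    """Group parsed requirements into a nested dict.

    '>=' becomes {"Min": n}, '<=' becomes {"Max": n}, and '=' appends the
    lower-cased value to a list of allowed values.
    """
    result = {}
    for req in requirements:
        if req.op == ">=":
            result.setdefault(req.key, {})["Min"] = int(req.value)
        elif req.op == "<=":
            result.setdefault(req.key, {})["Max"] = int(req.value)
        else:
            result.setdefault(req.key, []).append(req.value.lower())
    return result


if __name__ == "__main__":
    flags = ["AcceleratorCount>=1", "AcceleratorManufacturers=NVIDIA"]
    print(to_requirements_dict([parse_requirement(f) for f in flags]))
```

A real implementation would also have to decide what to do when the same key is given both a range and an exact value; this sketch does not handle that case.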

Third solution (basically the second one but probably easier to implement):

{
  "AcceleratorCount": {
    "Min": 1
  },
  "AcceleratorManufacturers": [
    "nvidia"
  ]
}

cml runner \
  --cloud-spot \
  --cloud-spot-json-requirements=path_to_requirements.json \
  ...

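
As a sketch of how a requirements file in that shape could be evaluated, assuming each candidate instance is described by a plain dict with the same keys. The `satisfies` function and the instance dicts are hypothetical and not part of cml or any provider SDK:

```python
import json

# Same content as the requirements file shown above.
REQUIREMENTS_JSON = """
{
  "AcceleratorCount": {"Min": 1},
  "AcceleratorManufacturers": ["nvidia"]
}
"""


def satisfies(instance, requirements):
    """Return True when the instance dict meets every requirement.

    A dict rule is treated as a numeric range with optional "Min"/"Max";
    a list rule is treated as the set of allowed values.
    """
    for key, rule in requirements.items():
        value = instance.get(key)
        if value is None:
            return False  # the instance does not describe this attribute
        if isinstance(rule, dict):
            if "Min" in rule and value < rule["Min"]:
                return False
            if "Max" in rule and value > rule["Max"]:
                return False
        elif isinstance(rule, list):
            if value not in rule:
                return False
        else:
            raise ValueError(f"unsupported rule for {key!r}: {rule!r}")
    return True


if __name__ == "__main__":
    requirements = json.loads(REQUIREMENTS_JSON)
    gpu_box = {"AcceleratorCount": 1, "AcceleratorManufacturers": "nvidia"}
    cpu_box = {"AcceleratorCount": 0, "AcceleratorManufacturers": "nvidia"}
    print(satisfies(gpu_box, requirements))
    print(satisfies(cpu_box, requirements))
```

Reading the rules from JSON avoids inventing a flag syntax, which is probably why this variant would be easier to implement than the second one.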
0x2b3bfa0 commented 2 years ago

See also

dacbd commented 2 years ago

@courentin what are your thoughts on providing a list to --cloud-type so that, when --cloud-spot is active, cml runner tries the instance types sequentially and uses the first one that is immediately available? (I haven't researched whether all the providers have some form of requirements spec API like the one @0x2b3bfa0 linked for AWS.)
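
Roughly, the sequential fallback could look like the sketch below. `request_spot` is a hypothetical callable standing in for whatever provider call actually asks for capacity; it is assumed to return True when the request is fulfilled immediately:

```python
from typing import Callable, Iterable, Optional


def first_available(
    instance_types: Iterable[str],
    request_spot: Callable[[str], bool],
) -> Optional[str]:
    """Try each instance type in order and return the first one granted.

    Returns None when no type in the list could be fulfilled.
    """
    for instance_type in instance_types:
        if request_spot(instance_type):
            return instance_type
    return None


if __name__ == "__main__":
    # Fake provider for the example: only g4dn.xlarge has capacity.
    def fake_request(instance_type: str) -> bool:
        return instance_type == "g4dn.xlarge"

    candidates = ["g3.4xlarge", "g4dn.xlarge", "g5.8xlarge"]
    print(first_available(candidates, fake_request))
```

The order of the list doubles as a preference order, so users could put the cheapest acceptable type first.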

courentin commented 2 years ago

@dacbd it would be very useful

omesser commented 2 years ago

Thanks for raising this, @courentin. I think this is very important for viable allocation of spot, and even on-demand, GPU instances in the "wild". My thoughts about implementation/UX options:

0x2b3bfa0 commented 2 years ago