run-x / opta

The next generation of Infrastructure-as-Code. Work with high-level constructs instead of getting lost in low-level cloud configuration.
https://docs.opta.dev
Apache License 2.0

Support GCP spot machines #905

Open williamflynt opened 2 years ago

williamflynt commented 2 years ago

What would you like to be added:

Google Cloud recently rolled out spot VMs. Unlike the existing preemptible VMs, which are stopped after at most 24 hours, a spot VM has a notionally unlimited lifetime. Pricing for spot VMs is the same as for preemptible VMs, typically about 30% of the cost of a standard (on-demand) VM.

Why is this needed:

Imagine running a cluster on relatively expensive machines, like Tau T2D instances. Your pods are part of a big data serving platform with automatic shard management - great! That means you can use ephemeral VMs, because the cluster will automatically reflow data away from non-operational nodes and redistribute it when new nodes come up.

Of course, this reflow takes time, and loading the indices into memory takes more time on top of that. The tradeoff is that we can run at a fraction of the cost!

Overall, this use case is well served by spot instances and poorly served by preemptible instances. With preemptible VMs, the data shuffle and index build/load consume ~15% of each node's lifetime. With spot VMs it is more like 4% in practice. The overall improvement in query latency is even larger than that difference suggests.

Extra info (e.g. existing slack convo link):

The SPOT provisioning_model is supported as a beta feature in version 4.23 of the Terraform Google provider. From the provider docs:

(Optional, Beta) Describe the type of preemptible VM. This field accepts the value STANDARD or SPOT. If the value is STANDARD, there will be no discount. If this is set to SPOT, preemptible should be true and auto_restart should be false.

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#provisioning_model
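
For illustration, here is a minimal sketch of what the quoted setting could look like on a plain google_compute_instance. It is not Opta's implementation. The resource name, machine type, zone, image, and network are placeholder assumptions, and the restart field is written as automatic_restart, which is the attribute name in the scheduling block (the quoted text abbreviates it as auto_restart). Because the field is beta in 4.23, this assumes the google-beta provider is configured.

```hcl
# Sketch only: all names and values below are placeholders, not taken from Opta.
resource "google_compute_instance" "spot_example" {
  provider     = google-beta        # assumption: provisioning_model is beta in 4.23
  name         = "spot-example"     # hypothetical name
  machine_type = "t2d-standard-4"   # assumed Tau T2D size
  zone         = "us-central1-a"    # assumed zone

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11" # assumed image
    }
  }

  network_interface {
    network = "default"
  }

  scheduling {
    provisioning_model = "SPOT"
    preemptible        = true  # should be true when provisioning_model is SPOT
    automatic_restart  = false # should be false when provisioning_model is SPOT
  }
}
```

Opta provisions GKE node pools rather than standalone instances, so the actual change would likely live in the node pool configuration instead. This block only shows the shape of the three fields the docs describe.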
