For reference, the Terraform documentation describes provisioning_model (where SPOT is supported as a beta feature in Terraform 4.23) as follows:
(Optional, Beta) Describe the type of preemptible VM. This field accepts the value STANDARD or SPOT. If the value is STANDARD, there will be no discount. If this is set to SPOT, preemptible should be true and auto_restart should be false.
What would you like to be added:
Google Cloud recently rolled out spot VMs. Unlike the existing preemptible VMs, which are stopped after at most 24 hours, a spot VM has a notionally unlimited lifetime, although it can still be preempted at any time. Pricing for spot VMs is the same as for preemptible VMs, typically about 30% of the cost of a standard (on-demand) VM.
Why is this needed:
Imagine running a cluster on relatively expensive machines, like Tau T2D instances. Your pods are part of a big data serving platform with automatic shard management. That means you can use ephemeral VMs, because the cluster automatically reflows data away from non-operational nodes and redistributes it when new nodes come up.
Of course, this reshuffling takes time, and loading the indices into memory takes additional time. The tradeoff is that we can run at a fraction of the cost.
Overall, this use case is well served by spot instances and poorly served by preemptible instances. With preemptible VMs, the data shuffle and index build/load consume roughly 15% of each VM's lifetime. With spot VMs the figure is closer to 4% in practice. The resulting improvement in query latency is larger than the difference between those two numbers suggests.
Extra info (e.g. existing slack convo link):
The SPOT provisioning_model is supported in Terraform 4.23 as a beta feature: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#provisioning_model
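As a rough illustration of what this looks like on the Terraform side, here is a minimal sketch of a google_compute_instance that requests the SPOT provisioning model. The resource name, instance name, machine type, zone, image, and network are placeholders chosen for the example and do not come from this issue. It also assumes the google-beta provider is configured, since the field was marked beta at the time. Note that in the resource's scheduling block the restart field is named automatic_restart.

```hcl
# Sketch only: every name and value below is a placeholder, not part of this issue.
resource "google_compute_instance" "spot_node" {
  # Assumption: provisioning_model is a beta field, so the beta provider is used.
  provider     = google-beta
  name         = "spot-node-example"
  machine_type = "t2d-standard-4"
  zone         = "us-central1-a"

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-11"
    }
  }

  network_interface {
    network = "default"
  }

  scheduling {
    # SPOT removes the 24-hour lifetime cap that applies to legacy preemptible VMs.
    provisioning_model = "SPOT"
    # Per the provider docs, SPOT requires preemptible = true and no automatic restart.
    preemptible       = true
    automatic_restart = false
  }
}
```

Whatever is added for this request would presumably need to end up setting the equivalent of these three scheduling fields on the instances or instance templates it creates.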
Slack link