iterative / terraform-provider-iterative

☁️ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes
https://registry.terraform.io/providers/iterative/iterative/latest/docs
Apache License 2.0
290 stars 27 forks source link

`runner` gcp firewall quota #604

Open dacbd opened 2 years ago

dacbd commented 2 years ago

This is could be applicable for the other providers but using gcp I received a "Quota 'FIREWALLS' exceeded. Limit: 100.0 globally." Should we query the API for an existing iterative / cml-qwerty ingress/egress rules to reuse before creating new ones?

ref; https://github.com/iterative/terraform-provider-iterative/blob/master/iterative/gcp/provider.go#L142

perhaps not prefixing with the instance name but something more generic can be the fix?

0x2b3bfa0 commented 2 years ago

Duplicate of / related to: #289

dacbd commented 2 years ago

Duplicate of / related to: #289

Different but related

0x2b3bfa0 commented 2 years ago

Potentially related to https://github.com/iterative/terraform-provider-iterative/pull/156#discussion_r668236748

dacbd commented 2 years ago

@0x2b3bfa0 what do you think of:

perhaps not prefixing with the instance name but something more generic can be the fix?

0x2b3bfa0 commented 2 years ago

Does that imply that the first run would create a firewall with a generic name (e.g. cml) and susequent runs would just reuse it? It's similar to what we do on AWS (create & reuse a security group) but it's rather controversial for sane use cases.

It would be creating resources and leaving them unmanaged; the only way of deleting them is knowing what to delete and ClickOps or CLIOps.

Anyway, yes, it's possible, and no worse than what we already have on AWS.

0x2b3bfa0 commented 2 years ago

Sorry for the late reply, my radar was stuck. 📡

dacbd commented 2 years ago

it's rather controversial for sane use cases.

can you elaborate? maybe its time for a more agnostic aws-security-group?

It would be creating resources and leaving them unmanaged;

It's already doing that, but this would make less of them.

0x2b3bfa0 commented 2 years ago

can you elaborate?

Sure! I mean that we're creating resources that are being left unmanaged: cml runner is both responsible for ephemeral and long-lived resources, but has no means of managing the latter.

maybe it's time for a more agnostic aws-security-group?

Do you mean exposing a way of importing existing resources on all the supported backends? That would be great, in my opinion.

It's already doing that, but this would make less of them.

Yes, it is. 🙈 Although not by design, just because of the bug described on https://github.com/iterative/terraform-provider-iterative/pull/156#discussion_r668236748.

0x2b3bfa0 commented 2 years ago

(Tangential food for thought: while it comes with its own set of issues,[^1] the approach suggested by the autoscaling with self-hosted runners GitHub documentation seems more solid in the long run, although the initial setup requires an extra step[^2])

[^1]: Not all runner solutions remove themselves after they have been deleted, which can be problematic, especially, if combined aith auto-scaling capabilities. [^2]: Id est, either Terraform or “one-click deployment buttons” as provided by Amazon Web Services, Microsoft Azure or Google Cloud.