fabric-net-svpc-access uses authoritative iam_binding

ideasculptor commented 4 years ago

The module surely needs to use iam_member instead of iam_binding, or provide the same 'additive' option that the iam modules provide - easily done by simply using the subnets_iam module (https://github.com/terraform-google-modules/terraform-google-iam/tree/master/modules/subnets_iam )

If you use the module as it is, you are pretty much guaranteed to break stuff

ludoo commented 4 years ago

I'm not convinced we should change it, as this is the right place to handle subnet IAM bindings, and they then need to be authoritative to ovverride any potential drift. Keep in mind this is a "fabric" module, and while it can definitely be used standalone, it's meant to be composed with others and "own" its tiny piece of infra.

ludoo commented 4 years ago

If you feel strongly about this, please bring actual use cases, as we've been using this module in actual code and it works well for our use case.

ideasculptor commented 4 years ago

My use case is pretty simple - I have more than one team responsible for networks. Each team wants to be able to manage their subnets, routes, and firewall rules separately. The team responsible for GKE subnets don't want to have to deal with subnets set up for GCE resources that are independent of the kubernetes networks - communication between k8s and other GCE resources is all pubsub so direct communication isn't necessary and defense-in-depth dictates keeping those subnets separate.

Because subnets are defined in separate templates, host project assignment and access controls for those subnets also happens separately for the service project for each. The GKE subnets are defined by a template managed by the k8s team, and the GKE service project is attached to the environment host network/project separately from the cloud-sql subnets and the cloud-sql service project (for example). But because the module uses an authoritative binding, I cannot use it to do that. I must attach all service projects to the host project at the same time, which also means all subnets must be defined at the same time, so that they are available when the module gets called

This isn't the production code, but a testbed that prototypes the basic structure in simpler form, as PoC (and I stripped the use of this module out of it this afternoon, in favour of a PR which makes the shared_vpc functionality in project-factory modules more flexible):

https://github.com/ideasculptor/gcloud-reference-arch/tree/master/Reference%20Infrastructure/dev/us-central1/

If you look inside the 'dev' subdir of that directory, which contains resources in the dev environment (which hosts the dev network), you can see public_subnets and backend_subnets defined separately. The dev network and dev-specific service project are defined in https://github.com/ideasculptor/gcloud-reference-arch/tree/master/Reference%20Infrastructure/dev/environment and https://github.com/ideasculptor/gcloud-reference-arch/tree/master/Reference%20Infrastructure/dev/service-project. A shared service-project intended for resources that can run in any environment is defined and attached to the dev network in https://github.com/ideasculptor/gcloud-reference-arch/tree/master/Reference%20Infrastructure/shared/service-project

If you look inside the shared_svc subdir of that first link, which contains resources assigned to a shared service project that can run resources in any of the 3 environments, you can see the bastion and gke templates are separated, and each only needs to access the remote_state for the subnets that they run within. There is also a dev-svc service project, which currently has no resources, but which will eventually have environment-specific resources within it. There is no reason for both of those service projects to have to be declared such that they can both be passed to the same module instance since they are managed completely independently. Users that can modify one don't even necessarily have permissions to modify the other within our infrastructure tooling, regardless of the gcloud roles they are able to acquire, which likely would allow them to run either template.

You'll find the actual terraform code, which underlies the configs, in https://github.com/ideasculptor/gcloud-templates - the URLs are in the terragrunt.hcl files, along with variable values. Other variables are defined in .yaml files in the directory tree, to allow values to be inherited just by placing a terragrunt.hcl file in a particular location in the directory tree, mostly used to specify paths to remote state files.

ludoo commented 4 years ago

I got more than I bargained for. :) Let me digest this and get back to you.

ideasculptor commented 4 years ago

I rewrote it. Probably made it longer, but pointed my explanation more directly at your question

ideasculptor commented 4 years ago

The long and the short of it is that an infrastructure for an enterprise with thousands of developers working on hundreds of services and applications is never going to want to define all networking components in a single module call. Arguably, even a company with 100 developers would want more granularity than that, in all likelihood. And I speak from experience on that point.

If a new project launches, I want to carve them out an address space and let that team define what their network looks like, and I certain'y don't want to risk perturbing existing networks by updating a single authoritative network association module. There's just no need to have to pull that all back to a single central call with dozens of other projects to wire them all together at once. It's really the same problem in the infrastructure more generally as it is in the network. I don't want an authoritative host network/subnet association call for all service projects. I want an additive one that lets me treat each service project separately.

ludoo commented 4 years ago

Ok it's clear. All my customers tend to define networking in a single point, subject to the same review process, regardless of which team actually manages individual resources (applied to your case, the GKE people would manage their subnet configurations, but they would reside in the same root module and go through the same approval process).

TL;DR: your use case never came to mind, but is of course perfectly valid.

I would still like to preserve a way of using authoritative bindings, so would an extra use_authoritative_iam_bindings bool variable in the module with a true default, used to decide which type of bindings to apply, and of course an extra non-authoritative resource, cover your use case?

And thanks a lot for bringing this to our attention, of course. Super useful and interesting.

ideasculptor commented 4 years ago

I would solve it by using the fabric iam modules, which already allow you to choose between additive and authoritative modes, and there's already a module for subnet iam. You can simply switch to calling that module and pass the type flag through, and it all should 'just work'

ideasculptor commented 4 years ago

https://github.com/terraform-google-modules/terraform-google-iam/tree/master/modules/subnets_iam

ludoo commented 4 years ago

In reply to your last comment, I'd argue that a large infra is composed of many environments, and many different networks, so you'd have many single touchpoints. But that's immaterial, let's focus on how to address your perfectly valid use case.

I don't get the reference to the IAM module. If you prefer using that, then just create your own module to define project association to the SVPC, and handle the rest in the IAM module. Or am I missing something?

ludoo commented 4 years ago

To put my comment above in the proper context: this submodule (and the overall fabric approach) is not to be the single solution encompassing all possible approaches to handling SVPC access, but a convenience module to use as a sensible shortcut while prototyping, and a starting point to fork and evolve when adaptation to specific use cases is needed.

So if the IAM module solves 90% of your needs, I don't think we should change anything here.

ideasculptor commented 4 years ago

It just seems like a very fast way to replace that single resource with a single call to an already existing (and tested, presumably) module, in order to get exactly the functionality desired - a way to assign the basic IAM roles to the various service accounts and identities that need them in order to set up access to a host network. It is just code re-use, nothing more. No need to re-implement and re-test it in the fabric-ne-svpc-access module when you can just call the other module, insyead.

The only other way I could think of to really solve it would be to use separate templates to define the subnet maps and lists and store them in remote state, then use the module in authoritative mode to pull them all into the single module call - but writing unit and integration tests for that would be such a headache. In the end, using an additive mode makes testing massively easier.

ideasculptor commented 4 years ago

And why make public modules if they aren't intended to be reused? If they are only there to serve as a copy/paste template or as the root of a fork that never merges back, you lose pretty much all of the benefit of open-source and community effort. That's what causes open source efforts to wallow and never gain widespread adoption.

It seems far superior, to me, to construct modules that are sufficiently general purpose that they can be used without modification, so that users automatically get the benefit of future bug fixes, future features, and the efforts of all the other users.

ludoo commented 4 years ago

Our -- CFT Fabric -- approach (which has been tested in several very large migrations for 1.5 years), is to minimize complexity and use simple modules as a starting point, and specialize them for the specific use case. Which also works very well for the prototyping use case, which is something my team does a lot (how would we be able to automate testing of new features and configurations otherwise?).

Just look at the host of issues in the project-factory and networking modules lately, where complexity is starting to rear its head. Again, if 90% of your use case is solved by the IAM module, I don't think we need to change anything here. If on the other hand you like having everything related to SVPC access encapsulated in a single module, and your use case mandates non-authoritative IAM resources, let's discuss how to account for that in this submodule.

morgante commented 4 years ago

And why make public modules if they aren't intended to be reused?

I want to be clear that all our main modules are meant to be reused. CFT Fabric (which ludoo@ maintains) offers an opinionated subset which don't necessarily follow the same philosophy.

ludoo commented 4 years ago

I'm going to close this, feel free to reopen if you need us to implement simple non-authoritative bindings for subnets alongside the authoritative ones.

terraform-google-modules / terraform-google-network

fabric-net-svpc-access uses authoritative iam_binding #98