hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0
2.3k stars 1.73k forks source link

Provide method to plan without credentials #19546

Open wyardley opened 1 week ago

wyardley commented 1 week ago

Community Note

Description

When running terraform plan, the provider throws an error if no credentials are found. Oddly, if I run gcloud auth -revoke, I don't get an error (probably because the creds file still exists), but running in a CI service with no creds at all throws an error.

The credentials are clearly not actually needed, because terraform init and terraform plan both work fine with totally bogus credentials.

The error:

โ”‚ Error: Attempted to load application default credentials since neither `credentials` nor `access_token` was set in the provider block.  No credentials loaded. To use your gcloud credentials, run 'gcloud auth application-default login'

I can work around this in a CI provider by providing the following totally bogus credential, set in the path given by GOOGLE_APPLICATION_CREDENTIALS, lets me init and plan just fine (this is basically how I've worked around this in the path):

{
  "type": "service_account",
  "project_id": "foobar",
  "client_email": "test@foobar.iam.gserviceaccount.com",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "universe_domain": "googleapis.com"
}

While I understand that certain operations will require valid credentials, it would be nice if the provider could have some more explicit way to allow operations which don't need credentials to succeed.

New or Affected Resource(s)

n/a

Potential Terraform Configuration

No response

References

https://github.com/hashicorp/terraform/issues/21761

rileykarson commented 3 days ago

@SarahFrench I'd be curious what other providers do/think about this case. I think AWS has some skip flags, but am not sure about the promises about them- what if a resource needs creds one day? I haven't looked at Azure for this.

My current stance is that this happens to work for most resources, but an initialized client is available during plan (specifically CustomizeDiff) and there have been past signals from Core that offline plan can't be relied upon. I can't find a more current stance other than that issue continuing to be open.

If we ever land the changes to the Google APIs to support API-side change validation I would advocate strongly for online validation in this provider. Although we'd likely include an opt-out flag for that.

wyardley commented 3 days ago

@rileykarson yes, interested to hear about how this is handled elsewhere as far as with plans.

I updated the ticket to clarify that init -backend=false does work these days (I think that was fixed ages ago), so the issue is, indeed, with planning, vs. initing as the initial description / summary said.

I get the desire to avoid problems or unexpected results downstream, but I am guessing that any operations that did require auth would still fail if, somehow, provided revoked or invalid credentials?

The clue in the linked issue about using -refresh=false. I'm not sure if this is the default behavior with the test framework with plan only, but I can see if that changes the behavior at all, and it's probably a good thing to have set either way.

wyardley commented 3 days ago

Ok, just to follow up: Setting TF_ARGS_plan to -refresh=false didn't change anything.

I think since my run blocks have

  command = plan
  plan_options {
    refresh = false
  }

it should already be planning that way anyway.

So, not sure if that changes anything, but IMO, seems fair to expect that creds should be less likely to be needed when refresh is disabled...

SarahFrench commented 2 days ago

๐Ÿ‘‹ Hi all, I'll reply in a bit- I've done some research but I'm asking for input from other provider teams before posting on here!

SarahFrench commented 1 day ago

I'm back, and here's my understanding of the issue:


The error originates from the GetCredentials function, which is called during the process of configuring the provider. In the absence of all credentials information the library we use for managing auth attempts to find ADCs and will fail, returning that error. The library only asserts that it can find creds, not that they're valid, so dummy creds can suppress the issue.

In terms of solutions, Terraform core does not tell providers about the context that they're being configured in (apply vs plan. refresh or not) so there's no way to fulfil this use case without adding a provider-level argument similar to skip_credentials_validation in the AWS provider like @rileykarson mentioned. I don't believe there's anything similar in Azure. Our shared opinion is that skipping validating credentials during provider configuration shouldn't be a default behaviour, because it's valuable feedback for users to identify auth issues.

We could add a boolean provider-level argument similar to skip_credentials_validation and use it to either update logic in GetCredentials or to skip invoking the code where we use auth to configure the client during the provider configuration process.

For the end user, they would have to set the new provider argument via a var and use TF_VAR_skip_foobar=true/TF_VAR_skip_foobar=false in different CI environments.

wyardley commented 1 day ago

@SarahFrench thanks so much for the detailed response, and for asking around here ๐Ÿ™. This all tracks with my understanding of things.

I think what Riley was wondering was whether it was even valid at all to want to be able to plan offline, and if future changes planned were likely to affect that.

In the absence of all credentials information the library we use for managing auth attempts to find ADCs and will fail [...] Our shared opinion is that skipping validating credentials during provider configuration shouldn't be a default behaviour, because it's valuable feedback for users to identify auth issues

We could add a boolean provider-level argument similar to skip_credentials_validation and use it to either update logic in GetCredentials or to skip invoking the code where we use auth to configure the client during the provider configuration process.

Yes, I wasn't sure whether the behavior was a side effect of a third party library, or intentional, obviously in the case of the former, bypassing the behavior if it's decided that's a good thing might still be difficult.

I absolutely agree that the current behavior is a good / sensible default, but it would definitely be convenient and useful if there were a way to bypass this behavior using a flag and / or env var, and would be easier for users to find out about (and feel less hacky / fragile) than dropping in fake credentials (I only even considered this because I'd had to use a similar workaround in the past for validate and / or init). There are a lot of cases where setting up a project to auth against (even with limited permissions) seems like overkill, or is otherwise impractical.

Side note: I don't know if they'll eventually move to using terraform based unit tests (whether with the mock provider or also plan-based) now that they're somewhat more possible, but it looks like CFF uses a similar plan-based approach for unit testing: https://github.com/GoogleCloudPlatform/cloud-foundation-fabric/blob/master/FABRIC-AND-CFT.md#key-differences

and that Google mainains the project used for that there: https://pypi.org/project/tftest/

If we ever land the changes to the Google APIs to support API-side change validation

Yes, I can see how that would add some new challenges to this sort of approach, though also some potential benefits. I agree with you that, even with the limitations builtin, it would be nice to be able to still get a limited plan without any auth.

SarahFrench commented 1 day ago

I think what Riley was wondering was whether it was even valid at all to want to be able to plan offline, and if future changes planned were likely to affect that.

When I was discussing this with the core team I was told that planning is always expected to be "online", even if refreshing is disabled. So my impression is that planning offline isn't considered valid and that wouldn't be changed in future.

This is just me talking: From the Terraform core binary's perspective what a provider does to configure itself is a black box, and the expectation is that after configuration logic runs the provider should be fully functional, including interacting with APIs. The configuration logic might contain a mix of things that are more offline (e.g. the default project/region/zone in the Google provider) and more online (credentials/tokens/etc) but the Terraform core binary doesn't have any insight into that.

Having said that, provider development teams do have that insight and can decide whether they want to enable these use cases. A limitation of it being a per-provider decision is that a configuration using multiple providers would need all providers to tolerate a lack of credentials/to work 'offline'. Even if there was a decision to change TF core to support this use case more it would still require provider-side code changes, so the problem is still there to some degree.

As far as this GH issue goes, we're able to add a feature to support this but need to make that decision knowing the above!

rileykarson commented 1 day ago

My only real hesitation with adding a skip option is how it affects our compatibility contract. We could write down "This will allow you to skip providing credentials but not doing so may cause errors because plans are online (i.e. credentialed)" but if there are never errors in practice we run into the observable behaviours becoming the contract of the system.

While a sweeping change to add API-side change validation would begin guarded on a flag that we flip to default upon a major release based on the wide-ranging impact of doing so, what I don't want to promise is that resources with offline plans will never become online plans. For example, we could decide to do google_compute_disk's image family resolution through an online lookup rather than the current offline lookup at-will in theory (although even as I type this, I will admit, I would lean towards gating on a major release based on the age of the specific resource and the fact that I don't know if we even have any online plans...).


Just thinking about whether we could enforce plans are offline if we did want to assert that they always are- a bit of a sidebar from the larger topic- we could probably use a customdiff.All wrapper that passes a copy of the Config object stripped of the client (we still need provider defaults and labels values for sure) into the underlying CustomizeDiff functions. We use customdiff.All across nearly all resources post-5.0.0: https://github.com/hashicorp/terraform-provider-google-beta/blob/main/google-beta/services/compute/resource_compute_address.go#L52-L55

wyardley commented 1 day ago

I can of course see how doing some of this validation online could have some advantage (even with all the various layers of generated code) over having them baked into the provider, though I guess at the cost of less predictability. IAM roles are a huge example of where the current behavior / provider level validation is suboptimal, and where some kind of better validation would be a huge win.

Just thinking out loud a little -- while I know some things may be project, region, or customer specific, and that Google has to worry about things like quotas, unauthorized access, etc., I wonder if there could be a middle ground where there's some way to provide semi-anonymous (but volume-limited per source IP) or heavily sandboxed access for this sort of use?

Worst case scenario, it's not that hard to create a heavily sandboxed project and a limited service account for this use; I still think there's value in having a way to not use credentials at this point where they don't seem to be used for the plan -refresh=false operation.