pulumi / pulumi

Pulumi - Infrastructure as Code in any programming language 🚀
https://www.pulumi.com
Apache License 2.0
21.92k stars 1.13k forks source link

Allow applications to handle "not found" errors from `<resource>.get()` #3364

Open clstokes opened 5 years ago

clstokes commented 5 years ago

Problem Scenario

It can be common to need to query for an existing resource and if it doesn't exist then take some other action. Currently Pulumi's <resource>.get() methods halt Pulumi execution without throwing an error that the user/application can catch and react to.

Suggestion

The results or error from <resource>.get() calls and data sources should be catchable so that users can take other actions as a result.

clstokes commented 4 years ago

Updated to include "and data sources" in the suggestion.

lukehoban commented 4 years ago

Updated to include "and data sources" in the suggestion.

For invokes ("data sources"), you can do this today:

aws.getAmi({
    owners: ["153052954103"],
}, { async: true })
.then(
    res => console.log(`RESULT: ${res}`), 
    err => console.error(`ERROR: ${err}`)
);

That is, the error is bubbled up through the promise error handler (and as an exception in async/await code).

Another example:

async function getAmi(): Promise<string> {
    try {
        const amiDetails = await aws.getAmi({
            owners: ["153052954103"],
        }, { async: true });
        return amiDetails.id;
    } catch (err) {
        return "ami-014a2e30da708ee8b";
    }
}

This is not true for .get() calls though (which is what I will continue to track with this issue).

lukehoban commented 4 years ago

Currently Pulumi's .get() methods halt Pulumi execution without throwing an error that the user/application can catch and react to.

This is true, and is deeply intertwined with the fact that these .get() calls (ReadResource in the engine and RPC) are serialized into the checkpoint file. We treat them like all other CRUD operations on resources, and handle failures directly inside the engine, coordinating those failures with checkpointing of state.

It is likely that we do want to provide more user-program control over this, but it'll require a non-trivial update to make this happen. Given that, we do not expect it to be part of 2.0, but will look at this as an opt-in thing post-2.0 and possibly a change of defaults for 3.0.

VikramVasudevan commented 3 years ago

+1 any updates on this one? Really need the get to be "catchable". The scenario I want to accomplish is as follows

742617000027 commented 3 years ago

+1

Scenario: run a Lambda Function to create schema on an RDS Cluster only if the Cluster hasn't existed before.

ValkyrieUK commented 3 years ago

Expereincing the same issue. Would like to create an AWS Config recorder (limited to 1 per account per region) and want to check if one already exists before attempting to create one.

const recorder = aws.cfg.Recorder.get('recorder', 'default'); // Halts Pulumi if it does not exist and unable to catch the error.
thomaszooman commented 3 years ago

+1, please add error handling in get(...) or resource.exists(...)

NapalmCodes commented 3 years ago

Same situation with aws.iot.ThingType.get(). I need to be able to verify it exists before creating it.

Matthew-2021 commented 2 years ago

Same issue here with getting custom domains registered to a CDN endpoint

wrbbz commented 2 years ago

Same issue here with an attempt to create k8s namespace.

Is there any progress on this one?

Danielkiss9 commented 2 years ago

same issue while trying to check if an iam policy already exists. any progress?

t0yv0 commented 2 years ago

Found this documentation reference for <resource>.get(): https://www.pulumi.com/docs/intro/concepts/resources/get/

Would it be possible to add <resource>.exists() with a similar signature to get() with the following constraints:

The user code would then be able to express patterns like this:

if RDSCluster.exists(...) {
   cluster = RDSCluster.get(...)
} else {
   cluster = new RDSCluster(...)
   createSchema(cluster.getConn())
}

This might cause some confusion as to how the program behaves in preview, specifically:

tejaswinibussety commented 2 years ago

+1 any updates on this one?

johan-vromo commented 2 years ago

+1 any updates?

starleaffff commented 2 years ago

+1 for this.

miabot-11 commented 2 years ago

+1

Any updates on this one? I am having a similar issue with getPrivateZoneOutput()

t0yv0 commented 2 years ago

After an investigation in #10883 and discussing the implications with @Frassle we are leaning to close this issue and recommend other Pulumi mechanisms for the use cases here:

  1. A lot of the readers coming here are interested in handling the error from get to create a resource if it does not already exist. #3388 will provide a much better way to address this scenario.

  2. There is a workaround available now before #3388 ships. Providers expose data sources allowing the program to query for existence of resources in the cloud. For an example see getSecurityGroups in AWS. The advantage of these compared to Resource.get() is that they do not interfere with Pulumi state.

  3. For use cases that need to run some logic when a resource actually gets created, lifecycle hooks (specifically create hooks) as specified in #1691 seem to be the preferable mechanism.


Technicalities from the #10338 exploration:

t0yv0 commented 2 years ago

One more possible workaround worth mentioning for scenarios that involve managing a shared "one per account" resource such as AWS Config recorder is to use Stack References. The solution is to provision the Config Recorder in a dedicated Pulumi stack, and import it via Stack References into any stack that needs its.

t0yv0 commented 2 years ago

Closing this off for now. Please let us know if you have a concrete use-case scenario we missed in the analysis and we need to reconsider.

1oglop1 commented 1 year ago

I have a use-case to use 2 different get_resource functions. I'd like to have rds database object, however RDS has 2 types cluster (aurora) and instance (classic rds) they both have almost the same methods and properties however AWS API differs with "Cluster" vs "DBinstance" so when querying for the database I use rds.get_cluster and it returned SDK error with the exception type DBClusterNotFoundFault, however some change the exception is different

Exception: invoke of aws:rds/getCluster:getCluster failed: invocation of aws:rds/getCluster:getCluster returned an error: invoking aws:rds/getCluster:getCluster: 1 error occurred:
        * reading RDS Cluster (myrds-db): couldn't find resource

before

# db_params- is an object coming from a StackReference from another stack

try:
    existing_db = rds.get_cluster(
        cluster_identifier=db_params.apply(lambda x: x["db"]["identifier"]),
        opts=pulumi.InvokeOptions(provider=ec1_provider),
    )
except Exception as exc:
    if "DBClusterNotFoundFault" not in str(exc):
        raise exc
    existing_db = rds.get_instance(
        db_instance_identifier=db_params.apply(lambda x: x["db"]["identifier"]),
        opts=pulumi.InvokeOptions(provider=ec1_provider),
    )

after

# db_params- is an object coming from a StackReference from another stack

try:
    existing_db = rds.get_cluster(
        cluster_identifier=db_params.apply(lambda x: x["db"]["identifier"]),
        opts=pulumi.InvokeOptions(provider=ec1_provider),
    )
except Exception as exc:
    if "couldn't find resource" not in str(exc):
        raise exc
    existing_db = rds.get_instance(
        db_instance_identifier=db_params.apply(lambda x: x["db"]["identifier"]),
        opts=pulumi.InvokeOptions(provider=ec1_provider),
    )

It would be great if I could the exception directly instead of except Exception it would be except DBClusterNotFoundFault or something like that

Frassle commented 1 year ago

Give the above & https://github.com/pulumi/pulumi/issues/11275 I think we should reopen this. We'll need to be careful to explain that this can not be used for "try get or create" logic, but it sounds like it has other applications as well.

ahilmathew commented 1 year ago

Is there any way to lookup for the existence of a resource without the function actually throwing an error? For context - I am trying to get getUserAssignedIdentityOutput but it errors out and the error seems to be not catchable. https://www.pulumi.com/registry/packages/azure-native/api-docs/managedidentity/getuserassignedidentity/

dpunkturban commented 1 year ago

+1 for that

taxdorff-daimler commented 1 year ago

Similar issue here with Confluent Users (Can't create an invite if they have an account, can't lookup a user if they don't) need try/catch to function: (ccloud = Pulumi_ConfluentCloud)

        try:
            inv = ccloud.Invitation(resource_name=name, email=config['email'], opts=ResourceOptions(retain_on_delete=True))
            id_val = inv.users.apply(lambda users: users[0].id)
        except:
            user = ccloud.get_user_output(email=config['email'])
            id_val = user.id.apply(lambda id: id)

Either order breaks the flow of the program, stopping further processing

rexebin commented 1 year ago

A workaround would be to use Azure SDK to check if the resource exists before using the Get method.

xfoxfu commented 1 year ago

I have a similar use case which cannot be covered with "create if not exists" pattern.

const helm_postgres = new kubernetes.helm.v3.Release(
  "helm-postgres",
  // ...
  { ignoreChanges: ["checksum"] }
);

const secret_postgres = pulumi.output(
  kubernetes.core.v1.Secret.get("helm-postgres-secret", "postgres/postgres-postgresql")
);

I have a database deployed with helm, and I have to read out the secrets created by the chart to get the connection password (which is then passed to other resources I wanted to create).

SnoozeFreddo commented 10 months ago

+1

alextricity25 commented 9 months ago

I have the same use case as @xfoxfu . My Pulumi program runs the Release resource to install a helm chart which creates a secret that I need to retrieve. I use the Secret.get() function, but now my preview keeps failing because that secret doesn't exist yet. I tried using dependsOn: <myReleaseResource> on that get() function, but no avail.

mikocot commented 8 months ago

After an investigation in #10883 and discussing the implications with @Frassle we are leaning to close this issue and recommend other Pulumi mechanisms for the use cases here:

  1. A lot of the readers coming here are interested in handling the error from get to create a resource if it does not already exist. Support declaring that a resource should be created only if it does not already exist #3388 will provide a much better way to address this scenario.
  2. There is a workaround available now before Support declaring that a resource should be created only if it does not already exist #3388 ships. Providers expose data sources allowing the program to query for existence of resources in the cloud. For an example see getSecurityGroups in AWS. The advantage of these compared to Resource.get() is that they do not interfere with Pulumi state.
  3. For use cases that need to run some logic when a resource actually gets created, lifecycle hooks (specifically create hooks) as specified in Support custom logic in resource lifecycle #1691 seem to be the preferable mechanism.

@t0yv0 @Frassle This is not a complete list unfortunately. For almost a year we're struggling with different instances of timing/consistency issues around cosmos db role assignments, where Pulumi successfully creates required identities but it's not yet propagated within Azure for CosmosDB to find it to performa assignment.

The workaround for that problem that we've been offered by Pulumi was exactly to use one of the above methods to wait until the resource is available and only then let Pulumi continue the deployment. Unfortunately even if we can complete the deployment in one go that way it is always considered a failed deployment as this methods reports errors for some iterations.

Hence, this issue not being solved makes the only workaround for that problem unviable. We were forced to go around pulumi in this case and use ArmClient directly, but one of the reasons we use Pulumi i exactly so we don't need to do that. It's also taken a lot of time to find this underlying issue, as it's simply not consistent nor expectable, and debugging options are limited. Therefore IMO you should rethink correcting it.

marvkis commented 8 months ago

I had a very similar 'user story':

My current workaround - waiting several minutes before attempting to read the resource - is not very cool and often adds unnecessary delay to the enrollment process.

cooervo commented 4 months ago

no fix for this after 5 years?

Frassle commented 4 months ago

We're actively working on it, but it's a subtle problem.

joeduffy commented 2 days ago

I never understood fully why this can't be fixed. I spent a few minutes just now, and though it's tricky, I do think there are a combination of changes we could make to address this:

  1. Enable ReadResource to fail gracefully when a resource is missing. Right now, it just errors out in the step execution, which triggers a failure in the engine. This requires some thought as to the gRPC interface. The following is a clunky targeted change, but it works:

    diff --git a/pkg/resource/deploy/source_eval.go b/pkg/resource/deploy/source_eval.go
    index 9e890bdf7..e4025cdd8 100644
    --- a/pkg/resource/deploy/source_eval.go
    +++ b/pkg/resource/deploy/source_eval.go
    @@ -1381,8 +1381,14 @@ func (rm *resmon) ReadResource(ctx context.Context,
            logging.V(5).Infof("ResourceMonitor.ReadResource operation canceled, name=%s", name)
            return nil, rpcerror.New(codes.Unavailable, "resource monitor shut down while waiting on step's done channel")
        }
    -
        contract.Assertf(result != nil, "ReadResource operation returned a nil result")
    +
    +   // If the state is nil, which means it wasn't found, return an error in the response.
    +   if result.State.Outputs == nil {
    +       return nil, rpcerror.New(codes.InvalidArgument,
    +           fmt.Sprintf("resource %s of type %s with ID %s was not found", name, t, id))
    +   }
    +
        marshaled, err := plugin.MarshalProperties(result.State.Outputs, plugin.MarshalOptions{
            Label:            label,
            KeepUnknowns:     true,
    @@ -1393,7 +1399,6 @@ func (rm *resmon) ReadResource(ctx context.Context,
        if err != nil {
            return nil, fmt.Errorf("failed to marshal %s return state: %w", result.State.URN, err)
        }
    -
        return &pulumirpc.ReadResourceResponse{
            Urn:        string(result.State.URN),
            Properties: marshaled,
    diff --git a/pkg/resource/deploy/step.go b/pkg/resource/deploy/step.go
    index a01afab1f..d1626dfef 100644
    --- a/pkg/resource/deploy/step.go
    +++ b/pkg/resource/deploy/step.go
    @@ -887,8 +887,10 @@ func (s *ReadStep) Apply() (resource.Status, StepCompleteFunc, error) {
    
            // If there is no such resource, return an error indicating as such.
            if result.Outputs == nil {
    -           return resource.StatusOK, nil, fmt.Errorf("resource '%s' does not exist", id)
    +           s.new.Outputs = nil
    +           return resourceStatus, func() { s.event.Done(&ReadResult{State: s.new}) }, nil
            }
    +
            s.new.Outputs = result.Outputs
    
            if result.ID != "" {

    Most likely we'd want to think about an opt-in to this new failure mode and/or returning something in the response payload rather than abusing codes.InvalidArgument in this way. The engine also needs to "forget" a resource that triggers this behavior.

  2. Adding a new variant of get methods -- call it getAsync -- that returns a promise which breaks when the resource wasn't found. For example, we have get(...): Bucket, and this new method would be getAsync(...): Promise<Bucket>.

  3. That method internally just does what get does today, except that it internally awaits the resource URN's promise().

  4. There were quite a few places in the language SDK where we assume that if a failure occurs in this manner, we want to terminate the program. That's probably an argument for a first class property in the ReadResource result, rather than abusing gRPC errors. We, of course, still want to fail most of the time, but not when getAsync observed and communicated the error. An assortment of edits to the various promise error handling facilities in the TypeScript SDK, at least, was sufficient to get this working.

I was able to hack together a quick proof of concept, and indeed I was able to write this program:

import * as aws from "@pulumi/aws";

// Create an AWS resource (S3 Bucket)
console.log("Before getAsync");
try {
    const bucket = await aws.s3.BucketV2.getAsync("my-bucket", "a-bucket-that-...-doesnt-exist");
    console.log("After getAsync");
} catch (e) {
    console.log("Caught and ignoring: " + e);
}

And have it run w/out failure:

$ pulumi up
Previewing update (dev)

     Type                 Name               Plan       Info
 +   pulumi:pulumi:Stack  catchable-get-dev  create     2 messages

Diagnostics:
  pulumi:pulumi:Stack (catchable-get-dev):
    Before getAsync
    Caught and ignoring: failed to read resource #a-bucket-that-...-doesnt-exist 'my-bucket' [aws:s3/bucketV2:BucketV2]: 3 INVALID_ARGUMENT: resource my-bucket of type aws:s3/bucketV2:BucketV2 with ID a-bucket-that-...-doesnt-exist was not found

Resources:
    + 1 to create

Do you want to perform this update?
> yes
  no
  details

The pattern of create-if-not-exist would presumably want to allocate a "my-bucket" in the catch, but that doesn't work without deeper changes alluded to above, such as forgetting about a ReadResource if it wasn't found during lookup. The code suggests this was the original intent, but it doesn't seem to do so. I assume it is imminently possible.