splunk / terraform-provider-splunk

Terraform Provider for Splunk
Mozilla Public License 2.0

Error: "Unable to find resource" when creating a new saved search #98

Open billycn20 opened 2 years ago

billycn20 commented 2 years ago

I have spent a few weeks trying to understand why a subset of my saved search resources fail to be created. This does not happen to all of my alerts, only some of them. All of my alerts go through the same reusable module to create the resource, so I would expect them all to fail, but that is not the case. My guess is that an error occurs while creating the alert (even though my TRACE logs show no error), and that causes a downstream problem for Terraform, which results in the following error:

│ Error: Unable to find resource: [TF-Local] SEC_NEIGHBORINGSTATE_HIGH_ALERT
│
│   with module.dt_eoc_splunk_alerts_migration["SEC_NEIGHBORINGSTATE_HIGH_ALERT"].splunk_saved_searches.saved_search,
│   on modules/splunk_saved_search_alert/main.tf line 45, in resource "splunk_saved_searches" "saved_search":
│   45: resource "splunk_saved_searches" "saved_search" {

This results in the resource being marked as tainted in the state file, which makes it difficult to run any further plan/apply on this configuration until the failed resources are untainted and resolved manually.

Extra note: the only WARN produced by the configuration is the following, and I'm unsure whether it is a problem: [WARN] ValidateProviderConfig from "provider[\"registry.terraform.io/splunk/splunk\"]" changed the config value, but that value is unused

kpingka commented 2 years ago

@billycn20 seeing as Splunk hasn't responded on this, did you find a solution by yourself?

billycn20 commented 2 years ago

Yes, this was caused by a failure in creating the resource. Because the Splunk provider didn't log the failed create, the error only surfaced in the provider's savedSearchesRead() method, which is called at the end of createSavedSearch and cannot find the resource that was never created.

I have created a PR against this provider that adds better logging for this case, for the next person: "Log http response status and body for create and delete saved searches when DEBUG is on" #99. I'm not sure how to get an admin to review the PR, though.

kpingka commented 2 years ago

@billycn20 it seems to me this has been abandoned, since there hasn't been any activity in this repo for the past 4 months.

I'll give your trunk a try and see if I can figure out my root cause.

kpingka commented 2 years ago

@billycn20 found my issue: I was using dispatch_earliest_time='rt-15m'.

The rt prefix means a real-time search, which wasn't allowed. Terraform was failing without logging this error.

Weird thing is I could still create RT alerts using curl, so we're now investigating our permissions structure.
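For anyone hitting the same thing: as far as we know, Splunk gates real-time searches behind the `rtsearch` role capability, and real-time time modifiers start with `rt`, as in `rt-15m` above. A hypothetical pre-flight helper like the one below could flag such values so the permission requirement is obvious before the API call. The prefix check is a simplifying assumption, not Splunk's own time-modifier parser.

```go
package main

import "strings"

// usesRealTimeWindow is a hypothetical helper. It reports whether a dispatch
// time modifier (e.g. the value of dispatch_earliest_time) asks for a
// real-time search. Assumption: real-time modifiers begin with "rt", while
// ordinary relative modifiers begin with "-", "+", "@", a digit, or "now".
func usesRealTimeWindow(modifier string) bool {
	return strings.HasPrefix(strings.ToLower(strings.TrimSpace(modifier)), "rt")
}
```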

billycn20 commented 2 years ago

Good to know you found your issue. Out of curiosity, did my PR/trunk help you identify the problem?

kpingka commented 2 years ago

Oh yeah, definitely. I took your logging and applied it in a few other places too.

The CreateSavedSearch response clearly stated that I was lacking permissions.

micahkemp-splunk commented 2 years ago

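There are two common issues in the current version of the provider:

  • Create failures aren't recognized as failures (due to lack of checking the response code that comes back)
  • Read failures return errors, instead of marking the resource as no longer present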

These two in combination lead to what you see, where it looks like the resource was initially created, but subsequent runs of Terraform fail because the remote resource isn't actually there.

I've been working on a general fix for this by trying to formalize a Splunk SDK that this provider can use, but I have no timeline as to when (or even if) it will be available.
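A minimal sketch of the second point, assuming Terraform Plugin SDKv2 conventions, where calling `SetId("")` inside Read tells Terraform the resource no longer exists. `resourceState` and `memState` are hypothetical stand-ins for `*schema.ResourceData` so the sketch runs without the SDK, and `readSavedSearchStatus` (also hypothetical) only looks at the HTTP status of the GET.

```go
package main

import (
	"fmt"
	"net/http"
)

// resourceState is a hypothetical stand-in for the SDK's *schema.ResourceData;
// only the two methods this sketch needs are modeled.
type resourceState interface {
	Id() string
	SetId(string)
}

// readSavedSearchStatus is a hypothetical Read step. Given the HTTP status of
// a GET for the saved search, it clears the ID on 404 so Terraform treats the
// resource as gone (and plans to recreate it) instead of failing the run.
func readSavedSearchStatus(d resourceState, status int) error {
	switch {
	case status == http.StatusNotFound:
		d.SetId("") // remote object is missing: drop it from state
		return nil
	case status < 200 || status > 299:
		return fmt.Errorf("reading saved search %q: HTTP %d", d.Id(), status)
	default:
		return nil
	}
}

// memState is a trivial in-memory resourceState used for illustration.
type memState struct{ id string }

func (m *memState) Id() string      { return m.id }
func (m *memState) SetId(id string) { m.id = id }
```

This covers only the Read side; without the create-side status check, a failed create would still go unreported at the moment it happens.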

billycn20 commented 2 years ago

There are two common issues in the current version of the provider:

  • Create failures aren't recognized as failures (due to lack of checking the response code that comes back)
  • Read failures return errors, instead of marking the resource as no longer present

These two in combination lead to what you see, where it looks like the resource was initially created, but subsequent runs of Terraform fail because the remote resource isn't actually there.

I've been working on a general fix for this by trying to formalize a Splunk SDK that this provider can use, but I have no timeline as to when (or even if) it will be available.

I agree with your two points on what the issue is. But if there is no timeline for the real fix, the logging I added in #99 would save developers a lot of time by getting the error response logged instead of having it swallowed silently by the provider.