opensearch-project / terraform-provider-opensearch

https://registry.terraform.io/providers/opensearch-project/opensearch
Apache License 2.0
74 stars 57 forks source link

added opensearch serverless support #92

Closed vasyaxparfenov closed 10 months ago

vasyaxparfenov commented 11 months ago

Description

Added Opensearch Serverless support by:

  1. Extending getClient func with configration necessary for working with serverless version.
  2. OSVersion and Flavor values are set explicitly since serverless variant returns empty response on ping request.
  3. Added AWSv4 signing client RoundTripper wrapper to support required header.

Issues Resolved

Closes https://github.com/opensearch-project/terraform-provider-opensearch/issues/90

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

vasyaxparfenov commented 11 months ago

I would like to mention that I'm fully aware that dragging hash header creation logic into this solution is a bad idea from maintenance perspective, but, since neither elastic client nor awsv4 signing client don't support this, it seems like the only way besides either creating a set of related PRs in dependant packages repositories or, which would probably be THE BEST WAY, completely switching to native go-opensearch client maintained by opensearch team which seems to support aforementioned requirement by default.

vasyaxparfenov commented 11 months ago

We've tested this changes against our private opensearch serverless collection and it successfully created resources.

vasyaxparfenov commented 11 months ago

I've run tests locally and they all passed. Can we re-try? It seems strange that flavor was set to 0 (Unknown) as it seems like docker hosted opensearch instance returned a valid version info response

vasyaxparfenov commented 11 months ago

@prudhvigodithi could you please approve test run?

prudhvigodithi commented 10 months ago

Thanks for your contribution @vasyaxparfenov.

lcapillera commented 10 months ago

@vasyaxparfenov Thank you for this. Any idea when this could be released?

prudhvigodithi commented 10 months ago

Hey @vasyaxparfenov can you please add some unit tests to the change. Thank you

vasyaxparfenov commented 10 months ago

Thanks for your contribution @vasyaxparfenov.

Done

lcapillera commented 10 months ago

@vasyaxparfenov I encounter an error while using this branch in a project. The creation of an index works fine, but when tags are updated in other resources, the opensearch_index resource throws a HEAD healthcheck failed error.

It looks like there are some scenarios where the setting healthcheck = false is not getting into consideration.

The steps are:

  1. Write terraform code to create an opensearch serverless collection with default tags
  2. Write terraform code to create an index
  3. Run apply
  4. Change some tags
  5. Run apply again

Actual Result:

│ Error: HEAD healthcheck failed: This is usually due to network or permission issues. The underlying error isn't accessible, please debug by disabling healthchecks.
│ 
│   with opensearch_index.events_index,
│   on opensearch.tf line 117, in resource "opensearch_index" "events_index":
│  117: resource "opensearch_index" "events_index" {
│ 

Expected Result:

Update tags and do nothing in relation to the index.

My current configs

Terraform version: 1.6.2 OS: linux hashicorp/aws version: 5.22

provider "opensearch" {
  url         = data.aws_opensearchserverless_collection.example.collection_endpoint
  healthcheck = false
}

resource "opensearch_index" "events_index" {
  name     = "events"
  mappings = file("${path.module}/opensearch_index_mappings.json")
  force_destroy = false
}
vasyaxparfenov commented 10 months ago

@vasyaxparfenov I encounter an error while using this branch in a project. The creation of an index works fine, but when tags are updated in other resources, the opensearch_index resource throws a HEAD healthcheck failed error.

It looks like there are some scenarios where the setting healthcheck = false is not getting into consideration.

The steps are:

  1. Write terraform code to create an opensearch serverless collection with default tags
  2. Write terraform code to create an index
  3. Run apply
  4. Change some tags
  5. Run apply again

Actual Result:

│ Error: HEAD healthcheck failed: This is usually due to network or permission issues. The underlying error isn't accessible, please debug by disabling healthchecks.
│ 
│   with opensearch_index.events_index,
│   on opensearch.tf line 117, in resource "opensearch_index" "events_index":
│  117: resource "opensearch_index" "events_index" {
│ 

Expected Result:

Update tags and do nothing in relation to the index.

My current configs

Terraform version: 1.6.2 OS: linux hashicorp/aws version: 5.22

provider "opensearch" {
  url         = data.aws_opensearchserverless_collection.example.collection_endpoint
  healthcheck = false
}

resource "opensearch_index" "events_index" {
  name     = "events"
  mappings = file("${path.module}/opensearch_index_mappings.json")
  force_destroy = false
}

Thank you for a feedback, @lcapillera. By looking at source code of provider and elasticsearch client used in this solution I've noticed that aforementioned error is returned by provider itself while hiding actual error during client creation process. Could you please enable info level logging and run your setup? We should get a real error which happens during setup rather than generic one.

lcapillera commented 10 months ago

@vasyaxparfenov The client error is no active connection found: no Elasticsearch node available:

opensearch_index.events_index: Refreshing state... [id=events]
2023-10-31T10:37:47.788+0100 [INFO]  provider.terraform-provider-opensearch: 2023/10/31 10:37:47 [INFO] couldn't create client: *errors.withStack, no active connection found: no Elasticsearch node available, *errors.withMessage: timestamp="2023-10-31T10:37:47.787+0100"
2023-10-31T10:37:47.788+0100 [WARN]  unexpected data:
  registry.terraform.io/my-opensearch-project/opensearch:stderr=
  | {"@caller":"/home/leo/.asdf/installs/golang/1.20/packages/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.12.0/internal/logging/helper_schema.go:21","@level":"trace","@message":"Calling downstream","@module":"sdk.helper_schema","@timestamp":"2023-10-31T10:37:47.787720+01:00"}
  | {"@caller":"/home/leo/.asdf/installs/golang/1.20/packages/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.12.0/internal/logging/helper_schema.go:21","@level":"trace","@message":"Called downstream","@module":"sdk.helper_schema","@timestamp":"2023-10-31T10:37:47.787876+01:00"}

2023-10-31T10:37:47.788+0100 [ERROR] vertex "opensearch_index.events_index" error: HEAD healthcheck failed: This is usually due to network or permission issues. The underlying error isn't accessible, please debug by disabling healthchecks.
2023-10-31T10:37:47.788+0100 [ERROR] vertex "opensearch_index.events_index (expand)" error: HEAD healthcheck failed: This is usually due to network or permission issues. The underlying error isn't accessible, please debug by disabling healthchecks.

The OpenSearch Serverless cluster is up and running because I can recreate the index by doing:

  1. DELETE /events
  2. terraform state rm opensearch_index.events_index
  3. Run terraform apply.
  4. The index is created successfully
vasyaxparfenov commented 10 months ago

@vasyaxparfenov The client error is no active connection found: no Elasticsearch node available:

opensearch_index.events_index: Refreshing state... [id=events]
2023-10-31T10:37:47.788+0100 [INFO]  provider.terraform-provider-opensearch: 2023/10/31 10:37:47 [INFO] couldn't create client: *errors.withStack, no active connection found: no Elasticsearch node available, *errors.withMessage: timestamp="2023-10-31T10:37:47.787+0100"
2023-10-31T10:37:47.788+0100 [WARN]  unexpected data:
  registry.terraform.io/my-opensearch-project/opensearch:stderr=
  | {"@caller":"/home/leo/.asdf/installs/golang/1.20/packages/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.12.0/internal/logging/helper_schema.go:21","@level":"trace","@message":"Calling downstream","@module":"sdk.helper_schema","@timestamp":"2023-10-31T10:37:47.787720+01:00"}
  | {"@caller":"/home/leo/.asdf/installs/golang/1.20/packages/pkg/mod/github.com/hashicorp/terraform-plugin-sdk/v2@v2.12.0/internal/logging/helper_schema.go:21","@level":"trace","@message":"Called downstream","@module":"sdk.helper_schema","@timestamp":"2023-10-31T10:37:47.787876+01:00"}

2023-10-31T10:37:47.788+0100 [ERROR] vertex "opensearch_index.events_index" error: HEAD healthcheck failed: This is usually due to network or permission issues. The underlying error isn't accessible, please debug by disabling healthchecks.
2023-10-31T10:37:47.788+0100 [ERROR] vertex "opensearch_index.events_index (expand)" error: HEAD healthcheck failed: This is usually due to network or permission issues. The underlying error isn't accessible, please debug by disabling healthchecks.

The OpenSearch Serverless cluster is up and running because I can recreate the index by doing:

  1. DELETE /events
  2. terraform state rm opensearch_index.events_index
  3. Run terraform apply.
  4. The index is created successfully

Thank you, @lcapillera ! So this happens mainly when terraform performs requests to refresh its state? I made this assumption based on your original scenario and based on a fact you were able to remove resource from state and re-create it. Are you using this exact branch?

lcapillera commented 10 months ago

@vasyaxparfenov It looks like the refreshing state only fails during the apply if there are modification on the tags of any resource. If I change anything other than a tag, there index refreshing works fine.

For instance, if I add this some_tag to the OpenSearch collection, the index refreshing fails:

resource "aws_opensearchserverless_collection" "events_opensearch_collection" {
  name = "example-${var.environment}"
  type = "TIMESERIES"

  tags = {
    some_tag = "some_tag_value"
  }

  depends_on = [
    aws_opensearchserverless_security_policy.events_opensearch_encryption_policy,
    aws_opensearchserverless_security_policy.events_opensearch_network_policy,
    aws_opensearchserverless_access_policy.events_opensearch_access_policy
  ]
}

But the tag change can be on any resource, not related to OpenSearch, for the index refreshing to fail. It's pretty strange. And yes, I'm using this branch. I'll rebuild the provider object again just in case.

lcapillera commented 10 months ago

@vasyaxparfenov I narrowed down the problem a bit more, my current setup is this:


provider "opensearch" {
  url         = data.aws_opensearchserverless_collection.example.collection_endpoint
  healthcheck = false
}

resource "aws_opensearchserverless_collection" "example_opensearch_collection" {
  name = "example"
  type = "TIMESERIES"

  depends_on = [
    aws_opensearchserverless_security_policy.events_opensearch_encryption_policy,
    aws_opensearchserverless_security_policy.events_opensearch_network_policy,
    aws_opensearchserverless_access_policy.events_opensearch_access_policy
  ]
}

data "aws_opensearchserverless_collection" "example" {
  name = "example"
  depends_on = [aws_opensearchserverless_collection.example_opensearch_collection]
}

The index error started to happen when I added depends_on to the data resource, so changing from

data "aws_opensearchserverless_collection" "example" {
  name = "example"
}

to

data "aws_opensearchserverless_collection" "example" {
  name = "example"
  depends_on = [aws_opensearchserverless_collection.example_opensearch_collection]
}

I did that because it wasn't waiting for the collection to be created in the first run, so the provider was getting a null url.

If I remove depends_on = [aws_opensearchserverless_collection.example_opensearch_collection], the index works fine when tags are changed.

So it looks like I have 2 bugs and I can't fix both at the same time :D.

vasyaxparfenov commented 10 months ago

@vasyaxparfenov I narrowed down the problem a bit more, my current setup is this:


provider "opensearch" {
  url         = data.aws_opensearchserverless_collection.example.collection_endpoint
  healthcheck = false
}

resource "aws_opensearchserverless_collection" "example_opensearch_collection" {
  name = "example"
  type = "TIMESERIES"

  depends_on = [
    aws_opensearchserverless_security_policy.events_opensearch_encryption_policy,
    aws_opensearchserverless_security_policy.events_opensearch_network_policy,
    aws_opensearchserverless_access_policy.events_opensearch_access_policy
  ]
}

data "aws_opensearchserverless_collection" "example" {
  name = "example"
  depends_on = [aws_opensearchserverless_collection.example_opensearch_collection]
}

The index error started to happen when I added depends_on to the data resource, so changing from

data "aws_opensearchserverless_collection" "example" {
  name = "example"
}

to

data "aws_opensearchserverless_collection" "example" {
  name = "example"
  depends_on = [aws_opensearchserverless_collection.example_opensearch_collection]
}

I did that because it wasn't waiting for the collection to be created in the first run, so the provider was getting a null url.

If I remove depends_on = [aws_opensearchserverless_collection.example_opensearch_collection], the index works fine when tags are changed.

So it looks like I have 2 bugs and I can't fix both at the same time :D.

Thank you for a detailed explanation. It seems like issue is related to aws provider resources. Did you try using collection_endpoint attribute? It will be same as using data_source from logical perspective, but it might impose better dependency relationship between opensearch provider and collection and help terraform understand your intention better.

vasyaxparfenov commented 10 months ago

@lcapillera I would actually recommend you to use terragrunt to resolve this issue. On our project we use terragrunt and separate opensearch configuration and setup in two different modules with terragrunt explicit dependency which makes this modules run in specific order due to chicken and egg problem. Even despite my suggestion to use direct dependency to resource rather than using data_source, it won't solve a first run problem due to terraform not being able to handle such thing since it needs all provider to be configured before running refresh (unless you run terraform with target argument first time avoiding opensearch provider configuration until collection is created). One of the reasons why your configuration starts to fail after you add depends_on might be related to a lack of terraform knowledge when running refresh of data_source data due to pending update (new tags for example).

lcapillera commented 10 months ago

@vasyaxparfenov Thank you for the links and suggestions. What you said makes sense. But I wonder if the issue I'm facing does occur with OpenSearch Service (as opposed to Serverless), because I don't see any issues in this repo and I guess not everybody uses terragrunt.

vasyaxparfenov commented 10 months ago

@vasyaxparfenov Thank you for the links and suggestions. What you said makes sense. But I wonder if the issue I'm facing does occur with OpenSearch Service (as opposed to Serverless), because I don't see any issues in this repo and I guess not everybody uses terragrunt.

@lcapillera could you please try using attribute from collection resource rather than data_source? Based on what is mentioned here we can't be sure that it provides previously used url to opensearch provider when you change inputs on dependant resource. I'm actually curious what it configure provider with. Did your run with logging enabled had any details regarding provider setup? Maybe if you try log level debug we could get some information. Nevertheless I would first try using resource reference explicitly

lcapillera commented 10 months ago

@vasyaxparfenov I just tried using the attribute from the collection resource and it works in both cases, first run and later updates. The reason I added the data resource was because this wasn't working, but maybe I had something in a inconsistent state while trying different things. Thanks for your time, I'll let you know if I find anything else, but I guess this PR is ready to go 👏

vasyaxparfenov commented 10 months ago

@vasyaxparfenov I just tried using the attribute from the collection resource and it works in both cases, first run and later updates. The reason I added the data resource was because this wasn't working, but maybe I had something in a inconsistent state while trying different things. Thanks for your time, I'll let you know if I find anything else, but I guess this PR is ready to go 👏

@lcapillera I actually have an explanation why it doesn't work when you use data_source. So, it can't determine what attributes data_source is going to have because it depends on collection which is about to change - so it just passes empty string (default value) to provider and golang url.Parse API is ok with empty string being passed and returns url object with default fields. Then when you try to refresh your resource it calls getClient which when creating client checks url, if it doesn't have valid schema it's not added to connection list and then it has logic which checks active connection and that's where it fails because previous step skips empty url object. Mystery solved 🤣

prudhvigodithi commented 10 months ago

Thanks for the feedback and the contribution, I will review this PR soon and lets this merged so that this feature is part of the upcoming release. Thanks

prudhvigodithi commented 10 months ago

Hey @vasyaxparfenov do you think its useful to add anything in the documentation? https://github.com/opensearch-project/terraform-provider-opensearch/blob/main/CONTRIBUTING.md#generate-documentation-using-tfplugindocs

vasyaxparfenov commented 10 months ago

Hey @vasyaxparfenov do you think its useful to add anything in the documentation? https://github.com/opensearch-project/terraform-provider-opensearch/blob/main/CONTRIBUTING.md#generate-documentation-using-tfplugindocs

tfplugindocs generate didn't produce any changes, so I guess there is nothing important to add, the usage pattern hasn't changed so to use serverless variant you would just provide url which normally contains all necessary information

prudhvigodithi commented 10 months ago

Thanks @vasyaxparfenov, please make sure the CI runs are green and we can proceed to merge the PR. Error:

         with opensearch_user.test,
          on terraform_plugin_test.tf line 2, in resource "opensearch_user" "test":
           2: resource "opensearch_user" "test" {
vasyaxparfenov commented 10 months ago

@prudhvigodithi I've been running tests locally and I don't have similar issue. It might be a bug in tests logic since I can't find a logic path which would lead to Flavor being set to 0.

prudhvigodithi commented 10 months ago

LGTM, @vasyaxparfenov, thanks for your contribution.