hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: aws_kendra_data_source declared values being reset on apply #33237

Open derekbelrose opened 10 months ago

derekbelrose commented 10 months ago

Terraform Core Version

1.5.6

AWS Provider Version

5.13.0

Affected Resource(s)

aws_kendra_data_source

Expected Behavior

Here is my declared data source:

resource "aws_kendra_data_source" "aem_pub_datasource" {
  index_id    = aws_kendra_index.index.id
  name        = "${var.project_id}-ds-aem-publisher-${random_id.id.hex}"
  type        = "WEBCRAWLER"
  description = "Web crawler for AEM published site (${var.aem_publisher_url})"
  role_arn    = aws_iam_role.aem_webcrawler_datasource.arn
  schedule    = ""

  tags = {}

  configuration {
    web_crawler_configuration {
      max_content_size_per_page_in_mega_bytes = 25

      urls {
        seed_url_configuration {
          web_crawler_mode = "EVERYTHING"
          seed_urls = [
            var.aem_publisher_url
          ]
        }
      }
    }
  }
}

I expect the above to create the resource, and I expect subsequent plan/apply runs to report no changes. I also expect to be able to verify the applied settings in the Amazon Kendra console.

Actual Behavior

This causes a few issues when I run terraform plan after a recent apply:

Terraform will perform the following actions:

  # aws_kendra_data_source.aem_pub_datasource will be updated in-place
  ~ resource "aws_kendra_data_source" "aem_pub_datasource" {
        id             = "8e7d33da-e4f9-42aa-847c-d12e5f521b65/6c6e80c2-9bf6-42b1-a6e6-0fa453089d5f"
        name           = "djb-build-test-ds-aem-publisher-72b19a7684f4df12"
        tags           = {}
        # (11 unchanged attributes hidden)

      ~ configuration {
          ~ web_crawler_configuration {
              ~ max_content_size_per_page_in_mega_bytes = 0 -> 25
                # (5 unchanged attributes hidden)

                # (1 unchanged block hidden)
            }
        }
    }

Applying the above changes the value from 0 to 25, but a second plan/apply cycle shows the same diff again. The relevant DEBUG output is in the section below.

Also, the AWS console shows a blank page when I try to view the settings for the data source that the above snippet creates.

The JavaScript error from the browser console is the first file in the relevant gist, along with a screenshot of the Chrome session with the broken AWS console.

Relevant Error/Panic Output Snippet

2023-08-30T09:20:18.004-0400 [WARN]  Provider "provider[\"registry.terraform.io/hashicorp/aws\"]" produced an unexpected new value for aws_kendra_data_source.aem_pub_datasource, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .updated_at: was cty.StringVal("2023-08-30T13:17:26Z"), but now cty.StringVal("2023-08-30T13:20:17Z")
      - .configuration[0].web_crawler_configuration[0].max_content_size_per_page_in_mega_bytes: was cty.NumberIntVal(25), but now cty.NumberIntVal(0)
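
The attribute paths flagged in a warning like the one above can be pulled out of a saved debug log with a small filter. This is only a sketch: the function name reverted_attrs and the file name tf.log are hypothetical, and the pattern assumes the "- .path: was X, but now Y" line shape seen in this snippet.

```shell
# Hypothetical helper: read Terraform log text on stdin and print each
# attribute path from lines shaped like "- .path: was X, but now Y".
reverted_attrs() {
  sed -n 's/^ *- \(\.[^:]*\): was .*, but now .*$/\1/p'
}

# Usage (tf.log is an assumed file name, e.g. captured with 2> tf.log):
#   reverted_attrs < tf.log | sort -u
```

Run against the warning above, this would list .updated_at (expected to change on every update) and the max_content_size_per_page_in_mega_bytes path, which is the value that does not stick.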

Terraform Configuration Files

https://gist.github.com/derekbelrose/5ff38fbfe6ef685642f1cc7fe1e976a2

Steps to Reproduce

TF_LOG=debug AWS_PROFILE=terraform terraform plan -var-file=development.tfvars -out derek.pln 
TF_LOG=debug AWS_PROFILE=terraform terraform apply derek.pln 
# wait
TF_LOG=debug AWS_PROFILE=terraform terraform plan -var-file=development.tfvars -out derek.pln 
TF_LOG=debug AWS_PROFILE=terraform terraform apply derek.pln 
# wait
TF_LOG=debug AWS_PROFILE=terraform terraform plan -var-file=development.tfvars -out derek.pln 
TF_LOG=debug AWS_PROFILE=terraform terraform apply derek.pln 
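
The repeated diff can also be detected without reading the plan by eye. terraform plan accepts -detailed-exitcode, which exits 0 when there are no changes, 1 on error, and 2 when changes are pending. A minimal sketch, assuming the same var file as in the steps above; the function names are hypothetical.

```shell
# Translate the exit status of `terraform plan -detailed-exitcode`.
plan_verdict() {
  case "$1" in
    0) echo "no-changes" ;;
    2) echo "changes-pending" ;;
    *) echo "error" ;;
  esac
}

# Hypothetical wrapper: plan right after an apply and print the verdict.
# AWS_PROFILE and other settings are taken from the calling environment.
check_converged() {
  plan_rc=0
  terraform plan -detailed-exitcode -var-file=development.tfvars >/dev/null || plan_rc=$?
  plan_verdict "$plan_rc"
}
```

If check_converged prints changes-pending immediately after a successful apply, the configuration is not converging, which is the behavior reported here.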

Debug Output

2023-08-30T09:40:51.213-0400 [WARN]  Provider "provider[\"registry.terraform.io/hashicorp/aws\"]" produced an unexpected new value for aws_kendra_data_source.aem_pub_datasource, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .updated_at: was cty.StringVal("2023-08-30T13:20:17Z"), but now cty.StringVal("2023-08-30T13:40:50Z")
      - .configuration[0].web_crawler_configuration[0].max_content_size_per_page_in_mega_bytes: was cty.NumberIntVal(25), but now cty.NumberIntVal(0)

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

byuniqueman commented 10 months ago

I am having the exact same problem: I cannot set max_content_size_per_page_in_mega_bytes.

Also, working with AWS support, we found out that the settings tab is blank because the console is looking for some values that are in fact not optional, despite what the docs say. AWS said they will be updating these docs. To confirm what does work, you can create a JSON config file and create a data source from the AWS CLI.

skeleton.json - https://gist.github.com/byuniqueman/2c007a4880d9ce19bbe5174253f905d4

aws kendra create-data-source --cli-input-json file://skeleton.json

At this point the data source should be viewable in the settings tab.

So with "MaxContentSizePerPageInMegaBytes" broken in Terraform, I am at a standstill in terms of using Terraform to deploy any Kendra data sources, since it currently seems to be a requirement for the UI.
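
Because the console page is blank, the stored value can be read back with aws kendra describe-data-source instead. A sketch, assuming the Terraform resource id has the form <data-source-id>/<index-id> (that ordering is an assumption; swap the two if the call reports that the resource cannot be found). The function names are hypothetical.

```shell
# Split an id of the assumed form "<data-source-id>/<index-id>".
kendra_ds_id()    { echo "${1%%/*}"; }
kendra_index_id() { echo "${1#*/}"; }

# Print the max content size that Kendra reports for the data source.
kendra_max_content_size() {
  aws kendra describe-data-source \
    --id "$(kendra_ds_id "$1")" \
    --index-id "$(kendra_index_id "$1")" \
    --query 'Configuration.WebCrawlerConfiguration.MaxContentSizePerPageInMegaBytes' \
    --output text
}

# Usage: kendra_max_content_size "<id attribute from terraform state>"
```

Comparing this output for a Terraform-created data source and a CLI-created one would show whether the value ever reaches the API.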

bhuvanesh19 commented 9 months ago

Could this be because the MaxContentSizePerPageInMegaBytes value passed to expandWebCrawlerConfiguration is not a valid float, so it falls back to the float32 zero value (0.0)?

byuniqueman commented 9 months ago

I verified with AWS support that this isn't an issue in V2 of the Webcrawler. There are no plans to fix V1 since it's on a deprecation path, so no effort will be put into fixing the issue. A deprecation notice for V1 will be released soon.

derekbelrose commented 9 months ago

Does the provider support the V2 webcrawler type? All I see are references to the WEBCRAWLER type.

derekbelrose commented 8 months ago

@byuniqueman Does what you verified have any implications for how I might get this to work right now? I am not aware of any way to create V2 TemplateConfiguration-style data sources (referenced in #29922).