derekbelrose commented 10 months ago

Terraform Core Version

1.5.6

AWS Provider Version

5.13.0

Affected Resource(s)

aws_kendra_data_source

Expected Behavior

Here is my declared data source:

resource "aws_kendra_data_source" "aem_pub_datasource" {
  index_id    = aws_kendra_index.index.id
  name        = "${var.project_id}-ds-aem-publisher-${random_id.id.hex}"
  type        = "WEBCRAWLER"
  description = "Web crawler for AEM published site (${var.aem_publisher_url})"
  role_arn    = aws_iam_role.aem_webcrawler_datasource.arn
  schedule    = ""

  tags = {}

  configuration {
    web_crawler_configuration {
      max_content_size_per_page_in_mega_bytes = 25

      urls {
        seed_url_configuration {
          web_crawler_mode = "EVERYTHING"
          seed_urls = [
            var.aem_publisher_url
          ]
        }
      }
    }
  }
}

I expect the above to create the resource without it needing to be re-applied on subsequent plan/apply runs. I also expect to be able to verify in the Amazon Kendra console that the settings were applied.

Actual Behavior

This causes a few issues. Running terraform plan right after an apply shows:

Terraform will perform the following actions:

  # aws_kendra_data_source.aem_pub_datasource will be updated in-place
  ~ resource "aws_kendra_data_source" "aem_pub_datasource" {
        id             = "8e7d33da-e4f9-42aa-847c-d12e5f521b65/6c6e80c2-9bf6-42b1-a6e6-0fa453089d5f"
        name           = "djb-build-test-ds-aem-publisher-72b19a7684f4df12"
        tags           = {}
        # (11 unchanged attributes hidden)

      ~ configuration {
          ~ web_crawler_configuration {
              ~ max_content_size_per_page_in_mega_bytes = 0 -> 25
                # (5 unchanged attributes hidden)

                # (1 unchanged block hidden)
            }
        }
    }

Applying the above reports the change from 0 to 25, but the next plan/apply shows the same diff again. The relevant DEBUG output is in the section below.

Also, the AWS console shows a blank page when I try to view the settings for the data source that the above snippet creates.

The JavaScript error from the browser console is the first file in the relevant gist, along with a screenshot of the Chrome session showing the broken AWS console.

Relevant Error/Panic Output Snippet

2023-08-30T09:20:18.004-0400 [WARN]  Provider "provider[\"registry.terraform.io/hashicorp/aws\"]" produced an unexpected new value for aws_kendra_data_source.aem_pub_datasource, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .updated_at: was cty.StringVal("2023-08-30T13:17:26Z"), but now cty.StringVal("2023-08-30T13:20:17Z")
      - .configuration[0].web_crawler_configuration[0].max_content_size_per_page_in_mega_bytes: was cty.NumberIntVal(25), but now cty.NumberIntVal(0)

Terraform Configuration Files

https://gist.github.com/derekbelrose/5ff38fbfe6ef685642f1cc7fe1e976a2

Steps to Reproduce

TF_LOG=debug AWS_PROFILE=terraform terraform plan -var-file=development.tfvars -out derek.pln
TF_LOG=debug AWS_PROFILE=terraform terraform apply derek.pln
# wait
TF_LOG=debug AWS_PROFILE=terraform terraform plan -var-file=development.tfvars -out derek.pln
TF_LOG=debug AWS_PROFILE=terraform terraform apply derek.pln
# wait
TF_LOG=debug AWS_PROFILE=terraform terraform plan -var-file=development.tfvars -out derek.pln
TF_LOG=debug AWS_PROFILE=terraform terraform apply derek.pln

Debug Output

2023-08-30T09:40:51.213-0400 [WARN]  Provider "provider[\"registry.terraform.io/hashicorp/aws\"]" produced an unexpected new value for aws_kendra_data_source.aem_pub_datasource, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .updated_at: was cty.StringVal("2023-08-30T13:20:17Z"), but now cty.StringVal("2023-08-30T13:40:50Z")
      - .configuration[0].web_crawler_configuration[0].max_content_size_per_page_in_mega_bytes: was cty.NumberIntVal(25), but now cty.NumberIntVal(0)

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

byuniqueman commented 10 months ago

I am having the exact same problem: I cannot set max_content_size_per_page_in_mega_bytes.

Also, working with AWS support, we found out that the settings tab shows blank because the console is looking for some values that are in fact not optional, despite what the docs say. AWS said they will be updating these docs. To confirm what does work, you can create a JSON config file and create a data source from the AWS CLI.

skeleton.json - https://gist.github.com/byuniqueman/2c007a4880d9ce19bbe5174253f905d4

aws kendra create-data-source --cli-input-json file://skeleton.json

At this point the data source should be viewable in the settings tab.

So with "MaxContentSizePerPageInMegaBytes" broken in Terraform, I am at a standstill in terms of using Terraform to deploy any Kendra data sources, since it seems to be a requirement for the UI at the moment.

bhuvanesh19 commented 9 months ago

Could this be because the MaxContentSizePerPageInMegaBytes value passed to expandWebCrawlerConfiguration is not a valid float, so it takes the default float32 value (0.0)?
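
To illustrate the hypothesis above, here is a minimal Go sketch of how a mismatched type assertion can silently leave a field at its zero value. This is not the provider's actual code: the crawlerConfig struct and the expandStrict and expandConverted helpers are hypothetical, and it assumes the configured value reaches the expander as a float64.

```go
package main

import "fmt"

// crawlerConfig is a hypothetical stand-in for the API input struct.
type crawlerConfig struct {
	MaxContentSizePerPageInMegaBytes float32
}

// expandStrict asserts the raw value directly to float32. If the value
// arrives as a float64, the assertion fails and the field keeps its
// zero value (0.0) without any error being raised.
func expandStrict(tfMap map[string]interface{}) crawlerConfig {
	var cfg crawlerConfig
	if v, ok := tfMap["max_content_size_per_page_in_mega_bytes"].(float32); ok {
		cfg.MaxContentSizePerPageInMegaBytes = v
	}
	return cfg
}

// expandConverted asserts to float64 first and then converts explicitly.
func expandConverted(tfMap map[string]interface{}) crawlerConfig {
	var cfg crawlerConfig
	if v, ok := tfMap["max_content_size_per_page_in_mega_bytes"].(float64); ok {
		cfg.MaxContentSizePerPageInMegaBytes = float32(v)
	}
	return cfg
}

func main() {
	// Assumption: the configured 25 is stored in the map as a float64.
	tfMap := map[string]interface{}{
		"max_content_size_per_page_in_mega_bytes": float64(25),
	}
	fmt.Println(expandStrict(tfMap).MaxContentSizePerPageInMegaBytes)    // prints 0
	fmt.Println(expandConverted(tfMap).MaxContentSizePerPageInMegaBytes) // prints 25
}
```

If something like the first helper were in play, the value sent to the API would be 0 regardless of the configuration, which would match the "was 25, but now 0" warning in the logs. Whether that is what actually happens in the provider would need to be checked against its source.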

byuniqueman commented 9 months ago

I verified with AWS support that this isn't an issue in V2 of the web crawler. There are no plans to fix V1, since it is on a deprecation path. A deprecation notice for V1 will be released soon.

derekbelrose commented 9 months ago

Does the provider support the V2 web crawler type? All I see is a reference to the WEBCRAWLER type.

derekbelrose commented 8 months ago

@byuniqueman Does your verification have any implication on how I might be able to get this to work at this point in time? I am not aware of any way for me to create V2 TemplateConfiguration style data sources (referenced in #29922).

hashicorp / terraform-provider-aws

[Bug]: aws_kendra_data_source declared values being reset on apply #33237
