okta / terraform-provider-okta

A Terraform provider to manage Okta resources, enabling infrastructure-as-code provisioning and management of users, groups, applications, and other Okta objects.
https://registry.terraform.io/providers/okta/okta
Mozilla Public License 2.0

okta_app_group_assignment causes Internal Server Error due to incorrect input #1373

Open fatbasstard opened 1 year ago

fatbasstard commented 1 year ago

Hi,

We're configuring multiple groups and apps using a Terraform module (so the code is identical), and in 3 out of 40 workspaces a group assignment (consistently the same one) keeps throwing an error:

Error: failed to create application group assignment: Put "https://xxx/api/v1/apps/XXX/groups/xxx": the API returned an error: Internal Server Error, x-okta-request-id=xxx, giving up after 6 attempt(s)

I raised an issue with Okta for that (since it's an "Internal Server Error" and not a structural error across all workspaces).

Support came back with a finding that the resource is actually passing an incorrect number of groups:

> From our logs, I can tell that the error you are facing is due to the wrong number of groups passed in the call.
>
> This is the error which can be found in our logs: "errorMessage=Incorrect result size: expected 1, actual 2"
>
> To get a better understanding on what's causing the issue, could you provide the code that you are using in terraform to perform this group push?
>
> We need to check if terraform is processing the call the right way, as in our logs it looks like it is sending more than expected.


### Terraform Version

- Terraform: 1.3.4
- Okta provider: 3.38.0

### Affected Resource(s)

- okta_app_group_assignment

### Terraform Configuration Files


```hcl
locals {
  employee_site_rw_assignments = [
    data.okta_app.myapp.id,
  ]
}

resource "okta_group" "employee_site_usecase_rw" {
  name        = "employee-${var.opco}-${var.site_code}-usecase-rw"
  skip_users  = true
}

resource "okta_app_group_assignment" "employee_site_usecase_rw" {
  for_each = toset(local.employee_site_rw_assignments)

  app_id   = each.key
  group_id = okta_group.employee_site_usecase_rw.id
  priority = 1

  lifecycle {
    ignore_changes = [priority]
  }
}
```

monde commented 1 year ago

Thanks for gathering the specific details to help fix this bug, @fatbasstard. I've added it to our backlog.

Okta internal reference: https://oktainc.atlassian.net/browse/OKTA-551641

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days

dkulchinsky commented 9 months ago

Hey @duytiennguyen-okta, @monde 👋🏼

I believe I'm hitting this issue with the okta_app_group_assignments resource as well. I have an active support ticket that's looking into it, but I'm happy to provide more details about our setup/code here if you think it will help.

duytiennguyen-okta commented 8 months ago

Hi @dkulchinsky, this error is not caused by Terraform itself, so I will need to talk with an internal team. Can you try running okta_app_group_assignments with TF_LOG=debug terraform apply and capture the log, with credentials redacted? Also, how many okta_app_group_assignments are you running successfully and how many are failing? You can send the log over through our ticket system if you prefer that way as well.

dkulchinsky commented 8 months ago

Hi @duytiennguyen-okta, I can try to get these logs for you next week when I'm back from PTO; however, I'm getting conflicting reports from Okta support saying this is a Terraform provider issue:

> I discussed internally about this with the engineering team. The issue that you are currently facing is a problem with the terraform provider and the team is aware. The engineering team is planning to work on this in the near future, but unfortunately at the moment we don't have an ETA for solving this issue. I will work on gathering more details, but I do not expect to receive any more details by the end of this year as the team is already engaged in solving other problems related to Terraform.

This is case #01927213; they in fact referred me to this GitHub issue.

duytiennguyen-okta commented 8 months ago

Yes, that was the initial diagnosis; however, I am unable to find the issue in the provider, as our integration tests are working as expected and there is no sign of a payload issue. That's why we need those logs to find the root cause.

dkulchinsky commented 8 months ago

Hi @duytiennguyen-okta 👋🏼

Just got back from PTO and was able to reproduce the issue. I have attached the apply debug log with the failure to the support ticket (#01927213) as you suggested (I prefer not to share this over GitHub).

Regarding your question:

> how many okta_app_group_assignments are you running successfully and how many failed?

We have a total of 54 okta_app_group_assignments resources; I've seen the issue in about 30-40% of them.

dkulchinsky commented 8 months ago

Hey @duytiennguyen-okta 👋🏼 Happy new year!

Just wanted to check if there's any update on this?

duytiennguyen-okta commented 8 months ago

I am looking at it. The payload from Terraform is fine. I am contacting the API team.

dkulchinsky commented 1 month ago

Hey @duytiennguyen-okta 👋

We are still struggling with this issue. I've had a support ticket open with Okta since November last year, but it doesn't seem to be progressing towards a resolution. Is it possible for you to engage with Support/Engineering to review this? They seem to suggest it's something in the API implementation, but we just use the provider resource here and don't have much control over how it invokes the Okta API.

duytiennguyen-okta commented 1 month ago

Hey @dkulchinsky, I don't see the TF script, but I would wager that it looks something like this, where you're using for_each to loop through groups in okta_app_group_assignments? If that is correct, then I have a solution for you:

resource "okta_group" "app_groups" {
  for_each    = toset(var.teams.*.id)
  name        = var.teams[index(var.teams.*.id, each.value)].name
  description = var.teams[index(var.teams.*.id, each.value)].description
}

resource "okta_app_group_assignments" "test" {
  for_each = toset(var.teams.*.id)
  app_id   = okta_app_bookmark.test.id
  group_id = okta_group.app_groups["${each.value}"].id
}
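(A side note for readers: the sketch above passes group_id to the plural okta_app_group_assignments resource, which, at least in recent provider versions, takes repeated group blocks rather than a group_id argument; a per-group loop like this would normally use the singular okta_app_group_assignment. A minimal sketch of that variant, reusing the assumed var.teams input from the example above:)

```hcl
# Sketch only: assumes the var.teams list and okta_app_bookmark.test from the
# example above. One instance is created per group, each managing a single
# (app, group) assignment.
resource "okta_app_group_assignment" "per_group" {
  for_each = toset(var.teams.*.id)

  app_id   = okta_app_bookmark.test.id
  group_id = okta_group.app_groups[each.value].id
}
```
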
dkulchinsky commented 1 month ago

@duytiennguyen-okta I don't use a for_each on the okta_app_group_assignments resource, because I want the changes to be applied in a single API request (to avoid rate limits).

I use a dynamic block for the group attribute; this is my code:

resource "okta_app_group_assignments" "this" {
  app_id = var.app_id

  dynamic "group" {
    for_each = toset(local.distinct_groups)

    content {
      id      = var.all_groups[group.value].id
      profile = jsonencode(var.profile)
    }
  }
}
duytiennguyen-okta commented 1 month ago

I think the issue is the same.

It is because of the use of for_each with okta_app_group_assignments instead of okta_app_group_assignment. This causes Terraform to understand that the user wants to create multiple instances of okta_app_group_assignments. But there can only be one instance of okta_app_group_assignments, which leads to a race condition where the second terraform apply run tries to modify the same okta_app_group_assignments.

Let me know if it helps
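
(To make the race condition being described concrete, here is a minimal, hypothetical sketch of the pattern in question; the variable names are illustrative and not taken from this thread:)

```hcl
# Hypothetical anti-pattern: for_each creates several okta_app_group_assignments
# instances, but every instance manages the full group-assignment list of the
# same app, so on apply each instance can overwrite the groups written by the
# others.
resource "okta_app_group_assignments" "per_group" {
  for_each = toset(var.group_ids) # illustrative variable

  app_id = var.app_id # the same app in every instance

  group {
    id = each.value
  }
}
```

Because the plural resource manages the complete set of group assignments for its app_id, having more than one instance point at the same app invites exactly this kind of conflict.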

dkulchinsky commented 1 month ago

Hey @duytiennguyen-okta, thanks for the reply.

We moved from okta_app_group_assignment with for_each to okta_app_group_assignments without for_each, following a recommendation from Okta, because the former resulted in rate limits on the /apps endpoint during Terraform plan (READ operations).

With okta_app_group_assignment, the provider sends a GET request for each group in each app (number of apps * number of groups in each app), resulting in a lot of READ operations to the /api/v1/apps endpoint and therefore rate limits. After moving to okta_app_group_assignments, we reduced the number of READ operations to be equal to the number of apps, and we no longer hit rate limits.

I've read the quote you mentioned in your last comment, and to be honest I'm not sure I understand what it's trying to suggest.

It sounds like okta_app_group_assignments, when doing an apply, sends multiple PUT requests to /api/v1/apps/:app_id/groups/:group_id (one for each group being added/modified?), and that results in a race condition in the backend? If so, this appears to be either an implementation issue in the provider or a resiliency/concurrency issue in the backend.

I'm also not sure what you are proposing we do. Should we go back to okta_app_group_assignment with for_each? Then we'd have the issue of rate limits during plans, and that's a non-starter for us, I'm afraid.
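
(For readers weighing the two resources discussed above, a rough, hypothetical sketch of the trade-off; the variable names are illustrative and the two blocks are alternatives, not meant to be used together:)

```hcl
# Alternative 1 - singular resource: one instance per (app, group) pair, so
# plan-time reads scale with apps * groups (the pattern that hit rate limits).
resource "okta_app_group_assignment" "per_pair" {
  for_each = toset(var.group_ids) # illustrative

  app_id   = var.app_id
  group_id = each.value
}

# Alternative 2 - plural resource: one instance per app that owns the whole
# assignment list, so plan-time reads scale with the number of apps only.
resource "okta_app_group_assignments" "per_app" {
  app_id = var.app_id

  dynamic "group" {
    for_each = toset(var.group_ids)

    content {
      id = group.value
    }
  }
}
```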

dkulchinsky commented 3 weeks ago

Hey @duytiennguyen-okta 👋🏼

Just wanted to circle back to this, as I'm not sure how to proceed based on your last comment.

Given our code for app assignments and the context in my last comment, how do you suggest we proceed?

exitcode0 commented 3 weeks ago

If you're having trouble with Okta API rate limits due to app reads, you can try to get Okta to increase your API rate limits via Okta support, or you can split your Terraform config into multiple Terraform root modules / state files that can be deployed separately.

dkulchinsky commented 3 weeks ago

@exitcode0 thanks, we are already operating with increased rate limits and are unlikely to be able to get them increased further (that's coming from Okta).

EDIT: in fact (as I mentioned above), the recommendation to switch to okta_app_group_assignments came from Okta, since it reduces the number of reads to the apps API endpoint to be equal to the number of apps we manage.

Splitting this code into independent "workspaces" doesn't really scale; it would make things very difficult to operate and would negate the main reason we use Terraform to operate this domain.

This issue should be solvable, either on the client (provider) side or the backend (API) side; for example, we do not encounter such issues when making the same changes in the Okta console (UI).

duytiennguyen-okta commented 3 weeks ago

@dkulchinsky sorry, I missed the comment. What I meant in the last comment was to check whether there are multiple instances of okta_app_group_assignments, which should not happen as it will cause a race condition. I just supported a customer who had that problem and thought you might have the same one. In any case, from what I heard from the API team, they have identified the issue and are currently working on a fix. You'll probably hear official communication soon.

dkulchinsky commented 3 weeks ago

Thanks for replying @duytiennguyen-okta

We do have multiple invocations of the okta_app_group_assignments resource; however, each one targets a different app, so I don't believe that is the issue. Also, we are mostly applying changes to a single app at a time, so only one okta_app_group_assignments resource is being modified during a terraform apply when the issue occurs.

EDIT: we also tried setting Terraform parallelism to 1, to ensure that if multiple apps are being modified during an apply, only one is modified at a time; however, this didn't make a difference.

> In any case, from what I heard from the API team, they have identified the issue and are currently working on a fix. You'll probably hear official communication soon.

That sounds promising; I'll keep an eye out for more details from support.