netskopeoss / terraform-provider-netskope

Apache License 2.0

"Unable to call backend API" response leading to orphaned broken NPAs requiring manual deletion to resolve #6

Closed Taoquitok closed 1 year ago

Taoquitok commented 1 year ago

On rare occasions (roughly 1% of the time) when applying new NPAs via Terraform, we've received "Unable to call backend API" errors. Each one leaves behind an orphaned, semi-existing NPA that is visible in the tenant portal but can't be interacted with there, and which can't be returned via the API.

image

When this occurs, the app is visible within the portal, but the GUI throws "We encountered a backend error. Please try again". Clicking into the app shows it has no ports or publishers assigned.
Attempting to delete the app via the portal results in the same error message.

As a workaround, we can infer the app_id of each broken PA by listing all PAs via the API (get_api_v2_steering_apps_private) and looking for entries with missing values at the bottom of the list. Running the delete API (/steering/delete_api_v2_steering_apps_private__private_app_id_) against those IDs resolves the issue. For reference, the delete call still throws an error at this point (see below), but the broken app is deleted anyway, which allows a follow-up terraform apply to finish creating the applications.

image
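
The cleanup steps above could be scripted roughly as follows. This is a sketch only: the URL paths are inferred from the operation names mentioned above, and the `Netskope-Api-Token` header, the `data.private_apps` response shape and the field names in `REQUIRED_FIELDS` are assumptions that should be checked against your tenant's API documentation. It defaults to a dry run so nothing is deleted until the list has been reviewed.

```python
import json
import urllib.error
import urllib.request

# Assumed field names for a healthy private app entry; adjust as needed.
REQUIRED_FIELDS = ("app_name", "host", "protocols")


def find_broken_apps(apps, required=REQUIRED_FIELDS):
    """Return entries that have an app_id but are missing an expected value."""
    broken = []
    for app in apps:
        if app.get("app_id") is None:
            continue  # without an id there is nothing to delete by
        if any(not app.get(field) for field in required):
            broken.append(app)
    return broken


def call_api(tenant_url, token, method, path):
    """Minimal request helper (header name and paths are assumptions)."""
    req = urllib.request.Request(
        tenant_url.rstrip("/") + path,
        method=method,
        headers={"Netskope-Api-Token": token, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode() or "{}")


def cleanup(tenant_url, token, dry_run=True):
    """List private apps, report the broken ones, optionally delete them."""
    listing = call_api(tenant_url, token, "GET", "/api/v2/steering/apps/private")
    apps = listing.get("data", {}).get("private_apps", [])
    for app in find_broken_apps(apps):
        app_id = app["app_id"]
        print("broken app:", app_id)
        if dry_run:
            continue
        try:
            call_api(tenant_url, token, "DELETE",
                     f"/api/v2/steering/apps/private/{app_id}")
        except urllib.error.HTTPError as err:
            # As described above, the delete may report an error even though
            # the broken app is removed, so log it and carry on.
            print("delete returned HTTP", err.code, "for", app_id)
```

Running `cleanup(tenant_url, token)` first and only then `cleanup(tenant_url, token, dry_run=False)` keeps a manual review step before any delete is sent.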

If this isn't a known issue, please say whether you need tenant details and/or log times to compare against backend logs. I can reach out via the normal support channels and link to this issue.

Taoquitok commented 1 year ago

A quick update on this. Prior to this week, the API error was rare. In the week starting 23rd of Jan 2023 I've hit this error 5 times across roughly 30 new PAs. Pretty much every time I create more than 1 PA at a time, it occurs for at least one of the PAs generated.

Taoquitok commented 1 year ago

Additional comment: if the error occurs while updating an existing app, Terraform updates the state but the change is not actually applied. This behaviour is confusing and negates the purpose of Terraform state.
Follow-up plans do not detect the mismatch either, so nothing validates that the state matches the real world and no correction gets applied.

The errors below occurred when applying new tags to ~50 existing private apps, a 10% error rate in this case.

╷
│ Error: Unable to call backend API
│
│   with module.netskope_dynamic_privateapps["PA-obfuscated1"].netskope_privateapps.privateapp,
│   on privateapp-module\main.tf line 24, in resource "netskope_privateapps" "privateapp":
│   24: resource "netskope_privateapps" "privateapp" {
│
╵
╷
│ Error: Unable to call backend API
│
│   with module.netskope_dynamic_privateapps["PA-obfuscated2"].netskope_privateapps.privateapp,
│   on privateapp-module\main.tf line 24, in resource "netskope_privateapps" "privateapp":
│   24: resource "netskope_privateapps" "privateapp" {
│
╵
╷
│ Error: Url:http://ns-15146.de-fr4.npa.goskope.com:80/orca/services/875, Error returned by backend API, status code:500
│
│   with module.netskope_ip_privateapps["PA-obfuscated3"].netskope_privateapps.privateapp,
│   on privateapp-module\main.tf line 24, in resource "netskope_privateapps" "privateapp":
│   24: resource "netskope_privateapps" "privateapp" {
│
╵
╷
│ Error: Unable to call backend API
│
│   with module.netskope_dynamic_privateapps["PA-obfuscated4"].netskope_privateapps.privateapp,
│   on privateapp-module\main.tf line 24, in resource "netskope_privateapps" "privateapp":
│   24: resource "netskope_privateapps" "privateapp" {
│
╵
╷
│ Error: Unable to call backend API
│
│   with module.netskope_dynamic_privateapps["PA-obfuscated5"].netskope_privateapps.privateapp,
│   on privateapp-module\main.tf line 24, in resource "netskope_privateapps" "privateapp":
│   24: resource "netskope_privateapps" "privateapp" {
│
╵

iain-madder-frontiers commented 1 year ago

Update on this. After discussing with Netskope support, it seems the error occurs only when tags are defined. Removing the tags definitely stops it from occurring.
Awaiting resolution of the internal ticket, after which I'll update this issue.

Taoquitok commented 1 year ago

The issue is unrelated to Terraform, so I'll close this ticket with the advice that adding -parallelism=1 to the terraform apply stops the issue from occurring.
Netskope engineering are continuing to work on the API to fix the problem where applying tags breaks new PAs when too many requests are made at once.
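
For anyone running applies from a script or pipeline, the workaround could be wired in along these lines. This is only a sketch: `-parallelism` is a standard `terraform apply` option, but the helper names and the working directory handling here are hypothetical.

```python
import subprocess


def build_apply_command(parallelism=1, extra_args=()):
    """Build a terraform apply command limited to N concurrent operations."""
    return ["terraform", "apply", f"-parallelism={parallelism}", *extra_args]


def run_apply(workdir, parallelism=1, extra_args=()):
    """Run terraform apply in workdir; raises CalledProcessError on failure."""
    return subprocess.run(
        build_apply_command(parallelism, extra_args),
        cwd=workdir,
        check=True,
    )
```

With `parallelism=1`, Terraform creates or updates one resource at a time, which trades apply speed for fewer simultaneous requests to the backend.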