microsoft / AzureTRE

An accelerator to help organizations build Trusted Research Environments on Azure.
https://microsoft.github.io/AzureTRE
MIT License

Inconsistent state after failed operation: deploy "tre-service-guacamole" without "openid_client_id" #1825

Closed Ben-Goethuys closed 2 years ago

Ben-Goethuys commented 2 years ago

After I tried to deploy the "tre-service-guacamole" template without supplying the "openid_client_id" parameter, the operation failed, but it still created an entry in the workspace services. It now seems impossible to remove this service using the API, because the resources were never created successfully.

Steps to reproduce

  1. Set up a base workspace.
  2. Add a workspace service using the API for the "tre-service-guacamole" template, omitting the "openid_client_id" parameter. JSON: { "templateName":"tre-service-guacamole", "properties": { "display_name":"Virtual Desktop", "description":"Create virtual desktops for running research workloads", "is_exposed_externally":true, "guac_disable_copy":true, "guac_disable_paste":true } }
  3. Wait until the operation fails.
  4. Request all workspace services in the API. A new entry is returned for the failed "tre-service-guacamole" service.
  5. Try to remove or update the service using the API. This results in an error (the ID string is replaced with __ID__ below): __ID__: Error context message = Error: 1 error occurred: \t* could not load installation __ID__: Installation does not exist
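
As a rough illustration of step 2, the sketch below rebuilds the request body and runs a simple client-side check for the missing parameter. `REQUIRED_PROPERTIES` and `missing_properties` are hypothetical names used only for this example; they are not part of the AzureTRE API, and "openid_client_id" is the only required property assumed here because it is the only one named in this issue.

```python
import json

# The request body from step 2, reproduced as-is.
payload = json.loads("""
{
  "templateName": "tre-service-guacamole",
  "properties": {
    "display_name": "Virtual Desktop",
    "description": "Create virtual desktops for running research workloads",
    "is_exposed_externally": true,
    "guac_disable_copy": true,
    "guac_disable_paste": true
  }
}
""")

# Assumption: a hypothetical list of properties the template needs.
# Only "openid_client_id" is named in this issue.
REQUIRED_PROPERTIES = ["openid_client_id"]


def missing_properties(body, required):
    """Return the required property names absent from body["properties"]."""
    supplied = body.get("properties", {})
    return [name for name in required if name not in supplied]


missing = missing_properties(payload, REQUIRED_PROPERTIES)
print(missing)
```

A non-empty result means the request would be sent without a parameter the deployment depends on, which is the situation that leads to the failed operation in step 3.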

How can I remove this service and obtain a consistent state?

marrobi commented 2 years ago

Hi @Ben-Goethuys.

Thanks for raising this.

A couple of things (everything except point 4 is for whoever picks this up, rather than for you):

  1. Client ID and Client Secret should be marked as required here, so that the API rejects the invalid request - https://github.com/microsoft/AzureTRE/blob/64656c9199678ef09613207dc626f0afa2af7e60/templates/workspace_services/guacamole/template_schema.json#L7

  2. In the near future the requirement to enter these will be removed as per https://github.com/microsoft/AzureTRE/issues/1474 cc @ross-p-smith

  3. The error you are getting occurs because Porter is trying to uninstall the deployment, but the installation state does not exist as it was never installed in the first place. We need to handle the fact that deletions may be required when deployments do not succeed. I will add an issue for this: #1828

  4. To clean this up:
     a) Find the workspace service ID by using the workspace ID to get the workspace services.
     b) In the Azure portal, navigate to the Cosmos database in the TRE core resource group.
     c) Under network, add your client IP.
     d) Under Data Explorer, navigate to AzureTRE -> Resources -> Items, find the "offending" resource, and delete the item.
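
One possible way to handle point 3 is to treat a missing installation as "nothing to uninstall" and still remove the stored record. The sketch below is only an illustration of that idea, not how AzureTRE or Porter implements it: `InstallationNotFound`, `to_exception`, `delete_service`, and the `uninstall` / `remove_record` callables are all hypothetical stand-ins, and the only detail taken from this issue is the "Installation does not exist" error text.

```python
class InstallationNotFound(Exception):
    """The deployment tool has no record of the installation."""


def to_exception(message):
    """Map an error message to an exception type.

    Assumption: the text "Installation does not exist" (as seen in this
    issue) is enough to recognise a deployment that was never installed.
    """
    if "Installation does not exist" in message:
        return InstallationNotFound(message)
    return RuntimeError(message)


def delete_service(service_id, uninstall, remove_record):
    """Uninstall a service, then remove its stored record.

    `uninstall` and `remove_record` are hypothetical callables that take
    the service ID. A missing installation is tolerated; any other
    uninstall error propagates and the record is kept.
    """
    try:
        uninstall(service_id)
    except InstallationNotFound:
        # Nothing was ever installed, so there is nothing to uninstall.
        pass
    remove_record(service_id)


# Demo with in-memory stand-ins for the state store and the uninstall step.
records = {"svc-1": {"templateName": "tre-service-guacamole"}}


def fake_uninstall(service_id):
    raise to_exception(
        f"could not load installation {service_id}: Installation does not exist"
    )


delete_service("svc-1", fake_uninstall, records.pop)
print(records)
```

With this approach the failed service from the report could be deleted through the normal path, while genuine uninstall failures would still leave the record in place for investigation.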

Ben-Goethuys commented 2 years ago

Hi @marrobi

I have successfully removed the resource item in the Cosmos DB. Thank you!

marrobi commented 2 years ago

Thanks for letting us know. I will close the issue, and we will get this resolved as per the points above.