schnerring / terraform-azurerm-avd

Terraform sample containing Azure Active Directory Domain Services (AADDS) and Azure Virtual Desktop (AVD) deployments
MIT License
9 stars, 15 forks

AADDS missing Permissions #2

Closed Berndinox closed 1 year ago

Berndinox commented 1 year ago
Error: creating/updating Domain Service (Name: "aadds", Resource Group: "aadds-rg"): performing CreateOrUpdate: domainservices.DomainServicesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="BadRequest" Message="The user xxxx does not have administrative privileges to manage AAD Domain Services instance in tenant xxxx."
with azurerm_active_directory_domain_service.aadds
on aadds.tf line 108, in resource "azurerm_active_directory_domain_service" "aadds":
resource "azurerm_active_directory_domain_service" "aadds" {

I have the Owner role, and the service principal (SP) has Owner as well.

I use Terraform Cloud, which is why I've added:

provider "azurerm" {
  features {}

  subscription_id   = var.subscription_id
  tenant_id         = var.tenant_id
  client_id         = var.client_id
  client_secret     = var.client_secret
}
provider "azuread" {
  tenant_id         = var.tenant_id
  client_id         = var.client_id
  client_secret     = var.client_secret
}

Any suggestions?

schnerring commented 1 year ago

Does this happen upon initial creation?

It seems like you're having a permissions issue:

schnerring commented 1 year ago

See AADDS prerequisites: https://learn.microsoft.com/en-us/azure/active-directory-domain-services/tutorial-create-instance#prerequisites

Berndinox commented 1 year ago

Thanks for your response!

I created a SP for the connection between TF Cloud and Azure.

I granted permissions for the azuread provider: https://registry.terraform.io/providers/hashicorp/azuread/latest/docs/guides/microsoft-graph#assigning-new-api-permissions-for-a-service-principal

The SP also has the Cloud Application Administrator role.

I will try assigning "Global Admin" to the SP.

EDIT: It seems to run through with Global Admin. I will update the post with the findings. Thanks a lot!
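
For anyone who prefers to keep that role assignment in code rather than in the portal, here is a minimal, untested sketch using the azuread provider. The variable name terraform_sp_object_id is hypothetical and not part of the sample. Whoever applies this needs enough directory rights to assign roles (for example Privileged Role Administrator), so the unprivileged SP cannot bootstrap it for itself. Global Administrator is simply what worked in this thread; a narrower role may be sufficient.

```hcl
# Hypothetical variable: object ID of the service principal used by Terraform Cloud.
variable "terraform_sp_object_id" {
  type        = string
  description = "Object ID of the service principal that deploys AADDS."
}

# Activates the built-in directory role in the tenant so it can be assigned.
resource "azuread_directory_role" "global_admin" {
  display_name = "Global Administrator"
}

# Assigns the role to the service principal.
resource "azuread_directory_role_assignment" "terraform_sp" {
  role_id             = azuread_directory_role.global_admin.template_id
  principal_object_id = var.terraform_sp_object_id
}
```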

Berndinox commented 1 year ago

Hi @schnerring, it all runs through now. I just have one last issue when the VM is being joined to AADDS.

Error: Code="VMExtensionProvisioningError" Message="VM has reported a failure when processing extension 'aadds-join-vmext'. Error message: \"Exception(s) occured while joining Domain 'avd.berndklaus.at'\"\r\n\r\nMore information on troubleshooting is available at https://aka.ms/vmextensionwindowstroubleshoot "
with azurerm_virtual_machine_extension.avd_aadds_join[0]
on avd.tf line 153, in resource "azurerm_virtual_machine_extension" "avd_aadds_join":
resource "azurerm_virtual_machine_extension" "avd_aadds_join" {

I have specified avd.berndklaus.at for the domain_name variable. However, no custom domain has been created. The limit of this variable is 15 characters, so I can't put in my full .onmicrosoft.com domain.

The name server config inside the AVD vNet seems to be fine, and the peering is established.

Do you have another tip for me? :)

EDIT: AADDS seems OK. [image]

From the Session Host:

ping avd.berndklaus.at

Pinging avd.berndklaus.at [10.0.0.4] with 32 bytes of data:
Reply from 10.0.0.4: bytes=32 time<1ms TTL=128
Reply from 10.0.0.4: bytes=32 time=1ms TTL=128
Reply from 10.0.0.4: bytes=32 time=2ms TTL=128

After resetting the AADDS account password, I was able to join the domain manually from within the session host. It seems this is not a domain issue but a password one.

PS: Sorry for spamming your Issues tab, but maybe it helps others get it working...

schnerring commented 1 year ago

The limit of this variable is 15 characters, so I can't put in my full .onmicrosoft.com domain.

I think this only applies to the domain prefix, e.g. for myprefix in myprefix.onmicrosoft.com.

Have you checked the VM extension logs like the error message suggests?

Sorry for spamming your Issues tab, but maybe it helps others get it working...

I remember having lots of issues like this when initially figuring this out, too. I probably should create a troubleshooting section in the README before closing up the issue. Let me know when you get to the bottom of this!

After resetting the AADDS account password, I was able to join the domain manually from within the session host. It seems this is not a domain issue but a password one.

Was the account used to domain-join the VM synced from an on-prem AD environment, i.e. is it a hybrid identity? If that's the case, see Enable password synchronization in Azure Active Directory Domain Services for hybrid environments:

To use Azure AD DS with accounts synchronized from an on-premises AD DS environment, you need to configure Azure AD Connect to synchronize those password hashes required for NTLM and Kerberos authentication. After Azure AD Connect is configured, an on-premises account creation or password change event also then synchronizes the legacy password hashes to Azure AD.

Berndinox commented 1 year ago

When using just the first part of my onmicrosoft.com domain name, Terraform comes up with:

Error: domain_name must be a valid FQDN and the first element must be 15 or fewer characters
with azurerm_active_directory_domain_service.aadds
on aadds.tf line 113, in resource "azurerm_active_directory_domain_service" "aadds":
  domain_name = var.domain_name

My tenant was assigned by my company, and its name is longer than 15 characters. 👎 I will try adding a custom domain and using that one. Reference: https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/active_directory_domain_service
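
To catch this at plan time instead of waiting for the provider error, a validation block on the variable could look roughly like the sketch below. It assumes the sample declares a domain_name variable that this would replace. The check only approximates the rule quoted in the error message (at least two labels, first label of 15 or fewer characters); it is not the provider's own validation logic.

```hcl
variable "domain_name" {
  type        = string
  description = "FQDN of the managed domain, e.g. aadds.example.com."

  validation {
    # At least two dot-separated labels, and the first label no longer than 15 characters.
    condition = (
      length(split(".", var.domain_name)) >= 2 &&
      length(split(".", var.domain_name)[0]) <= 15
    )
    error_message = "The domain_name value must be an FQDN whose first label has 15 or fewer characters."
  }
}
```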

I am using an empty tenant and a fresh subscription. No on-prem sync or anything like that is in use.

I will start another deployment and check the VM extension logs. Will update the findings.

Thanks for your help!

Berndinox commented 1 year ago
2023-03-25T11:59:19.5781546Z    [Error]:    Try join: domain='a1.berndklaus.at', ou='', user='aadds@marketplacelabdemo.onmicrosoft.com', option='NetSetupJoinDomain, NetSetupAcctCreate' (#3:User Specified), errCode='1326'.
2023-03-25T11:59:19.5887205Z    [Error]:    Setting error code to 53 while joining domain
2023-03-25T11:59:20.9499802Z    [Error]:    Try join: domain='a1.DOMAIN.at', ou='', user='aadds@marketplacelabdemo.onmicrosoft.com', option='NetSetupJoinDomain' (#1:User Specified without NetSetupAcctCreate), errCode='1326'.
2023-03-25T11:59:20.9499802Z    [Error]:    Setting error code to 53 while joining domain
2023-03-25T11:59:20.9656553Z    [Error]:    Computer failed to join domain 'a1.DOMAIN.at' from workgroup 'WORKGROUP'.
2023-03-25T11:59:20.9656553Z    [Info]: Retrying action after 15 seconds, at attempt 1 out of '10'.

hmmm...

schnerring commented 1 year ago

Error code 1326 indicates a credential-related issue. Did you follow the recommended troubleshooting steps?

Can you please try to domain-join the VM manually via cmd to rule out credential-related issues?

op7ic commented 1 year ago

Hi, just FYI: I've been playing with this code and found that the user joining the domain needs to have their password reset before the account can be used for the join. I used your setup to create a sample lab and tried to use the DC admin to join the VM. This fails with error 1326 or 1909. Once I reset the DC admin password and re-ran Terraform, it all worked. @Berndinox, see if this fixes it for you. I opened a ticket to see whether this can be fixed via Terraform. If you want to test this manually, go to VDPOOL and add a VM using the credentials before and after the password reset. In my case, machines added to VDPOOL before the reset can't join the domain, but after the reset it suddenly works.

schnerring commented 1 year ago

I'm curious: could you please link the issue you've opened, @op7ic?

I'm unable to reproduce the error on my side. To me, it sounds like a password hash sync issue unrelated to Terraform.

Berndinox commented 1 year ago

After resetting the password, I was able to join the VM. I used a fresh tenant as well... hmm

op7ic commented 1 year ago

@schnerring - https://github.com/hashicorp/terraform-provider-azuread/issues/1055 and initial bug is here - https://github.com/hashicorp/terraform-provider-azurerm/issues/21147

@schnerring it's interesting that you can join the AD. Are you removing the user before re-running the AD creation process? I've spent some time with the Azure team, and they pointed me to the article I added in the ticket. It looks like Azure-only AD users come without the NTLM hash needed to complete the domain join operation.

schnerring commented 1 year ago

The account isn't synchronized from Azure AD to Azure AD DS until the password is changed. Either expire the passwords for all cloud users in the tenant who need to use Azure AD DS, which forces a password change on next sign-in, or instruct cloud users to manually change their passwords. For this tutorial, let's manually change a user password.

I'm pretty sure that a password change is only required for cloud-only accounts created before the AADDS infrastructure was deployed. Any account created after synchronizes properly. Which is why you're spot on with your question (thanks for asking it):

Are you removing the user before re-running the AD creation process?

I'm really, really sorry that I forgot to mention this, but the last time I tested the deployment, I did it in two steps (🤦🤦🤦🤦🤦):

# [STEP 1] Create AADDS
terraform plan -out out.tfplan -target azurerm_active_directory_domain_service.aadds
terraform apply out.tfplan

# [STEP 2] Create the rest
terraform plan -out out.tfplan
terraform apply out.tfplan

This explains why it has been working for me. Have a look at the terraform graph of [STEP 1]:

[image: terraform graph of STEP 1]

azuread_group.dc_admins and azuread_user.dc_admin are missing from that graph. That's because these TF resources don't implicitly depend on the AADDS resources, so if you create everything in one go, they are created in parallel. Creating user accounts takes around 10 seconds while creating the AADDS resources takes almost an hour, so in a single run the user exists long before AADDS does. That explains why you have been encountering the issue and I haven't: with my two-step process, I ensured that azuread_user.dc_admin is created after the AADDS resources already exist.

I think the fix to make the all-in-one-go deployment possible is to make random_password.dc_admin dependent on the AADDS resources like this:

resource "random_password" "dc_admin" {
  length = 64

  depends_on = [
    azurerm_active_directory_domain_service.aadds
  ]
}

(I haven't tested this, yet)

op7ic commented 1 year ago

Great idea - I'll give it a test and see if making user creation depend on AADDS will fix it.

schnerring commented 1 year ago

Let us know how it goes!

op7ic commented 1 year ago

@schnerring just a quick note: reorganizing user creation sorted out the problem. I used the following clause on the admin user resource:

depends_on = [ azurerm_active_directory_domain_service.aadds ]
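
In context, that clause might sit on the user resource roughly as in the sketch below. Only the resource names azuread_user.dc_admin, random_password.dc_admin and azurerm_active_directory_domain_service.aadds come from this thread; the attribute values are placeholders, and the sample's actual definition may differ.

```hcl
resource "azuread_user" "dc_admin" {
  # Placeholder values for illustration only.
  user_principal_name = "dc-admin@example.onmicrosoft.com"
  display_name        = "AADDS DC Administrator"
  password            = random_password.dc_admin.result

  # Create the account only after the managed domain exists, so that its
  # password hashes are synchronized to AADDS without a manual reset.
  depends_on = [
    azurerm_active_directory_domain_service.aadds
  ]
}
```

Putting depends_on on random_password.dc_admin instead, as suggested earlier in the thread, should give the same ordering as long as the user's password references random_password.dc_admin.result, because Terraform then orders the user after the password resource.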

schnerring commented 1 year ago

Thanks for the feedback. I'll go ahead and close the issue then.