stgeorgi / wvdquickstart

DevOps pipeline fails in Deploy_WVDSessionHosts > Deploy module [VirtualMachines] | Domain join error #15

Closed sayanghosh123 closed 4 years ago

sayanghosh123 commented 4 years ago

I was trying to run the new deployment on an empty subscription, i.e. NewSubAADDSSetup/deploy.json. Location: EastUS.

The deployment in Azure succeeded. Then I went to the DevOps pipeline and saw that it failed at "Deploy_WVDSessionHosts > Deploy module [VirtualMachines]"

Raw error

2020-10-07T21:37:49.9981425Z New-AzResourceGroupDeployment: /home/vsts/work/1/s/SharedDeploymentFunctions/Invoke-GeneralDeployment.ps1:76
2020-10-07T21:37:49.9983090Z Line |
2020-10-07T21:37:49.9984258Z  76 |  … eployment = New-AzResourceGroupDeployment @DeploymentInputs -Resource …
2020-10-07T21:37:49.9985689Z  |  ~~~~~~~~~~~~~
2020-10-07T21:37:49.9986798Z  | 21:37:49 - Resource
2020-10-07T21:37:49.9987840Z  | Microsoft.Compute/virtualMachines/extensions
2020-10-07T21:37:49.9989013Z  | 'QS-WVD-VM001/DomainJoin' failed with message '{ "status":
2020-10-07T21:37:49.9990133Z  | "Failed", "error": { "code":
2020-10-07T21:37:49.9991308Z  | "ResourceDeploymentFailure", "message": "The resource
2020-10-07T21:37:49.9992460Z  | operation completed with terminal provisioning state
2020-10-07T21:37:49.9993624Z  | 'Failed'.", "details": [ { "code":
2020-10-07T21:37:49.9994853Z  | "VMExtensionProvisioningError", "message": "VM has
2020-10-07T21:37:49.9996269Z  | reported a failure when processing extension 'DomainJoin'.
2020-10-07T21:37:49.9998122Z  | Error message: \"Exception(s) occured while joining Domain
2020-10-07T21:37:49.9999498Z  | 'e20wvddemoUS.onmicrosoft.com'\"\r\n\r\nMore information on
2020-10-07T21:37:50.0000552Z  | troubleshooting is available at
2020-10-07T21:37:50.0001654Z  | https://aka.ms/vmextensionwindowstroubleshoot " } ]
2020-10-07T21:37:50.0002626Z  | } }'
2020-10-07T21:37:50.0003362Z
2020-10-07T21:37:50.0190913Z ##[error]PowerShell exited with code '1'.

What I have also found -

inputValidationRunbook

New-AzureADUser : Error occurred while executing NewUser
Code: Request_BadRequest
Message: The domain portion of the userPrincipalName property is invalid. You must use one of the verified domain names in your organization.
RequestId: 80efa1e5-afe1-4879-9d6b-3342d5ca8ecd
DateTimeStamp: Wed, 07 Oct 2020 19:56:35 GMT
Details: PropertyName - userPrincipalName, PropertyErrorCode - InvalidValue
HttpStatusCode: BadRequest
HttpStatusDescription: Bad Request
HttpResponseStatus: Completed
At line:136 char:1
+ New-AzureADUser -DisplayName $username -PasswordProfile $PasswordProf ...
+ ~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [New-AzureADUser], ApiException
+ FullyQualifiedErrorId : Microsoft.Open.AzureAD16.Client.ApiException,Microsoft.Open.AzureAD16.PowerShell.NewUser

I have validated the input, and the "Azure Admin Upn" was provided in the user@domain.onmicrosoft.com format. The deployment is trying to use ADDS only (it's a POC).
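The rule quoted in the error message (the domain portion of the userPrincipalName must be one of the tenant's verified domains) can be sketched as a small pre-flight check. This is a minimal Python illustration only, not code from the quickstart: the function name `upn_uses_verified_domain` and the sample domain list are hypothetical, and in a real tenant the verified domains would have to be looked up rather than hard-coded.

```python
def upn_uses_verified_domain(upn: str, verified_domains: list[str]) -> bool:
    """Return True if the domain part of `upn` is in `verified_domains`.

    Hypothetical helper: `verified_domains` is assumed to be the list of
    verified domain names of the Azure AD tenant, supplied by the caller.
    """
    local_part, separator, domain = upn.rpartition("@")
    if not separator or not local_part or not domain:
        # No usable "user@domain" shape, so it cannot match any domain.
        return False
    # Domain names are case-insensitive, so compare in lower case.
    allowed = {name.strip().lower() for name in verified_domains}
    return domain.lower() in allowed


# Sample values for illustration only (not taken from the issue).
sample_verified = ["contoso.onmicrosoft.com"]
ok = upn_uses_verified_domain("admin@contoso.onmicrosoft.com", sample_verified)
bad = upn_uses_verified_domain("admin@corp.example.com", sample_verified)
```

A check like this only mirrors the condition that the error message describes; it does not explain why a UPN in the expected format would still be rejected.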

There are four subsequent errors which seem related. No new users have been added in Azure AD.

devOpsSetupRunbook

New-AzureADUser : Error occurred while executing NewUser
Code: Request_BadRequest
Message: The domain portion of the userPrincipalName property is invalid. You must use one of the verified domain names in your organization.
RequestId: 009ffdab-d6a1-42ab-8762-a778b811b15a
DateTimeStamp: Wed, 07 Oct 2020 21:19:27 GMT
Details: PropertyName - userPrincipalName, PropertyErrorCode - InvalidValue
HttpStatusCode: BadRequest
HttpStatusDescription: Bad Request
HttpResponseStatus: Completed
At line:101 char:1
+ New-AzureADUser -DisplayName $adminUsername -PasswordProfile $Passwor ...
+ ~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [New-AzureADUser], ApiException
+ FullyQualifiedErrorId : Microsoft.Open.AzureAD16.Client.ApiException,Microsoft.Open.AzureAD16.PowerShell.NewUser

New-AzADUser : The domain portion of the userPrincipalName property is invalid. You must use one of the verified domain names in your organization.
At line:260 char:33
+ ... eateUser) { New-AzADUser -UserPrincipalName $upn -DisplayName "$userN ...
+ ~~~~~~~~~~~~~
+ CategoryInfo : InvalidOperation: (:) [New-AzADUser], Exception
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.ActiveDirectory.NewAzureADUserCommand

There were 8 errors in total, but these seemed the most significant.

Any clues where to look next or what else we can try? I have tried several times on this subscription before, but cleaned up after each attempt as per the recommended steps. Those attempts did not fully succeed, but in those I was deploying my forked repo, which is based on this (primary) repo. Not sure if that has any impact.

VinceThompson commented 4 years ago

I had issues with this also. I made sure I was in the correct Azure AD group, and I also created a server in Azure and joined it to the domain manually. Once that worked, I re-ran the script and it got past that stage.

It now errors out on the customextensions for me, and sadly I cannot resolve it. It's a shame, as this is a great pipeline; I just wish I could get it going end to end.

sayanghosh123 commented 4 years ago

Thanks @VinceThompson - I just tried that and nothing seemed to work. I am not great with Windows Server administration though, so I probably didn't do it right.

I created a new Windows Server VM on the same VNET as AAD DS and tried to domain join it. Every time I enter the credentials to join the domain, I run into "The referenced account is currently locked out and may not be logged on to." Is this supposed to be the same AAD admin account we used to create the deployment? I used the UPN format as per https://docs.microsoft.com/en-us/azure/active-directory-domain-services/join-windows-vm for the credentials. I also tried enabling https://docs.microsoft.com/en-us/azure/active-directory-domain-services/security-audit-events using a Log Analytics workspace, but the query returns nothing.

Any clues? Would appreciate if I could do this and at least reach the state where you got stuck.

VinceThompson commented 4 years ago

In Azure AD, which groups does the account you are using belong to?

sayanghosh123 commented 4 years ago

Hi @VinceThompson - it is not a member of any groups in AAD. It has "Global Admin" role in AAD, and Owner RBAC role against the subscription.

sayanghosh123 commented 4 years ago

I should also point out that I did crack the domain join nut. It's a bit tricky but the following video helped -

https://www.youtube.com/watch?v=OQjK4gC89Xc

Couple concerns -

Coming back to the original issue: I ran the pipeline again after this and it failed at exactly the same stage as reported, so the original issue persists.

VinceThompson commented 4 years ago

Try removing the extension from the VM(s); it should show a status of failed or something similar. Then try running the pipeline again.

sayanghosh123 commented 4 years ago

Thanks for all your help mate @VinceThompson.

Also - another observation. Based on Vince's earlier comments, there were a whole bunch of "AAD DC Admins" groups created (I have retried many times, and removing them wasn't one of the cleanup recommendations). I added the account to each one of them, and after that I could RDP to my manually created and domain-joined VM! After this, I reran the pipeline, and it did clear the "Deploy_WVDSessionHosts" step. Then, it failed in the "Deploy_WVDSessionHosts" step with the same error. I wonder if this is also a bug in the automation, where it may fail to add the initial admin ID as a member of the "AAD DC Admins" group.

I did remove the domain join extension from the 2 VMs in the Azure portal and reran the pipeline. It again got up to the "Deploy_WVDSessionHosts" step and failed with the same error, and I see the extension back with a status of "Provisioning failed".

stgeorgi commented 4 years ago

@sayanghosh123 it will happen automatically; you do not need to press the configure button. I think it failed in your case due to an earlier issue and we never got to that code. Thank you for the cleanup recommendation.

sayanghosh123 commented 3 years ago

If anyone ever stumbles into this issue: I finally found the cause. I was using a different domain name for AD DS than for AAD (I have no idea whether that is a supported scenario, but it came from my OCD to keep things distinct). After looking through the docs, I realised that existingDomainName is inferred from DomainJoinAccountUPN, so using a different AD DS name means the domain join will definitely fail. I went back and used the same domain name and voila, it started to work like magic. Hope this helps someone else running into the same issue.
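The mismatch described above can be illustrated with a short check. This is a minimal Python sketch under stated assumptions, not the quickstart's actual logic: it assumes the inferred domain is simply the part of DomainJoinAccountUPN after the "@", and the function names `domain_from_upn` and `domain_join_inputs_consistent` are hypothetical.

```python
def domain_from_upn(upn: str) -> str:
    """Return the lower-cased domain portion of a UPN.

    Assumption: the domain is everything after the last "@".
    """
    local_part, separator, domain = upn.rpartition("@")
    if not separator or not local_part or not domain:
        raise ValueError(f"not a valid UPN: {upn!r}")
    return domain.lower()


def domain_join_inputs_consistent(domain_join_account_upn: str,
                                  adds_domain_name: str) -> bool:
    """Check that the domain inferred from the UPN equals the AD DS domain.

    Hypothetical pre-flight check: if the two differ, a domain join that
    targets the inferred domain would point at the wrong domain.
    """
    inferred = domain_from_upn(domain_join_account_upn)
    return inferred == adds_domain_name.strip().lower()


# Illustrative values only (not the ones from this issue).
same = domain_join_inputs_consistent("admin@contoso.onmicrosoft.com",
                                     "contoso.onmicrosoft.com")
different = domain_join_inputs_consistent("admin@contoso.onmicrosoft.com",
                                          "aadds.contoso.com")
```

Running a comparison like this before deployment would flag the case where the AD DS domain name was deliberately chosen to differ from the AAD domain.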

stgeorgi commented 3 years ago

@sayanghosh123 we will consider adding an overridable domain field in the advanced version.

VinceThompson commented 3 years ago

Would love to see a Terraform version of this WVDquickstart 👍