mukundansampath closed this issue.
Also attaching the extensions present on the two Arc machines.
@mukundansampath
1) Could you check which version of the OS is installed on the client VM? One way to check is to run Start > Run > winver.
2) Did you also retry the deployment after verifying that the extensions were installed? (I'm asking because the LcmManager can take some time.)
Hi @janegilring, thanks for taking a look. I tried multiple times after the extensions were installed.
@mukundansampath After inspecting the logs, the issue appears to be in this section:
#################################################################################################
# - Add required RBAC permission required for the service principal to deploy Azure Stack HCI
#################################################################################################
INFO: Loaded Module 'Az.Authorization'
INFO: Loaded Module 'Az.Accounts'
INFO: Loaded Module 'Az.MSGraph'
New-AzRoleAssignment : Operation returned an invalid status code 'Forbidden'
At C:\HCIBox\HCIBoxLogonScript.ps1:61 char:5
+ New-AzRoleAssignment -RoleDefinitionName "Key Vault Administrator ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : CloseError: (:) [New-AzRoleAssignment], ErrorResponseException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.Resources.NewAzureRoleAssignmentCommand
Account SubscriptionName TenantId Environment
------- ---------------- -------- -----------
60e34a37-09f5-4e80-be90-c3d7686cae19 hcs-mcw-azure-subscription b39138ca-3cee-4b4a-a4d6-cd83d9dd62f0 AzureCloud
New-AzRoleAssignment : Operation returned an invalid status code 'Forbidden'
At C:\HCIBox\HCIBoxLogonScript.ps1:61 char:5
+ New-AzRoleAssignment -RoleDefinitionName "Key Vault Administrator ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : CloseError: (:) [New-AzRoleAssignment], ErrorResponseException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.Resources.NewAzureRoleAssignmentCommand
New-AzRoleAssignment : Operation returned an invalid status code 'Forbidden'
At C:\HCIBox\HCIBoxLogonScript.ps1:66 char:5
+ New-AzRoleAssignment -RoleDefinitionName "Storage Account Contrib ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : CloseError: (:) [New-AzRoleAssignment], ErrorResponseException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.Resources.NewAzureRoleAssignmentCommand
New-AzRoleAssignment : Operation returned an invalid status code 'Forbidden'
At C:\HCIBox\HCIBoxLogonScript.ps1:66 char:5
+ New-AzRoleAssignment -RoleDefinitionName "Storage Account Contrib ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : CloseError: (:) [New-AzRoleAssignment], ErrorResponseException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.Resources.NewAzureRoleAssignmentCommand
New-AzRoleAssignment : Operation returned an invalid status code 'Forbidden'
At C:\HCIBox\Generate-ARM-Template.ps1:14 char:1
+ New-AzRoleAssignment -ObjectId $env:spnProviderId -RoleDefinitionName ...
+ CategoryInfo : CloseError: (:) [New-AzRoleAssignment], ErrorResponseException
+ FullyQualifiedErrorId : Microsoft.Azure.Commands.Resources.NewAzureRoleAssignmentCommand
New-AzRoleAssignment : Operation returned an invalid status code 'Forbidden'
At C:\HCIBox\Generate-ARM-Template.ps1:14 char:1
+ New-AzRoleAssignment -ObjectId $env:spnProviderId -RoleDefinitionName ...
This might indicate that the service principal was assigned the Contributor RBAC role rather than Owner, so it does not have permission to assign roles. Because the role assignments failed, this is likely the root cause of the failing ARM validation deployment: the resource provider is missing the permissions it needs to read the status of the node extensions.
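One way to test this hypothesis is to list the service principal's role assignments with az role assignment list --assignee <spn-object-id> --resource-group <resource-group> --include-inherited --output json and check whether any of them is a built-in role that can create role assignments. The Python sketch below performs that check on the JSON output. It is a rough check under stated assumptions: it reads only the roleDefinitionName field, recognizes only the built-in Owner and User Access Administrator roles, and will miss custom roles that grant Microsoft.Authorization/roleAssignments/write.

```python
import json

# Built-in roles that can create role assignments. Contributor is not one of them:
# its NotActions exclude Microsoft.Authorization/*/Write. Custom roles are ignored here.
ROLE_ASSIGNING_ROLES = {"Owner", "User Access Administrator"}


def can_assign_roles(role_assignments_json: str) -> bool:
    """Return True if any assignment in the JSON array uses a role in ROLE_ASSIGNING_ROLES.

    Expected input: the JSON array printed by 'az role assignment list ... --output json'.
    """
    assignments = json.loads(role_assignments_json)
    return any(a.get("roleDefinitionName") in ROLE_ASSIGNING_ROLES for a in assignments)


if __name__ == "__main__":
    # Hypothetical output for a service principal that only holds Contributor.
    sample = '[{"roleDefinitionName": "Contributor", "principalType": "ServicePrincipal"}]'
    print(can_assign_roles(sample))  # False
```

Including --include-inherited matters because an Owner assignment made at subscription scope would not otherwise show up in a resource-group listing.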
To resolve the issue without redeploying HCIBox, have a look at the commands that assign the permissions here and here, and assign the same permissions either by running those commands manually or through the Azure portal.
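If you prefer the Azure CLI over the portal, each missing assignment corresponds to one az role assignment create call. The sketch below only builds the argument list for such a call so it can be reviewed before running; the object ID, subscription and resource group are placeholders you must replace, and which principal needs which role (for example the Key Vault Administrator role from the log above) has to be taken from the HCIBox scripts themselves.

```python
import shlex
from typing import List


def role_assignment_command(object_id: str, role: str, scope: str,
                            principal_type: str = "ServicePrincipal") -> List[str]:
    """Build the argv for an 'az role assignment create' call (nothing is executed)."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", object_id,
        "--assignee-principal-type", principal_type,
        "--role", role,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholders: substitute the principal's object ID and your own scope.
    cmd = role_assignment_command(
        "<principal-object-id>",
        "Key Vault Administrator",
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>",
    )
    print(shlex.join(cmd))
    # To run it from an authenticated Azure CLI session:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing --assignee-object-id together with --assignee-principal-type avoids a directory lookup of the assignee, which is useful when the caller has limited Microsoft Graph permissions.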
Thanks for taking a look, @janegilring. The validation still fails with exactly the same error, even after granting Owner access to the service principal and retrying the deployment from scratch. We are no longer hitting the permission issue (logs attached: Logs_21Apr2024.zip).
In our organization, another person set up the Azure Stack HCI simulator in a different subscription. When I check access through IAM for my resource group, I see two managed identity entries: one is his and one is mine (see the creation date; mine was created today).
Could this be causing the conflict?
Thanks,
@mukundansampath Thanks for the update; the new deployment logs look good.
Could it be causing this conflict?
I do not think that should be an issue, as we are also deploying multiple instances of HCIBox in the same tenant for development purposes without problems.
One thing to verify: could you go to the resource group where HCIBox is deployed and check whether the Microsoft.AzureStackHCI Resource Provider is listed with the RBAC role Azure Connected Machine Resource Manager?
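The same check can be scripted. The sketch below assumes you have saved the output of az role assignment list --resource-group <resource-group> --output json and know the resource provider's object ID (the id field returned by az ad sp list --display-name "Microsoft.AzureStackHCI Resource Provider"). It looks for an assignment whose principalId is that object ID and whose roleDefinitionName is Azure Connected Machine Resource Manager; the sample data is a trimmed, hypothetical listing.

```python
import json

REQUIRED_ROLE = "Azure Connected Machine Resource Manager"


def provider_has_role(role_assignments_json: str, provider_object_id: str,
                      role: str = REQUIRED_ROLE) -> bool:
    """Return True if provider_object_id holds `role` in the listed assignments.

    Expected input: the JSON array printed by
      az role assignment list --resource-group <resource-group> --output json
    Object IDs (GUIDs) are compared case-insensitively.
    """
    target = provider_object_id.lower()
    return any(
        a.get("principalId", "").lower() == target and a.get("roleDefinitionName") == role
        for a in json.loads(role_assignments_json)
    )


if __name__ == "__main__":
    # Hypothetical, trimmed listing in which only a deployment SPN holds the role.
    sample = json.dumps([
        {"principalId": "d9397bc1-4321-45b8-afd5-a83130df497a", "roleDefinitionName": REQUIRED_ROLE},
    ])
    print(provider_has_role(sample, "7f47539e-70c8-4ff7-8e78-ac6a386a946b"))  # False
```

Matching on principalId rather than on display name matters here, because several service principals can hold the same role in the same resource group.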
@janegilring No, that role assignment was indeed missing, but the SP has that role:
I added this role assignment manually from the portal, and that fixed the issue: deployment validation succeeded, and I'm proceeding with the deployment. Why was the role assignment missing?
@mukundansampath Glad the deployment validation succeeded. Looking at the deployment logs you shared, the role was assigned multiple times:
RoleAssignmentName : edbf51ac-e1eb-4eb9-921a-1b39b53eb688
RoleAssignmentId : /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg/providers/Microsoft.Authorization/roleAssignments/edbf51ac-e1eb-4eb9-921a-1b39b53eb688
Scope : /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg
DisplayName :
SignInName :
RoleDefinitionName : Azure Connected Machine Resource Manager
RoleDefinitionId : f5819b54-e033-4d82-ac66-4fec3cbf3f4c
ObjectId : 05c316fb-a3fb-41e0-afce-3c7df0f00959
ObjectType : Unknown
CanDelegate : False
Description :
ConditionVersion :
Condition :
RoleAssignmentName : 5a007dc9-0319-4566-92ab-124d813d093d
RoleAssignmentId : /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg/providers/Microsoft.Authorization/roleAssignments/5a007dc9-0319-4566-92ab-124d813d093d
Scope : /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg
DisplayName :
SignInName :
RoleDefinitionName : Azure Connected Machine Resource Manager
RoleDefinitionId : f5819b54-e033-4d82-ac66-4fec3cbf3f4c
ObjectId : e8a84f30-03ea-4cd0-b49f-67c9f6ae8d3e
ObjectType : Unknown
CanDelegate : False
Description :
ConditionVersion :
Condition :
RoleAssignmentName : 25dfe2b8-d442-40ef-9621-686b8061078b
RoleAssignmentId : /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg/providers/Microsoft.Authorization/roleAssignments/25dfe2b8-d442-40ef-9621-686b8061078b
Scope : /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg
DisplayName : vmw-hcs-principal-msampathkumar
SignInName :
RoleDefinitionName : Azure Connected Machine Resource Manager
RoleDefinitionId : f5819b54-e033-4d82-ac66-4fec3cbf3f4c
ObjectId : d9397bc1-4321-45b8-afd5-a83130df497a
ObjectType : ServicePrincipal
CanDelegate : False
Description :
ConditionVersion :
Condition :
If you run az ad sp list --display-name "Microsoft.AzureStackHCI Resource Provider", is the id of the returned object the same as the ObjectId of one of the above assignments?
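Comparing the IDs by eye is error-prone, so here is a small sketch that does it. It assumes the az ad sp list output is a JSON array whose entries carry an id field (the service principal's object ID, which is what role assignments reference; appId is a different value). The three ObjectIds are copied from the assignments above, and the sample input is trimmed to the two ID fields.

```python
import json
from typing import Iterable, List

# ObjectId values of the three "Azure Connected Machine Resource Manager" assignments above.
ASSIGNED_OBJECT_IDS = {
    "05c316fb-a3fb-41e0-afce-3c7df0f00959",
    "e8a84f30-03ea-4cd0-b49f-67c9f6ae8d3e",
    "d9397bc1-4321-45b8-afd5-a83130df497a",
}


def sp_object_ids(sp_list_json: str) -> List[str]:
    """Return the object ID ('id', not 'appId') of each service principal, lowercased."""
    return [sp["id"].lower() for sp in json.loads(sp_list_json)]


def unassigned_ids(sp_list_json: str, assigned: Iterable[str] = ASSIGNED_OBJECT_IDS) -> List[str]:
    """Return the service principal object IDs that do not appear among the assigned ObjectIds."""
    assigned_lower = {i.lower() for i in assigned}
    return [i for i in sp_object_ids(sp_list_json) if i not in assigned_lower]


if __name__ == "__main__":
    # Trimmed sample: only the fields this sketch needs.
    sample = ('[{"appId": "1412d89f-b8a8-4111-b4fd-e82905cbd85d",'
              ' "id": "7f47539e-70c8-4ff7-8e78-ac6a386a946b"}]')
    print(unassigned_ids(sample))  # ['7f47539e-70c8-4ff7-8e78-ac6a386a946b']
```

A non-empty result means the resource provider itself holds none of the listed assignments, even if other principals do.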
@janegilring Here is the output. I don't see it:
msampathkumar@msampathkuJC2D1 createImage % az ad sp list --display-name "Microsoft.AzureStackHCI Resource Provider"
[
  {
    "accountEnabled": true,
    "addIns": [],
    "alternativeNames": [],
    "appDescription": null,
    "appDisplayName": "Microsoft.AzureStackHCI Resource Provider",
    "appId": "1412d89f-b8a8-4111-b4fd-e82905cbd85d",
    "appOwnerOrganizationId": "f8cdef31-a31e-4b4a-93e4-5f571e91255a",
    "appRoleAssignmentRequired": false,
    "appRoles": [],
    "applicationTemplateId": null,
    "createdDateTime": "2021-09-13T05:16:57Z",
    "deletedDateTime": null,
    "description": null,
    "disabledByMicrosoftStatus": null,
    "displayName": "Microsoft.AzureStackHCI Resource Provider",
    "homepage": null,
    "id": "7f47539e-70c8-4ff7-8e78-ac6a386a946b",
    "info": {
      "logoUrl": null,
      "marketingUrl": null,
      "privacyStatementUrl": null,
      "supportUrl": null,
      "termsOfServiceUrl": null
    },
    "keyCredentials": [],
    "loginUrl": null,
    "logoutUrl": null,
    "notes": null,
    "notificationEmailAddresses": [],
    "oauth2PermissionScopes": [],
    "passwordCredentials": [],
    "preferredSingleSignOnMode": null,
    "preferredTokenSigningKeyThumbprint": null,
    "replyUrls": [],
    "resourceSpecificApplicationPermissions": [],
    "samlSingleSignOnSettings": null,
    "servicePrincipalNames": [
      "1412d89f-b8a8-4111-b4fd-e82905cbd85d",
      "https://sea-azurestackhci-rp.azurewebsites.net"
    ],
    "servicePrincipalType": "Application",
    "signInAudience": "AzureADMultipleOrgs",
    "tags": [],
    "tokenEncryptionKeyId": null,
    "verifiedPublisher": {
      "addedDateTime": null,
      "displayName": null,
      "verifiedPublisherId": null
    }
  }
]
@mukundansampath Thanks. Could you check your parameters file and see whether it contains the value 7f47539e-70c8-4ff7-8e78-ac6a386a946b for the spnProviderId parameter?
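This check can also be scripted. The sketch assumes the parameters file follows the standard ARM deployment parameters layout (a top-level "parameters" object whose entries each carry a "value"), which matches the spnProviderId snippet quoted in this thread. The expected ID is the Microsoft.AzureStackHCI Resource Provider object ID from the az ad sp list output above; the sample input is hypothetical.

```python
import json

# Object ID of the Microsoft.AzureStackHCI Resource Provider, from the 'az ad sp list' output above.
EXPECTED_PROVIDER_ID = "7f47539e-70c8-4ff7-8e78-ac6a386a946b"


def spn_provider_id(parameters_json: str) -> str:
    """Read parameters.spnProviderId.value from ARM deployment parameters JSON."""
    return json.loads(parameters_json)["parameters"]["spnProviderId"]["value"]


def spn_provider_id_ok(parameters_json: str, expected: str = EXPECTED_PROVIDER_ID) -> bool:
    """True if spnProviderId matches the resource provider's object ID (case-insensitive)."""
    return spn_provider_id(parameters_json).lower() == expected.lower()


if __name__ == "__main__":
    # Hypothetical, trimmed parameters file that holds a deployment SPN's ID instead.
    sample = '{"parameters": {"spnProviderId": {"value": "d9397bc1-4321-45b8-afd5-a83130df497a"}}}'
    print(spn_provider_id_ok(sample))  # False
```

To check a real file, pass its contents, for example spn_provider_id_ok(pathlib.Path("<your-parameters-file>.json").read_text(encoding="utf-8")).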
@janegilring No, I don't see it there either:
"spnProviderId": { "value": "d9397bc1-4321-45b8-afd5-a83130df497a" },
@mukundansampath Then I think we have found the culprit.
If I understand correctly, you provided the object ID of the SPN used for the deployment as the value of the spnProviderId parameter. The correct value is 7f47539e-70c8-4ff7-8e78-ac6a386a946b, the id of the Microsoft.AzureStackHCI Resource Provider.
The guidance for populating this parameter is available here:
My bad. Phew, this sent me around in circles. Closing the bug. Thanks for the patient help, @janegilring.
Closing: bad input for spnProviderId.
One suggestion, though, @janegilring: could we rename the parameter from spnProviderId to something clearer, such as stackHciProviderId?
@mukundansampath Thanks for the suggestion. We will give it some thought for future iterations, but I suspect it might be considered a breaking change.
Is your issue related to a Jumpstart scenario, ArcBox, HCIBox, or Agora? HCIBox
Describe the issue or the bug
Following https://azurearcjumpstart.com/azure_jumpstart_hcibox/cloud_deployment, after the Azure Arc machines (AzSHOST1, AzSHOST2) were created successfully by the PowerShell script, I tried deploying the ARM template and validating the deployment. It fails consistently (screenshot attached).
To Reproduce
Exception encountered while adding node to cluster [Resource validation failed. Details: [{"Code":"ValidationFailed","Message":"Arc extensions installed on Arc Machine /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg/providers/Microsoft.HybridCompute/machines/AzSHOST1 are while required list of mandatory arc extensions are TelemetryAndDiagnostics, DeviceManagementExtension, LcmController","Target":null,"Details":null},{"Code":"ValidationFailed","Message":"Arc extensions installed on Arc Machine /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg/providers/Microsoft.HybridCompute/machines/AzSHOST2 are while required list of mandatory arc extensions are TelemetryAndDiagnostics, DeviceManagementExtension, LcmController","Target":null,"Details":null},{"Code":"ValidationFailed","Message":"Arc machines validation failed for /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg/providers/Microsoft.HybridCompute/machines/AzSHOST1, /subscriptions/0456a995-2102-4130-82c9-6c9548ec5105/resourceGroups/msam-stackhci-rg/providers/Microsoft.HybridCompute/machines/AzSHOST2","Target":null,"Details":null}].] at [ at Microsoft.AzureStackHCI.ResourceProvider.Services.EdgeDeviceManager.ValidateNodesAsync(ResourceCollection`1 machines, IList`1 arcMachineIDs, String parentClusterResourceId) in C:\__w\1\s\src\rp\Services\EdgeDeviceManager.cs:line 404 at Microsoft.AzureStackHCI.ResourceProvider.Services.ClusterNodeManager.AddARCNodesToCluster(ClusterDeploymentWorkItem workItem) in C:\__w\1\s\src\rp\Services\ClusterNodeManager.cs:line 114] (Code: NotSpecified)
Validate that the extensions are present on both Azure Arc machines.
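The validation rule in this error amounts to a set difference between the mandatory extension types and the types installed on each machine. The sketch below reproduces that comparison locally; it assumes you have already collected the installed extension type names per machine (for example from the portal or from az connectedmachine extension list; which output field carries the type is not covered here), and the machine names in the example are hypothetical.

```python
from typing import Iterable, List

# Mandatory extension types named in the validation error above.
MANDATORY_EXTENSIONS = ("TelemetryAndDiagnostics", "DeviceManagementExtension", "LcmController")


def missing_extensions(installed_types: Iterable[str]) -> List[str]:
    """Return the mandatory extension types absent from installed_types, in the error's order."""
    installed = set(installed_types)
    return [ext for ext in MANDATORY_EXTENSIONS if ext not in installed]


if __name__ == "__main__":
    # Hypothetical per-machine inventories, collected by hand.
    machines = {
        "machine-a": ["TelemetryAndDiagnostics", "DeviceManagementExtension", "LcmController"],
        "machine-b": ["LcmController"],
    }
    for name, types in machines.items():
        print(name, missing_extensions(types) or "all mandatory extensions present")
```

Note that, as this thread found, a local check like this can pass while the resource provider still reports an empty extension list, because the provider itself lacked the role assignment it needs to read the extensions.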
Expected behavior
It should not fail. I tried twice and hit the same issue both times.
Environment summary
bicep % az --version
azure-cli 2.59.0
Have you looked at the Troubleshooting and Logs section? yes
Screenshots
Also uploading the HCIBox client logs: Logs.zip
Additional context