microsoft / azure_arc

Automated Azure Arc, Edge, and Platform environments
https://aka.ms/ArcJumpstart
Creative Commons Attribution 4.0 International

Azure Stack HCI Cluster Deployment - HCI Network Connectivity Validation Failed - HCI Jumpstart #2714

Open pmadusud opened 6 days ago

pmadusud commented 6 days ago

I am trying to deploy the Azure Stack HCI cluster (after the validation phase) using HCI Jumpstart. During that step, the deployment fails with an HCI Network Connectivity validation error.

The error is shown below:

Type 'ValidateNetwork' of Role 'EnvironmentValidator' raised an exception: { "ExceptionType": "json", "ErrorMessage": { "Message": "Network requirements not met. Review output and remediate.", "Results": [ { "Name": "AzStackHci_Network_Test_NetAdapter_RDMA_Operational", "DisplayName": "Test if RDMA requirement meets for the deployment on all servers", "Tags": { }, "Title": "Test NetAdapter RDMA requirement", "Status": 1, "Severity": 2, "Description": "Checking RDMA Operational Status on 192.168.1.13", "Remediation": "Make sure adapter RDMA is operational. Use Get-NetAdapterRdma cmdlet to check the status of RDMA for the network adapter in the system.", "TargetResourceID": "192.168.1.13", "TargetResourceName": "NetAdapter", "TargetResourceType": "Network Adapter RDMA", "Timestamp": "\/Date(1726246395557)\/", "AdditionalData": { "Detail": "\nERROR: RDMA setting on adapters are invalid on AZSHOST2\n Intent Compute_Management Adapter Override - [ False ]; NetworkDirect - [ 1 ]\n Wrong configuration for adapters FABRIC\r\n: RDMA not supported, but not configured with intent adapter override to disable NetworkDirect\n Intent Storage Adapter Override - [ False ]; NetworkDirect - [ 1 ]\n Wrong configuration for adapter StorageB: RDMA Enabled - [ False ]; RDMA OperationalState - [ False ]\n Wrong configuration for adapter StorageA: RDMA Enabled - [ False ]; RDMA OperationalState - [ False ]", "Status": "FAILURE", "TimeStamp": "09/13/2024 16:53:15", "Resource": "Network Adapter RDMA Operational Status", "Source": "192.168.1.13" }, "HealthCheckSource": "Deployment\\Network\\2eca8b41" }, { "Name": "AzStackHci_Network_Test_NetAdapter_RDMA_Operational", "DisplayName": "Test if RDMA requirement meets for the deployment on all servers", "Tags": { }, "Title": "Test NetAdapter RDMA requirement", "Status": 1, "Severity": 2, "Description": "Checking RDMA Operational Status on 192.168.1.12", "Remediation": "Make sure adapter RDMA is operational. 
Use Get-NetAdapterRdma cmdlet to check the status of RDMA for the network adapter in the system.", "TargetResourceID": "192.168.1.12", "TargetResourceName": "NetAdapter", "TargetResourceType": "Network Adapter RDMA", "Timestamp": "\/Date(1726246396702)\/", "AdditionalData": { "Detail": "\nERROR: RDMA setting on adapters are invalid on AZSHOST1\n Intent Compute_Management Adapter Override - [ False ]; NetworkDirect - [ 1 ]\n Wrong configuration for adapters FABRIC\r\n: RDMA not supported, but not configured with intent adapter override to disable NetworkDirect\n Intent Storage Adapter Override - [ False ]; NetworkDirect - [ 1 ]\n Wrong configuration for adapter StorageA: RDMA Enabled - [ False ]; RDMA OperationalState - [ False ]\n Wrong configuration for adapter StorageB: RDMA Enabled - [ False ]; RDMA OperationalState - [ False ]", "Status": "FAILURE", "TimeStamp": "09/13/2024 16:53:16", "Resource": "Network Adapter RDMA Operational Status", "Source": "192.168.1.12" }, "HealthCheckSource": "Deployment\\Network\\2eca8b41" } ] }, "ExceptionStackTrace": "at ParseResult, C:\\NugetStore\\AzStackHci.EnvironmentChecker.Deploy.1.2100.2784.504\\content\\Classes\\EnvironmentValidator\\EnvironmentValidator.psm1: line 1145 at Test-AzStackHciNetwork, C:\\Program Files\\WindowsPowerShell\\Modules\\AzStackHci.EnvironmentChecker\\AzStackHciNetwork\\AzStackHciNetwork.psm1: line 159 at \u003cScriptBlock\u003e, \u003cNo file\u003e: line 1 at RunSingleValidator, C:\\NugetStore\\AzStackHci.EnvironmentChecker.Deploy.1.2100.2784.504\\content\\Classes\\EnvironmentValidator\\EnvironmentValidator.psm1: line 671 at ValidateNetwork, C:\\NugetStore\\AzStackHci.EnvironmentChecker.Deploy.1.2100.2784.504\\content\\Classes\\EnvironmentValidator\\EnvironmentValidator.psm1: line 376 at \u003cScriptBlock\u003e, C:\\CloudDeployment\\ECEngine\\InvokeInterfaceInternal.psm1: line 139 at Invoke-EceInterfaceInternal, C:\\CloudDeployment\\ECEngine\\InvokeInterfaceInternal.psm1: line 134" } at 
RunSingleValidator, C:\NugetStore\AzStackHci.EnvironmentChecker.Deploy.1.2100.2784.504\content\Classes\EnvironmentValidator\EnvironmentValidator.psm1: line 687 at ValidateNetwork, C:\NugetStore\AzStackHci.EnvironmentChecker.Deploy.1.2100.2784.504\content\Classes\EnvironmentValidator\EnvironmentValidator.psm1: line 376 at <ScriptBlock>, C:\CloudDeployment\ECEngine\InvokeInterfaceInternal.psm1: line 139 at Invoke-EceInterfaceInternal, C:\CloudDeployment\ECEngine\InvokeInterfaceInternal.psm1: line 134
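The validator output above is hard to read as a single line. As a minimal sketch (not part of HCIBox or the Environment Checker), the following Python pulls the failing target and the detail lines out of an `ErrorMessage` object shaped like the one above. The `sample` text is a trimmed copy of one result from this report, and the helper name `summarize_failures` is hypothetical.

```python
import json

# Trimmed sample shaped like the "ErrorMessage" object in the report above.
# Only the fields used below are kept; the values are copied from the report.
sample = r'''
{
  "Message": "Network requirements not met. Review output and remediate.",
  "Results": [
    {
      "Name": "AzStackHci_Network_Test_NetAdapter_RDMA_Operational",
      "Status": 1,
      "TargetResourceID": "192.168.1.13",
      "AdditionalData": {
        "Detail": "\nERROR: RDMA setting on adapters are invalid on AZSHOST2\n Wrong configuration for adapter StorageB: RDMA Enabled - [ False ]; RDMA OperationalState - [ False ]",
        "Status": "FAILURE"
      }
    }
  ]
}
'''


def summarize_failures(error_message):
    """Return (target, check name, detail lines) for each failed result."""
    failures = []
    for result in error_message.get("Results", []):
        data = result.get("AdditionalData", {})
        # Assumption: failed checks are marked with Status == "FAILURE",
        # as in the report above.
        if data.get("Status") != "FAILURE":
            continue
        lines = [ln.strip() for ln in data.get("Detail", "").splitlines() if ln.strip()]
        failures.append((result.get("TargetResourceID"), result.get("Name"), lines))
    return failures


failures = summarize_failures(json.loads(sample))
for target, name, lines in failures:
    print(f"{target}: {name}")
    for ln in lines:
        print(f"  {ln}")
```

Run against the full error, this would list each host with its adapter-level messages, which makes it easier to see that both nodes report the same RDMA problem on the storage adapters.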

Please let me know what I am missing and how I can correct the issue.

janegilring commented 2 days ago

@pmadusud Hi, which Azure region did you deploy to? Also, could you share the logs from C:\HCIBox\Logs on the HCIBox-Client VM?

pmadusud commented 1 day ago

Hi @janegilring,

I am using Australia East as the region.

I have attached the logs for your review.

Please let me know how I can resolve this issue. Thank you.

Logs.zip

janegilring commented 1 day ago

@pmadusud Thank you.

I noticed the following section in the HCIBoxLogonScript.log file:

###########################
# - Configuring storage
###########################

New-StoragePool : One or more physical disks are not supported by this operation.

Extended information:
One or more physical disks encountered an error while creating the storage pool.

Physical Disks:

{c13cd57a-41a6-4590-a9f5-2f1189d08fef}: There is not enough usable space for this operation.

It seems like there is an issue with the data disks for the HCIBox-Client VM.
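For anyone checking their own logs for the same symptom, here is a rough sketch that scans log text for per-disk error lines like the one quoted above. The `{guid}: message` line pattern is an assumption based only on this excerpt, and `find_disk_errors` is a hypothetical helper, not something shipped with HCIBox.

```python
import re

# Excerpt copied from the HCIBoxLogonScript.log section quoted above.
log_text = """\
###########################
# - Configuring storage
###########################

New-StoragePool : One or more physical disks are not supported by this operation.

Extended information:
One or more physical disks encountered an error while creating the storage pool.

Physical Disks:

{c13cd57a-41a6-4590-a9f5-2f1189d08fef}: There is not enough usable space for this operation.
"""

# Assumed line shape: "{<36-character GUID>}: <message>".
DISK_ERROR = re.compile(r"^\{([0-9a-fA-F-]{36})\}:\s*(.+)$")


def find_disk_errors(text):
    """Return (disk id, message) pairs for lines matching the assumed shape."""
    errors = []
    for line in text.splitlines():
        match = DISK_ERROR.match(line.strip())
        if match:
            errors.append((match.group(1), match.group(2)))
    return errors


disk_errors = find_disk_errors(log_text)
for disk_id, message in disk_errors:
    print(f"{disk_id}: {message}")
```

To use it on a real log, replace `log_text` with the contents of the HCIBoxLogonScript.log file from C:\HCIBox\Logs.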

You can have a look at the data disks both from the Azure portal:

image

And from within the HCIBox-Client VM:

image

However, re-deploying the HCIBox instance looks like the easiest way to get a healthy deployment, so I would suggest deleting the existing one and starting a new deployment.