microsoft / hcsshim

Windows - Host Compute Service Shim
MIT License

connectex: A socket operation was attempted to an unreachable error AND HNS Unspecified error #108

Open m-kostrzewa opened 7 years ago

m-kostrzewa commented 7 years ago

Issue seems similar to https://github.com/Microsoft/hcsshim/issues/95

There are two test cases and two different errors, but I suspect they share the same root cause, so I am reporting both in one issue.

Here's the first test case:

for i := 0; i < numTries; i++ {
    subnets := []hcsshim.Subnet{
        {
            AddressPrefix:  "10.0.0.0/24",
            GatewayAddress: "10.0.0.1",
        },
    }
    configuration := &hcsshim.HNSNetwork{
        Type:               "transparent",
        NetworkAdapterName: "Ethernet0",
        Subnets:            subnets,
    }
    configBytes, _ := json.Marshal(configuration)
    resp, err := hcsshim.HNSNetworkRequest("POST", "", string(configBytes))
    Expect(err).ToNot(HaveOccurred())

    _, err = net.Dial("tcp", "10.7.0.54:8082")
    Expect(err).ToNot(HaveOccurred())
    // sometimes fails with:
    // `dial tcp 10.7.0.54:8082: connectex: A socket operation was attempted to an
    // unreachable network.`

    hcsshim.HNSNetworkRequest("DELETE", resp.Id, "")
}

net.Dial sometimes fails with:

dial tcp 10.7.0.54:8082: connectex: A socket operation was attempted to an unreachable network.

A PowerShell script that roughly reproduces it (but prints a different error):

1..50 | % { New-ContainerNetwork -Mode transparent -Name net1 -SubnetPrefix 10.0.0.0/24 -NetworkAdaptername Ethernet0; curl 10.7.0.54:8082; Remove-ContainerNetwork -Name net1 -Force; }

I sometimes get this error:

curl : Unable to connect to remote server

which I believe is a coarser error message that covers the connectex error shown above.

Here's the second test case. It is very similar to the first one, except that no subnet is specified when creating the HNS network:

for i := 0; i < numTries; i++ {
    configuration := &hcsshim.HNSNetwork{
        Type:               "transparent",
        NetworkAdapterName: "Ethernet0",
    }
    configBytes, _ := json.Marshal(configuration)
    resp, err := hcsshim.HNSNetworkRequest("POST", "", string(configBytes))
    Expect(err).ToNot(HaveOccurred())
    // sometimes errors:
    // `HNS failed with error : Unspecified error`

    hcsshim.HNSNetworkRequest("DELETE", resp.Id, "")
}

This time, the POST request to HNS itself fails with an HNS error:

HNS failed with error : Unspecified error

To reproduce via PowerShell:

1..50 | % { New-ContainerNetwork -Mode transparent -Name net1 -NetworkAdaptername Ethernet0; Remove-ContainerNetwork -Name net1 -Force; }

Which sometimes returns:

New-ContainerNetwork : Unspecified Error

m-kostrzewa commented 7 years ago

A workaround is to sleep for 2 seconds after the POST request, but this is unacceptable.

m-kostrzewa commented 7 years ago

When you create a new vswitch via Hyper-V Manager, it warns you that connectivity may break. I understand that a similar thing is happening here: when the HNS switch is being created, HNS network creation returns success even though the system has not finished applying all the configuration changes to the vswitch and network adapters.

But as a user of HNS, I care about when the network is ready, not merely when it has been created. I think HNS should block until everything is up and running. If that is not possible, then I would like to know of a workaround, such as polling the network adapters, so that I don't have to use sleeps in the code.

msabansal commented 7 years ago

@m-kostrzewa In our test scripts we do a similar thing to wait for connectivity before invoking New-ContainerNetwork. For example:

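function GetConnectedHostIP() {
    # Names of all adapters that are currently up
    $connectedNics = Get-NetAdapter | ? {$_.Status -eq "Up"} | % {$_.Name}
    # IPv4 addresses on those adapters, excluding HNS internal NICs,
    # 169.x (APIPA) and 127.x (loopback) addresses
    Get-NetIPAddress | ? {
        $connectedNics.Contains($_.InterfaceAlias) -and
        $_.AddressFamily -eq "IPv4" -and
        !$_.InterfaceAlias.Contains("HNS Internal NIC") -and
        !$_.IPAddress.StartsWith("169") -and
        !$_.IPAddress.StartsWith("127")
    }
}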

function WaitForHostConnectivity {
    for ($i=0; $i -lt 20; $i++)
    {
        $ip=GetConnectedHostIP
        if ($ip -ne $null)
        {
            return $true
        }
        sleep -Milliseconds 300
    }
    return $false
}

WaitForHostConnectivity