kubermatic / kubeone

Kubermatic KubeOne automates cluster operations on all your cloud, on-prem, edge, and IoT environments.
https://kubeone.io
Apache License 2.0

Azure ingress is not accessible via Load Balancer #1307

Closed dharapvj closed 2 years ago

dharapvj commented 3 years ago

What happened: When I install nginx-ingress-controller on a KubeOne-based cluster in Azure, the ingress URLs time out.

What is the expected behavior: Ingress-based app access should work.

How to reproduce the issue:

Anything else we need to know? I observed that the ingress becomes available if I:

  1. create another availability set in Terraform,
  2. move the worker nodes to this new availability set, and
  3. change the primaryAvailabilitySetName in kubeone.yaml to this new avset (see the sketch below).

But I do not know the reason for this behavior, and I am also not sure about the impact of changing the value of primaryAvailabilitySetName in kubeone.yaml.
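The Terraform change in step 1 was roughly along these lines. This is a minimal sketch only, not the exact code: the azurerm_resource_group.rg reference and the domain counts are placeholders.

# Second availability set, dedicated to the worker nodes.
# Resource references and domain counts are illustrative.
resource "azurerm_availability_set" "worker_avset" {
  name                         = "vj1-avset-worker"
  location                     = azurerm_resource_group.rg.location
  resource_group_name          = azurerm_resource_group.rg.name
  platform_fault_domain_count  = 2
  platform_update_domain_count = 5
  managed                      = true
}

Steps 2 and 3 then amount to pointing the workers and the cloudConfig's primaryAvailabilitySetName at vj1-avset-worker instead of vj1-avset (see the credentials.yaml further down).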

Information about the environment:

KubeOne version (kubeone version):

{
  "kubeone": {
    "major": "1",
    "minor": "2",
    "gitVersion": "1.2.0-rc.1",
    "gitCommit": "fde1f267769e04acc07bbdb94c09b0d6a18ea4cc",
    "gitTreeState": "",
    "buildDate": "2021-03-12T10:25:53Z",
    "goVersion": "go1.16.1",
    "compiler": "gc",
    "platform": "linux/amd64"
  },
  "machine_controller": {
    "major": "1",
    "minor": "25",
    "gitVersion": "v1.25.0",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}

Operating system: Linux
Provider you're deploying cluster on: Azure
Operating system you're deploying on: Ubuntu

JoaquimFreitas commented 3 years ago

Hi, the root cause of that behaviour (the provisioning of the LoadBalancer for the Ingress Controller fails) is that the KubeOne example Terraform for Azure creates an Azure LoadBalancer with the Basic SKU. With the Basic SKU, an Availability Set can only belong to ONE LoadBalancer backend pool, and that is what causes the provisioning of the second LB to fail.

A separate AvailabilitySet must be used for the Workers on Azure.

I've changed the Azure Terraform project to do just that, and also changed the SKU of the LB and Public IPs to Standard, as the Basic LB is very limited in functionality and performance.
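The SKU change was roughly the following. This is only a sketch, not the actual KubeOne Terraform code: resource names are placeholders and the arguments assume a 2.x azurerm provider.

# Public IP and LB created by the example Terraform, switched from the
# Basic to the Standard SKU. Standard SKU public IPs must be static.
resource "azurerm_public_ip" "lb_ip" {
  name                = "${var.cluster_name}-lb-ip"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "Standard"
  allocation_method   = "Static"
}

resource "azurerm_lb" "lb" {
  name                = "${var.cluster_name}-lb"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "Standard"

  frontend_ip_configuration {
    name                 = "${var.cluster_name}-lb-frontend"
    public_ip_address_id = azurerm_public_ip.lb_ip.id
  }
}

Note that, unlike Basic, a Standard LB denies inbound traffic unless the NSG explicitly allows it, so the NSG rules have to be adjusted accordingly.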

@dharapvj
Can you provide here the KubeOne Manifest that you are testing for Azure?

I'm getting some strange errors during the KubeOne provisioning, mainly:

INFO[23:18:07 BST] Running kubeadm...                            node=##.##.###.###
WARN[23:23:31 BST] Task failed, error was: + export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/local/bin:/opt/bin
+ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/sbin:/usr/local/bin:/opt/bin
+ [[ -f /etc/kubernetes/admin.conf ]]
+ sudo kubeadm init --config=./kubeone/cfg/master_0.yaml --ignore-preflight-errors=DirAvailable--var-lib-etcd,ImagePull
W0427 22:18:08.506231    4817 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.96.0.10]; the provided value is: [169.254.20.10]
W0427 22:18:08.510472    4817 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
: Process exited with status 1

dharapvj commented 3 years ago

Hi @JoaquimFreitas

kubeone.yaml

apiVersion: kubeone.io/v1beta1
kind: KubeOneCluster
versions:
  kubernetes: '1.17.17'
cloudProvider:
  azure: {}

credentials.yaml

Please note that here I am referring to the second availability set vj1-avset-worker instead of the original avset vj1-avset, as mentioned in my workaround.

ARM_TENANT_ID: "XXX"
ARM_CLIENT_ID: "ZZZ"
ARM_SUBSCRIPTION_ID: "YYY"
ARM_CLIENT_SECRET: "AAA"
cloudConfig: |
    {
      "tenantId": "XXX",
      "subscriptionId": "YYY",
      "aadClientId": "ZZZ",
      "aadClientSecret": "AAA",
      "resourceGroup": "vj1-rg",
      "location": "westeurope",
      "subnetName": "vj1-subnet",
      "routeTableName": "",
      "securityGroupName": "vj1-sg",
      "vnetName": "vj1-vpc",
      "primaryAvailabilitySetName": "vj1-avset-worker", 
      "useInstanceMetadata": true,
      "useManagedIdentityExtension": false,
      "userAssignedIdentityID": ""
    }

JoaquimFreitas commented 3 years ago

@dharapvj Thanks for the info.

I was not using a credentials.yaml; I've now added it to my kubeone command.

I went back to the Basic SKU LB in the Azure Terraform project (I was getting some strange errors during the kubeone execution with the Standard SKU LB), added the creation of a new AvailabilitySet for the Workers, and added a new NSG inbound rule for the Kube API - kubeone was failing later on during the provisioning without it.

I also added new variables to the Terraform project to allow the use of a custom vNet and Subnet (the values were hardcoded in the Terraform code, with no need for that).

I also altered the Terraform output to indicate the new Workers AvailabilitySet in the cloudConfig.
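The extra NSG rule for the Kube API looked roughly like this (a sketch with placeholder resource names and priority; kube-apiserver listens on 6443/tcp):

# Allow inbound traffic to the Kubernetes API server port.
# Resource names and the priority value are illustrative.
resource "azurerm_network_security_rule" "kube_apiserver" {
  name                        = "KubeAPI"
  priority                    = 200
  direction                   = "Inbound"
  access                      = "Allow"
  protocol                    = "Tcp"
  source_port_range           = "*"
  destination_port_range      = "6443"
  source_address_prefix       = "*"
  destination_address_prefix  = "*"
  resource_group_name         = azurerm_resource_group.rg.name
  network_security_group_name = azurerm_network_security_group.nsg.name
}

My current KubeOne manifest, for reference: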

apiVersion: kubeone.io/v1beta1
kind: KubeOneCluster
name: kubeone-k8scluster-azure

versions:
  kubernetes: '1.19.9'

cloudProvider:
  azure: {}
  cloudConfig: |
    {
      "tenantId": "TTTTTTT-ID",
      "subscriptionId": "SSSSSSS-ID",
      "aadClientId": "CCCCCCC-ID",
      "aadClientSecret": "CSECCSEC-ID",
      "resourceGroup": "kubeone-k8scluster-azure-rg",
      "location": "northeurope",
      "subnetName": "kubeone-k8scluster-azure-subnet",
      "routeTableName": "",
      "securityGroupName": "kubeone-k8scluster-azure-nsg",
      "vnetName": "kubeone-k8scluster-azure-vnet",
      "primaryAvailabilitySetName": "kubeone-k8scluster-azure-pool1-avset", 
      "useInstanceMetadata": true,
      "useManagedIdentityExtension": false,
      "userAssignedIdentityID": ""
    }

containerRuntime:
  containerd: {}

Let me say that I think kubeone ONLY takes into account the output.tf of the Terraform project - I've tested putting some different information in the cloudConfig settings and it seemed to be just ignored...
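If that is the case, the Workers AvailabilitySet has to come from the Terraform output rather than from the manifest. Approximately like this; the exact structure of the kubeone_workers output in the example may differ, this is only a sketch from memory with placeholder resource and variable names:

# Worker pool definition consumed by kubeone; the new Workers availability
# set is referenced in the cloudProviderSpec. Field names are approximate.
output "kubeone_workers" {
  value = {
    "${var.cluster_name}-pool1" = {
      replicas = var.worker_count
      providerSpec = {
        cloudProviderSpec = {
          location          = var.location
          resourceGroup     = azurerm_resource_group.rg.name
          vnetName          = azurerm_virtual_network.vnet.name
          subnetName        = azurerm_subnet.subnet.name
          securityGroupName = azurerm_network_security_group.nsg.name
          availabilitySet   = azurerm_availability_set.worker_avset.name
          vmSize            = var.worker_vm_size
        }
        # operatingSystem, operatingSystemSpec, sshPublicKeys omitted here
      }
    }
  }
}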

With the changes, the KubeOne provisioning goes through without major issues, and an Azure LoadBalancer and a Public IP for it are now provisioned when an Ingress Controller is deployed.

ONLY one question remains: what happens if MORE than ONE Ingress Controller is deployed?

kubermatic-bot commented 3 years ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubermatic-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kron4eg commented 3 years ago

/remove-lifecycle rotten

kubermatic-bot commented 2 years ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

xmudrii commented 2 years ago

This was fixed. /close

kubermatic-bot commented 2 years ago

@xmudrii: Closing this issue.

In response to [this](https://github.com/kubermatic/kubeone/issues/1307#issuecomment-996207577):

> This was fixed.
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.