kubernetes / cloud-provider-openstack

Apache License 2.0

after adding cloud config kubelet wont start #169

Closed s0komma closed 5 years ago

s0komma commented 6 years ago

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened: Hi guys, I'm trying to set up Kubernetes persistent storage using Cinder. My Kubernetes version is 1.9.4. Following the documented steps, I added `--cloud-provider=openstack --cloud-config=/etc/kubernetes/cloud_config` to the kubelet, controller-manager, and API server on all master nodes, and to the kubelet only on worker nodes. After doing so, the kubelet won't start.

Kubelet logs are below:

```
May 10 20:10:00 control-plane-256024810-1-359235278 kubelet[12070]: I0510 20:10:00.036796 12070 server.go:305] Successfully initialized cloud provider: "openstack" from the config file: "/etc/kubernetes/cloud_config"
May 10 20:10:00 control-plane-256024810-1-359235278 kubelet[12070]: I0510 20:10:00.036907 12070 openstack_instances.go:39] openstack.Instances() called
May 10 20:10:00 control-plane-256024810-1-359235278 kubelet[12070]: error: failed to run Kubelet: failed to get instances from cloud provider
May 10 20:10:00 control-plane-256024810-1-359235278 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
May 10 20:10:00 control-plane-256024810-1-359235278 systemd[1]: kubelet.service: Unit entered failed state.
```

I verified that `nova list` and `cinder list` work from the VM with the same credentials.

What you expected to happen: Kubelet to come up.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

dims commented 6 years ago

@s0komma What's the name of the VM in `openstack server list`? Use `--hostname-override` in the kubelet and set it to the same name you see in the server list command. Also check your metadata service or config drive to make sure you see the same hostname there; hopefully you don't have any special characters in the name. Another thing to do is bump up the verbosity of the logging with `--v=10` on the kubelet and kube-apiserver to see more messages about any mismatches.
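For reference, these flags are typically wired into the kubelet's systemd unit via a drop-in. A hypothetical sketch, where the file path, environment-variable name, and placeholder hostname are assumptions rather than anything from this thread:

```ini
# /etc/systemd/system/kubelet.service.d/20-cloud.conf (hypothetical path)
[Service]
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=openstack \
  --cloud-config=/etc/kubernetes/cloud_config \
  --hostname-override=<name-from-openstack-server-list> \
  --v=10"
```

After editing a drop-in, `systemctl daemon-reload && systemctl restart kubelet` picks up the change.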

s0komma commented 6 years ago

@dims -- Thanks for responding. My hostname is different from what I see in `openstack server list`:


```
root@control-plane-256024810-1-359235278:~/skomma# hostname -f
control-plane-256024810-1-359235278.lab.skomma.kubernetes

root@control-plane-256024810-1-359235278:~/skomma# openstack server list | grep skomma
| a586801d-237c-44e6-a883-9194bcac052b | control-plane-lab-skomma-kubernetes-359235272 | ACTIVE | Primary5_External_Net=10.1.1.1 |
```

I added the following flag to the kubelet:
`--hostname-override=control-plane-lab-skomma-kubernetes-359235272 \`

Below are the kubelet logs after bumping the logging verbosity to `--v=10`:

```
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.584914   30009 flags.go:52] FLAG: --v="10"
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.584918   30009 flags.go:52] FLAG: --version="false"
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.584926   30009 flags.go:52] FLAG: --vmodule=""
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.584930   30009 flags.go:52] FLAG: --volume-plugin-dir="/usr/libexec/kubernetes/kubelet-plugins/volume/exec/"
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.584936   30009 flags.go:52] FLAG: --volume-stats-agg-period="1m0s"
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.584969   30009 feature_gate.go:226] feature gates: &{{} map[]}
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.584999   30009 controller.go:114] kubelet config controller: starting controller
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.585005   30009 controller.go:118] kubelet config controller: validating combination of defaults and flags
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.607136   30009 mount_linux.go:208] Detected OS with systemd
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.612602   30009 iptables.go:589] couldn't get iptables-restore version; assuming it doesn't support --wait
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.614049   30009 server.go:182] Version: v1.9.4
May 11 02:09:26 control-plane-256024810-1-359235278 kubelet[30009]: I0511 02:09:26.614101   30009 feature_gate.go:226] feature gates: &{{} map[]}
May 11 04:50:22 control-plane-256024810-1-359235278 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=kubelet comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
May 11 04:50:22 control-plane-256024810-1-359235278 kubelet[24510]: I0511 04:50:22.036864   24510 server.go:305] Successfully initialized cloud provider: "openstack" from the config file: "/etc/kubernetes/cloud_config"
May 11 04:50:22 control-plane-256024810-1-359235278 kubelet[24510]: I0511 04:50:22.036915   24510 openstack_instances.go:39] openstack.Instances() called
May 11 04:50:22 control-plane-256024810-1-359235278 kubelet[24510]: error: failed to run Kubelet: failed to get instances from cloud provider
May 11 04:50:22 control-plane-256024810-1-359235278 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
May 11 04:50:22 control-plane-256024810-1-359235278 systemd[1]: kubelet.service: Unit entered failed state.
May 11 04:50:22 control-plane-256024810-1-359235278 systemd[1]: kubelet.service: Failed with result 'exit-code'.
```

```
root@control-plane-256024810-1-359235278:~/skomma# openstack server show a586801d-237c-44e6-a883-9194bcac052b
+--------------------------------------+-----------------------------------------------------------------------------+
| Field                                | Value                                                                       |
+--------------------------------------+-----------------------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                                      |
| OS-EXT-AZ:availability_zone          | az2                                                                         |
| OS-EXT-SRV-ATTR:host                 | xxxxx                                                                       |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | xxxxxxcom                                                                   |
| OS-EXT-SRV-ATTR:instance_name        | instance-0002c8c5                                                           |
| OS-EXT-STS:power_state               | 1                                                                           |
| OS-EXT-STS:task_state                | None                                                                        |
| OS-EXT-STS:vm_state                  | active                                                                      |
| OS-SRV-USG:launched_at               | 2018-05-08T18:01:43.000000                                                  |
| OS-SRV-USG:terminated_at             | None                                                                        |
| accessIPv4                           |                                                                             |
| accessIPv6                           |                                                                             |
| addresses                            | Primary5_External_Net=10.1q.q.q                                             |
| config_drive                         | True                                                                        |
| created                              | 2018-05-08T18:01:32Z                                                        |
| flavor                               | m1.medium (3)                                                               |
| hostId                               | 9e7065dcf87822477a26518116ff2f5f5d92b6e5c4536dff8e26bea1                    |
| id                                   | a586801d-237c-44e6-a883-9194bcac052b                                        |
| image                                | Ubuntu-16.04-minimal-cloud-init RC10 (4fbebc68-9853-4c0a-9a41-e2aa1afaf7d4) |
| key_name                             | oneops_key-359232257-lab-359232294                                          |
| name                                 | control-plane-lab-skomma-kubernetes-359235272                               |
| os-extended-volumes:volumes_attached | []                                                                          |
| progress                             | 0                                                                           |
| project_id                           | b938d5464ece4d02876f9ec8db110bf7                                            |
| properties                           | assembly='skomma', component='359232309', environment='lab', instance='359235272', mgmt_url='https://web..com', organization='kubernetes', owner='', platform='control-plane' |
| security_groups                      | [{u'name': u'default'}, {u'name': u'control-plane-lab-skomma-kubernetes-359235182'}] |
| status                               | ACTIVE                                                                      |
| updated                              | 2018-05-08T18:01:43Z                                                        |
| user_id                              | xxxxxxxxx                                                                   |
+--------------------------------------+-----------------------------------------------------------------------------+
```
s0komma commented 6 years ago

This is the error I see at the end:

```
May 11 05:00:04 control-plane-256024810-1-359235278 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=kubelet comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
```

dims commented 6 years ago

@s0komma

s0komma commented 6 years ago

@dims -- "Is the keystone URL accessible from this VM?" -- Yes, I'm running `nova list` and the other commands from the VM itself.

"If you print the keystone catalog, you will see the URL for Nova; is the hostname used in the URL resolvable?" -- I know for sure that the name in OpenStack is different from the VM's hostname, and the OpenStack name is not DNS-resolvable. Should the name in `nova list` be the same as the VM's hostname, or is a DNS entry enough?

dims commented 6 years ago

If you are running the openstack CLI and Nova CLI from the VM, then you should be OK in terms of the VM being able to reach the Nova and Keystone API endpoints.

The name of the VM in `nova list` should be used as the `--hostname-override` when you start the kubelet. Have you done that already?

s0komma commented 6 years ago

Thank you @dims

"The name of the VM in nova list should be used as the --hostname-override when you start kubelet. Have you done that already?" -- Yes.


(Same `hostname -f`, `openstack server list`, `--hostname-override` flag, kubelet log, and `openstack server show` output as in my earlier comment.)
dims commented 6 years ago

So let's look at the code now. We can see "openstack.Instances() called" printed at the following line: https://github.com/kubernetes/kubernetes/blob/release-1.9/pkg/cloudprovider/providers/openstack/openstack_instances.go#L39

but not the "Claiming to support Instances" message from line 46: https://github.com/kubernetes/kubernetes/blob/release-1.9/pkg/cloudprovider/providers/openstack/openstack_instances.go#L46

Unfortunately we don't print the actual err we get from os.NewComputeV2().

So basically this points to something missing or problematic in your cloud config file. Are you able to make a local change to the code and try digging in?

s0komma commented 6 years ago

This is what I see in the logs.

"Are you able to make a local change to the code and try digging in?" -- Yes; it would be a great help if you could guide me through this.

```
May 11 16:41:37 control-plane-256024810-1-359235278 kubelet[32257]: I0511 16:41:37.739591   32257 openstack_instances.go:39] openstack.Instances() called
May 11 16:41:37 control-plane-256024810-1-359235278 kubelet[32257]: error: failed to run Kubelet: failed to get instances from cloud provider
May 11 16:41:37 control-plane-256024810-1-359235278 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=kubelet comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
May 11 16:41:37 control-plane-256024810-1-359235278 audispd[585]: node=control-plane-256024810-1-359235278 type=SERVICE_STOP msg=audit(1526056897.740:421506): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=kubelet comm="systemd" exe="/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
```
s0komma commented 6 years ago

Below is what I see in /etc/cloud/:

```
root@control-plane-256024810-1-359235278:~# cat /etc/cloud/cloud.cfg
preserve_hostname: true
root@control-plane-256024810-1-359235278:~# ls /etc/cloud/cloud.cfg.d/
05_logging.cfg   90_dpkg.cfg      99_hostname.cfg  README
root@control-plane-256024810-1-359235278:~# cat /etc/cloud/cloud.cfg.d/99_hostname.cfg
hostname: control-plane-256024810-1-359235278
fqdn: control-plane-256024810-1-359235278.lab.skomma.kubernetes.com
```
dims commented 6 years ago

So, the cloud config file is the file you pass to the kubelet or apiserver via the `--cloud-config` parameter. Can you please cross-check what's in there?

On the code change, just add a new glog.Infof() statement and print the err message.

s0komma commented 6 years ago

Below is the cloud_config file passed to the kubelet:

```
root@control-plane-256024810-1-359235278:/home/app# cat /etc/kubernetes/cloud_config
[Global]
region=region
username=kubernetes
password=******
auth-url=https://api-endpoint.com:5000/v3
tenant-id=********
domain-name=Default
```
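As the error surfaced later in this thread suggests, the `region` value in `[Global]` must exactly match a region name in the Keystone service catalog, or the compute-endpoint lookup fails. A hypothetical corrected file with placeholder values (none of these values are taken from the real environment):

```ini
[Global]
# Must match a region shown by `openstack region list` / `openstack catalog list`
region=RegionOne
username=kubernetes
password=<password>
auth-url=https://api-endpoint.com:5000/v3
tenant-id=<project-id>
domain-name=Default
```
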
s0komma commented 6 years ago

@dims -- Since I'm using the in-tree version as of now, do you suggest trying the same test with CSI so that there is more flexibility in tweaking things?

dims commented 6 years ago

@s0komma you are using 1.9.x, so it's better to stick to that version. The CSI stuff is very much in progress and may not work with older releases.

s0komma commented 6 years ago

@dims -- Thank you very much. I updated the kubelet with your change; the logs now show the underlying error:

```
May 11 18:15:18 control-plane-256024810-1-359235278 kubelet[1789]: I0511 18:15:18.229454    1789 server.go:305] Successfully initialized cloud provider: "openstack" from the config file: "/etc/kubernetes/cloud_config"
May 11 18:15:18 control-plane-256024810-1-359235278 kubelet[1789]: I0511 18:15:18.229496    1789 openstack_instances.go:39] openstack.Instances() called
May 11 18:15:18 control-plane-256024810-1-359235278 kubelet[1789]: E0511 18:15:18.229512    1789 openstack_instances.go:43] unable to access compute v2 API : failed to find compute v2 endpoint for region ndc5: No suitable endpoint could be found in the service catalog.
May 11 18:15:18 control-plane-256024810-1-359235278 kubelet[1789]: error: failed to run Kubelet: failed to get instances from cloud provider
May 11 18:15:18 control-plane-256024810-1-359235278 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
May 11 18:15:18 control-plane-256024810-1-359235278 systemd[1]: kubelet.service: Unit entered failed state.
```

dims commented 6 years ago

@s0komma - So, I have no idea how your OpenStack environment is set up. Please review the code linked below; I think this is where your error is coming from. Cross-check your entries from `openstack service list` for the specific region, and look at the code:

https://github.com/kubernetes/kubernetes/blob/release-1.9/vendor/github.com/gophercloud/gophercloud/openstack/endpoint_location.go#L19-L59
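To make that cross-check concrete: the region string the catalog advertises must equal the `region=` value in `/etc/kubernetes/cloud_config`. A small sketch of the comparison (the `openstack` commands need valid credentials, so the values below are illustrative, not pulled live):

```shell
# On a node with credentials you would obtain the catalog region with e.g.:
#   openstack catalog list
# and compare it with the `region=` line in /etc/kubernetes/cloud_config.
# The check itself is a plain string comparison (example values shown):
cfg_region="region"          # example: value of `region=` in cloud_config
catalog_region="RegionOne"   # example: region shown by `openstack catalog list`
if [ "$cfg_region" = "$catalog_region" ]; then
  echo "region matches"
else
  echo "region mismatch: cloud_config=$cfg_region catalog=$catalog_region"
fi
```

If those two strings differ, the gophercloud endpoint lookup linked above finds no matching entry and returns exactly the "No suitable endpoint could be found in the service catalog" error seen in the kubelet logs.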

s0komma commented 6 years ago

@dims -- A few updates: I know that the user account I have doesn't have access to run `openstack endpoint list` and `openstack service list`, so I'm trying to get that access added to the user as well.

s0komma commented 6 years ago

@dims -- I'm now able to run the commands below as the kubernetes user mentioned in my cloud-config file:


root@control-plane-256024810-1-359235278:~/# openstack catalog list
+-----------+-----------+------------------------------------------------------------------------------------------------------+
| Name      | Type      | Endpoints                                                                                            |
+-----------+-----------+------------------------------------------------------------------------------------------------------+
| cinder    | volume    | RegionOne                                                                                            |
|           |           |   public: https://api-endpoint.prod.walmart.com:8776/v1/xxxxxx     |
|           |           | RegionOne                                                                                            |
|           |           |   admin: https://api-endpoint.prod.walmart.com:8776/v1/xxxxxx      |
|           |           | RegionOne                                                                                            |
|           |           |   internal: https://api-endpoint.prod.walmart.com:8776/v1/xxxxx   |
|           |           |                                                                                                      |
| placement | placement | RegionOne                                                                                            |
|           |           |   public: https://api-endpoint.prod.walmart.com:8780/placement                               |
|           |           | RegionOne                                                                                            |
|           |           |   internal: https://api-endpoint.prod.walmart.com:8780/placement                             |
|           |           | RegionOne                                                                                            |
|           |           |   admin: https://api-endpoint.prod.walmart.com:8780/placement                                |
|           |           |                                                                                                      |
| nova      | compute   | RegionOne                                                                                            |
|           |           |   admin: https://api-endpoint.prod.walmart.com:8774/v2.1/xxxxx    |
|           |           | RegionOne                                                                                            |
|           |           |   public: https://api-endpoint.prod.walmart.com:8774/v2.1/xxxx   |
|           |           | RegionOne                                                                                            |
|           |           |   internal: https://api-endpoint.prod.walmart.com:8774/v2.1/xxxxx |
|           |           |                                                                                                      |
| neutron   | network   | RegionOne                                                                                            |
|           |           |   admin: https://api-endpoint.prod.walmart.com:9696                                          |
|           |           | RegionOne                                                                                            |
|           |           |   public: https://api-endpoint.prod.walmart.com:9696                                         |
|           |           | RegionOne                                                                                            |
|           |           |   internal: https://api-endpoint.prod.walmart.com:9696                                       |
|           |           |                                                                                                      |
| cinderv2  | volumev2  | RegionOne                                                                                            |
|           |           |   internal: https://api-endpoint.prod.walmart.com:8776/v2/xxxxx   |
|           |           | RegionOne                                                                                            |
|           |           |   admin: https://api-endpoint.prod.walmart.com:8776/v2/xxxx      |
|           |           | RegionOne                                                                                            |
|           |           |   public: https://api-endpoint.prod.walmart.com:8776/v2/xxxx     |
|           |           |                                                                                                      |
| keystone  | identity  | RegionOne                                                                                            |
|           |           |   internal: https://api-endpoint.prod.walmart.com:5000/v3                                    |
|           |           | RegionOne                                                                                            |
|           |           |   admin: https://api-endpoint.prod.walmart.com:35357/v3                                      |
|           |           | RegionOne                                                                                            |
|           |           |   public: https://api-endpoint.prod.walmart.com:5000/v3                                      |
|           |           |                                                                                                      |
| cinderv3  | volumev3  | RegionOne                                                                                            |
|           |           |   admin: https://api-endpoint.prod.walmart.com:8776/v3/xxxx      |
|           |           | RegionOne                                                                                            |
|           |           |   internal: https://api-endpoint.prod.walmart.com:8776/v3/xxxx   |
|           |           | RegionOne                                                                                            |
|           |           |   public: https://api-endpoint.prod.walmart.com:8776/v3/xxxx     |
|           |           |                                                                                                      |
| glance    | image     | RegionOne                                                                                            |
|           |           |   public: https://api-endpoint.prod.walmart.com:9292                                         |
|           |           | RegionOne                                                                                            |
|           |           |   admin: https://api-endpoint.prod.walmart.com:9292                                          |
|           |           | RegionOne                                                                                            |
|           |           |   internal: https://api-endpoint.prod.walmart.com:9292                                       |
|           |           |                                                                                                      |
+-----------+-----------+------------------------------------------------------------------------------------------------------+
root@control-plane-256024810-1-359235278:~/# openstack service list
+----------------------------------+-----------+-----------+
| ID                               | Name      | Type      |
+----------------------------------+-----------+-----------+
| 1234 | cinder    | volume    |
| 1234 | placement | placement |
| 1234 | nova      | compute   |
| 1234 | neutron   | network   |
| 1234 | cinderv2  | volumev2  |
| 1234 | keystone  | identity  |
| 1234 | cinderv3  | volumev3  |
| 1234 | glance    | image     |
+----------------------------------+-----------+-----------+
s0komma commented 6 years ago

I'm able to reach all of the above API URLs from the VM.

dims commented 6 years ago

@s0komma - The region in the catalog seems to be RegionOne, but you have `region=region` in your /etc/kubernetes/cloud_config file.
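For reference, a corrected config would carry the catalog's region name in the `[Global]` section. A minimal sketch (credentials and URLs below are the redacted placeholders from this thread, not real values):

```shell
# Write an example corrected cloud_config to a scratch path and confirm the
# region line now matches the catalog's RegionOne. Placeholder values only.
cat > /tmp/cloud_config.example <<'EOF'
[Global]
region=RegionOne
username=kubernetes
password=******
auth-url=https://api-endpoint.com:5000/v3
tenant-id=********
domain-name=Default
EOF
grep '^region=' /tmp/cloud_config.example
```

On a real node the file would live at /etc/kubernetes/cloud_config, and the kubelet, controller-manager, and apiserver would all need a restart after the change.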

s0komma commented 6 years ago

@dims

Thank you very much for pointing it out. It looks like we have two regions: OS_REGION=region and NOVA_REGION_NAME=RegionOne. After fixing the region name and using --hostname-override, the kubelet is up and running now. Below are the logs:


May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.349666   10537 kuberuntime_manager.go:571] computePodActions got {KillPod:false CreateSandbox:false SandboxID:aa1c4985578b8377bd731a06e2e72884673aef446a737e05b1f2fb0533ff860e Attempt:0 NextInitContainerToStart:nil ContainersToStart:[] ContainersToKill:map[]} for pod "kube-controller-manager-control-plane-lab-skomma-kubernetes-359235272_kube-system(9fbec4125504e548fe3cd42707da48c4)"
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.443587   10537 config.go:99] Looking for [api file], have seen map[file:{}]
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.463451   10537 openstack_instances.go:39] openstack.Instances() called
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.463500   10537 openstack_instances.go:47] Claiming to support Instances
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.493328   10537 openstack_instances.go:39] openstack.Instances() called
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.493356   10537 openstack_instances.go:47] Claiming to support Instances
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.518508   10537 openstack_instances.go:39] openstack.Instances() called
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.518886   10537 openstack_instances.go:47] Claiming to support Instances
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.543647   10537 config.go:99] Looking for [api file], have seen map[file:{}]
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.643617   10537 config.go:99] Looking for [api file], have seen map[file:{}]
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.743764   10537 config.go:99] Looking for [api file], have seen map[file:{}]
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.843550   10537 config.go:99] Looking for [api file], have seen map[file:{}]
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.923502   10537 generic.go:183] GenericPLEG: Relisting
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.928719   10537 kubelet_pods.go:1369] Generating status for "kube-controller-manager-control-plane-lab-skomma-kubernetes-359235272_kube-system(9fbec4125504e548fe3cd42707da48c4)"
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.928822   10537 kubelet_node_status.go:273] Setting node annotation to enable volume controller attach/detach
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.928833   10537 openstack_instances.go:39] openstack.Instances() called
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.928844   10537 openstack_instances.go:47] Claiming to support Instances
May 12 02:08:03 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:03.943623   10537 config.go:99] Looking for [api file], have seen map[file:{}]
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.043717   10537 config.go:99] Looking for [api file], have seen map[file:{}]
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.132019   10537 kubelet_node_status.go:329] Adding node label from cloud provider: beta.kubernetes.io/instance-type=3
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.132064   10537 openstack.go:573] Claiming to support Zones
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.132074   10537 openstack.go:587] Current zone is {az2 RegionOne}
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.132101   10537 kubelet_node_status.go:340] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=az2
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.132114   10537 kubelet_node_status.go:344] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=RegionOne
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.132126   10537 openstack_instances.go:39] openstack.Instances() called
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.132136   10537 openstack_instances.go:47] Claiming to support Instances
May 12 02:08:04 control-plane-256024810-1-359235278 kubelet[10537]: I0512 02:08:04.132140   10537 openstack_instances.go:70] NodeAddresses(control-plane-lab-skomma-kubernetes-359235272) called
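For anyone following along, the kubelet changes that got this far boil down to three flags. A sketch as a systemd drop-in (the drop-in path and environment variable name are assumptions and vary by install; the hostname value is the one from this thread):

```ini
# /etc/systemd/system/kubelet.service.d/20-cloud.conf  (hypothetical path)
[Service]
Environment="KUBELET_EXTRA_ARGS=--cloud-provider=openstack \
  --cloud-config=/etc/kubernetes/cloud_config \
  --hostname-override=control-plane-lab-skomma-kubernetes-359235272"
```

After editing a drop-in like this, `systemctl daemon-reload && systemctl restart kubelet` would apply it.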
s0komma commented 6 years ago

@dims -- The kubelet is up, but then I noticed that the API server was not coming up, with the error `--external-hostname was not specified. Trying to get it from the cloud provider.` So I added --external-hostname=control-plane-lab-skomma-kubernetes-359235272, since this is the name I see in `openstack server list`, and then the API server came up and the controller is also up and running. But the node status still remains NotReady. Looking into the logs I see the message below. Any guidance on this would be great.


May 12 02:30:17 control-plane-256024810-1-359235278 kubelet[18650]: E0512 02:30:17.448773   18650 kubelet_node_status.go:106] Unable to register node "control-plane-lab-skomma-kubernetes-359235272" with API server: nodes "control-plane-lab-skomma-kubernetes-359235272" is forbidden: node "control-plane-256024810-1-359235278" cannot modify node "control-plane-lab-skomma-kubernetes-359235272"
liggitt commented 6 years ago

If you already bootstrapped the node, and it requested/created a client certificate for its original name, then you changed the name of the node (via cloud provider changes), you should delete the now-mismatched --kubeconfig file containing the credentials for the original node name and let it re-bootstrap
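Concretely, that re-bootstrap might look something like the following. This is a sketch only: the paths are typical kubeadm-style locations and may differ in this setup, and the commands assume you run them on the affected node against a live cluster.

```shell
# All paths/names are assumptions -- adjust to your install.
kubectl delete node control-plane-256024810-1-359235278   # drop the old node object
rm /etc/kubernetes/kubelet.conf                           # kubeconfig holding the old name's credentials
rm -f /var/lib/kubelet/pki/kubelet-client*                # stale client certificates
systemctl restart kubelet                                 # kubelet re-bootstraps under the new name
```

The key point is removing the kubeconfig/certificates tied to the old node name so the kubelet requests fresh credentials matching the name the cloud provider now reports.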

s0komma commented 6 years ago

@liggitt -- Thanks for the details; let me re-bootstrap with the name found under the cloud provider and see what happens.

s0komma commented 6 years ago

@liggitt & @dims -- I have a few specific questions on how this would really work.

If we re-bootstrap the node and use the OpenStack instance name (hostname-1), then `kubectl get nodes` will show the node as hostname-1. What's the best way to handle that?

Any guidance on this would be great.

liggitt commented 6 years ago

@dims it appears incompatible changes were made to the openstack cloud provider in 1.10 in https://github.com/kubernetes/kubernetes/pull/58502

previously, instance name was used as the node name. That PR changed it to hostname (and a later PR made additional changes in https://github.com/kubernetes/kubernetes/pull/61000). The intent of those PRs was to ensure the attribute used was always a valid node name, but an unintended side effect was that it broke setups that named their instances intentionally and were previously working.

It is fine to have an option to alter the attribute used, but changes should be backwards compatible and not break pre-1.10 deployments.

Options I could see:

dims commented 6 years ago

@liggitt right. it's being tracked here - https://github.com/kubernetes/kubernetes/issues/62295

Also, @s0komma is on 1.9.x, so he has not hit the 1.10 problem yet :)

s0komma commented 6 years ago

The version I'm currently using is 1.9.4.

@liggitt & @dims -- I wanted to update you both on what I have done so far.

As suggested, I re-bootstrapped a worker node.

Note: I did not have to use --hostname-override=hostname-1 in the kubelet, since I was registering the node with the OpenStack server name.

So now the question is: when we build the kubelet and register the server to the control plane, we want it registered using the actual host name, so that when we run kubectl get nodes we actually see it as host-1-21 (the host name) instead of hostname-1 (the OpenStack instance name).

s0komma commented 6 years ago

@liggitt & @dims -- Any thoughts on the above point?

liggitt commented 6 years ago

when we build the kubelet and register the server to the control plane, we want it registered using the actual host name, so that when we run kubectl get nodes we actually see it as host-1-21 (the host name) instead of hostname-1 (the OpenStack instance name)

I believe the only way to do that with the openstack cloud provider in 1.9 is to make your openstack instance name identical to the openstack hostname
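If renaming the instance is an acceptable route, python-openstackclient supports it via `openstack server set --name` (verify the command against your client version; names below are the examples from this thread and require appropriate OpenStack permissions):

```shell
# Rename the instance so its name matches the hostname the VM reports,
# then confirm the change in the server listing.
openstack server set --name host-1-21 hostname-1
openstack server list
```

With the instance name and hostname identical, the 1.9 in-tree provider's node name and `kubectl get nodes` output line up without --hostname-override.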

s0komma commented 6 years ago

@liggitt -- Thank you for the update. Is this something that will be an option in a future version?

fejta-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 5 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/cloud-provider-openstack/issues/169#issuecomment-429582289):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.