Closed: ralfbardoel closed this issue 3 years ago.
@ralfbardoel there is also a BZ for this issue. Can you provide the logs from the sync pods?
oc get pod -n openshift-node -l app=sync
oc logs <pod>
"lastHeartbeatTime": "2019-03-13T10:18:30Z" "lastTransitionTime": "2019-03-11T11:20:30Z"
That doesn't look right. Is the node service running there without errors?
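(For anyone checking the same thing: on an RPM-based Origin 3.11 host the node service unit is usually named origin-node, so something along these lines should show its state and recent errors. The unit name is an assumption here; adjust it for enterprise or containerized installs.)
systemctl status origin-node
journalctl -u origin-node --no-pager --since "1 hour ago"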
Same problem here. OpenShift Origin: 3.11, Ansible: 2.9.6. The Ansible host resolves each node, and the prerequisites playbook completes without any issue.
[OSEv3:children]
masters
etcd
nodes
[OSEv3:vars]
## Ansible user who can login to all nodes through SSH (e.g. ssh root@os-master1)
ansible_user=root
## Deployment type: "openshift-enterprise" or "origin"
openshift_deployment_type=origin
deployment_type=origin
## Specifies the major version
openshift_release=v3.11.0
openshift_pkg_version=-3.11.0
openshift_image_tag=v3.11.0
openshift_service_catalog_image_version=v3.11.0
template_service_broker_image_version=v3.11.0
openshift_metrics_image_version="v3.11"
openshift_logging_image_version="v3.11"
openshift_logging_elasticsearch_proxy_image_version="v1.0.0"
osm_use_cockpit=true
openshift_metrics_install_metrics=True
openshift_logging_install_logging=True
## Service address space, /16 = 65,534 IPs
openshift_portal_net=172.30.0.0/16
## Pod address space
osm_cluster_network_cidr=10.128.0.0/14
## Subnet Length of each node, 9 = 510 IPs
osm_host_subnet_length=9
## Master API port
openshift_master_api_port=8443
## Master console port (e.g. https://console.openshift.local:443)
openshift_master_console_port=8443
## Clustering method
openshift_master_cluster_method=native
## Hostname used by nodes and other cluster internals
openshift_master_cluster_hostname=console-int.openshift.home
## Hostname used by platform users
openshift_master_cluster_public_hostname=console.openshift.home
## Application wildcard subdomain
openshift_master_default_subdomain=apps.openshift.home
## identity provider
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
## Users being created in the cluster
## Password abcd1234
openshift_master_htpasswd_users={'admin': '$apr1$BfW0njqt$KbsFn1LKfkb10ARFGxoRX/', 'user1': '$apr1$7erCvbtG$60V7Vx2HBfaDrfG4pUkba.'}
## Persistent storage, NFS
openshift_hosted_registry_storage_kind=nfs
openshift_hosted_registry_storage_access_modes=['ReadWriteMany']
openshift_hosted_registry_storage_host=zion.home
openshift_hosted_registry_storage_nfs_directory=/volume1/SHARED
openshift_hosted_registry_storage_volume_name=registry
openshift_hosted_registry_storage_volume_size=50Gi
## Other vars
containerized=True
os_sdn_network_plugin_name='redhat/openshift-ovs-multitenant'
openshift_disable_check=disk_availability,docker_storage,memory_availability,docker_image_availability
#NFS check bug
openshift_enable_unsupported_configurations=True
#Another Bug 1569476
skip_sanity_checks=true
openshift_node_kubelet_args="{'eviction-hard': ['memory.available<100Mi'], 'minimum-container-ttl-duration': ['10s'], 'maximum-dead-containers-per-container': ['2'], 'maximum-dead-containers': ['5'], 'pods-per-core': ['10'], 'max-pods': ['25'], 'image-gc-high-threshold': ['80'], 'image-gc-low-threshold': ['60']}"
[OSEv3:vars]
[masters]
t4master1.home
[etcd]
t4master1.home
[nodes]
t4master1.home openshift_node_labels="{'region': 'master'}"
t4infra1.home openshift_node_labels="{'region': 'infra'}"
t4node1.home openshift_node_labels="{'region': 'primary'}"
t4node2.home openshift_node_labels="{'region': 'primary'}"
fatal: [t4master1.home]: FAILED! => {
"attempts": 180,
"changed": false,
"invocation": {
"module_args": {
"all_namespaces": null,
"content": null,
"debug": false,
"delete_after": false,
"field_selector": null,
"files": null,
"force": false,
"kind": "node",
"kubeconfig": "/etc/origin/master/admin.kubeconfig",
"name": null,
"namespace": "default",
"selector": "",
"state": "list"
}
},
"module_results": {
"cmd": "/usr/local/bin/oc get node --selector= -o json -n default",
"results": [
{
"apiVersion": "v1",
"items": [
{
"apiVersion": "v1",
"kind": "Node",
"metadata": {
"annotations": {
"volumes.kubernetes.io/controller-managed-attach-detach": "true"
},
"creationTimestamp": "2020-05-16T07:51:40Z",
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/hostname": "t4master1"
},
"name": "t4master1",
"namespace": "",
"resourceVersion": "3792",
"selfLink": "/api/v1/nodes/t4master1",
"uid": "13f89076-974a-11ea-838c-5254008ed04b"
},
"spec": {},
"status": {
"addresses": [
{
"address": "192.168.1.223",
"type": "InternalIP"
},
{
"address": "t4master1",
"type": "Hostname"
}
],
"allocatable": {
"cpu": "4",
"hugepages-2Mi": "0",
"memory": "1779192Ki",
"pods": "250"
},
"capacity": {
"cpu": "4",
"hugepages-2Mi": "0",
"memory": "1881592Ki",
"pods": "250"
},
"conditions": [
{
"lastHeartbeatTime": "2020-05-16T08:26:27Z",
"lastTransitionTime": "2020-05-16T07:51:40Z",
"message": "kubelet has sufficient disk space available",
"reason": "KubeletHasSufficientDisk",
"status": "False",
"type": "OutOfDisk"
},
{
"lastHeartbeatTime": "2020-05-16T08:26:27Z",
"lastTransitionTime": "2020-05-16T07:51:40Z",
"message": "kubelet has sufficient memory available",
"reason": "KubeletHasSufficientMemory",
"status": "False",
"type": "MemoryPressure"
},
{
"lastHeartbeatTime": "2020-05-16T08:26:27Z",
"lastTransitionTime": "2020-05-16T07:51:40Z",
"message": "kubelet has no disk pressure",
"reason": "KubeletHasNoDiskPressure",
"status": "False",
"type": "DiskPressure"
},
{
"lastHeartbeatTime": "2020-05-16T08:26:27Z",
"lastTransitionTime": "2020-05-16T07:51:40Z",
"message": "kubelet has sufficient PID available",
"reason": "KubeletHasSufficientPID",
"status": "False",
"type": "PIDPressure"
},
{
"lastHeartbeatTime": "2020-05-16T08:26:27Z",
"lastTransitionTime": "2020-05-16T07:51:40Z",
"message": "kubelet is posting ready status", [85/86498]
"reason": "KubeletReady",
"status": "True",
"type": "Ready"
}
],
"daemonEndpoints": {
"kubeletEndpoint": {
"Port": 10250
}
},
"images": [
{
"names": [
"docker.io/openshift/origin-node@sha256:73a2fe2f4c9f93efd47bd909572a6592907098ba7b7f2839c3ee9165228b0772",
"docker.io/openshift/origin-node:v3.11.0"
],
"sizeBytes": 1193537132
},
{
"names": [
"docker.io/openshift/origin-control-plane@sha256:8b10156d1e67d326c88228a005a69dcbd211fa1e53b709ad66d8ff1971708c7b",
"docker.io/openshift/origin-control-plane:v3.11.0"
],
"sizeBytes": 835849824
},
{
"names": [
"docker.io/openshift/origin-pod@sha256:3178ea38ef67954ceeb0ad842adcab640019da246aba109226a73aea49f31d54",
"docker.io/openshift/origin-pod:v3.11.0"
],
"sizeBytes": 265514713
},
{
"names": [
"quay.io/coreos/etcd@sha256:ed2b69c34840f475929abd84133e17421d0608b26f9c3cbe54c7699918580a99",
"quay.io/coreos/etcd:v3.2.26"
],
"sizeBytes": 37605387
}
],
"nodeInfo": {
"architecture": "amd64",
"bootID": "a4391a38-6de6-4b66-8ee2-e9d3992b8c07",
"containerRuntimeVersion": "docker://1.13.1",
"kernelVersion": "3.10.0-1062.4.3.el7.x86_64",
"kubeProxyVersion": "v1.11.0+d4cacc0",
"kubeletVersion": "v1.11.0+d4cacc0",
"machineID": "cfd6a6d3aa21425b990bf7fd727c9342",
"operatingSystem": "linux",
"osImage": "CentOS Linux 7 (Core)",
"systemUUID": "CFD6A6D3-AA21-425B-990B-F7FD727C9342"
}
}
}
],
"kind": "List",
"metadata": {
"resourceVersion": "",
"selfLink": ""
}
}
],
"returncode": 0
},
"state": "list"
}
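In case it helps narrow this down: the task that times out here polls the node list and waits for an annotation that the sync DaemonSet in the openshift-node namespace applies to each node (node.openshift.io/md5sum in the 3.11 playbooks, if I recall correctly, so treat the exact key as an assumption). Checking the sync pods and the node annotations directly should show which nodes are missing it:
oc get pods -n openshift-node -l app=sync -o wide
oc logs -n openshift-node <sync-pod>
oc get nodes -o yaml | grep "node.openshift.io/md5sum"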
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
Description
A deployment of OpenShift Origin 3.11 in an HA setup fails at the task "wait for sync DS to set annotations on all nodes". Things that have already been checked: we are using Ansible version 2.7.8, and git describe on openshift-ansible returns "openshift-ansible-3.11.95-1".
Steps To Reproduce
Install an HA setup with two masters, two infra nodes, and three compute nodes on AWS, using the following inventory file (keys and domains are replaced):
Expected Results
An OpenShift HA deployment that completes without this error.
Observed Results
The following error information is returned when running the playbook with the -vvv flag:
https://gist.github.com/ralfbardoel/5923a2a1781a142155f61c08bbd32522
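For reference, the output above comes from roughly the standard openshift-ansible 3.11 run (the inventory path below is a placeholder for our actual file):
ansible-playbook -vvv -i /path/to/inventory playbooks/prerequisites.yml
ansible-playbook -vvv -i /path/to/inventory playbooks/deploy_cluster.yml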
Additional Information
oc version -> v3.11.0+62803d0-1
kubernetes -> v1.11.0+d4cacc0
OpenShift rpm -> centos-release-openshift-origin-1-1.el7.centos.noarch
Running on CentOS 7 (CentOS Linux release 7.6.1810 (Core)) on AWS.