openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

webconsole unavailable after install #19143

Closed bparees closed 6 years ago

bparees commented 6 years ago

Several of our extended test jobs failed to install/stand up due to the console being unavailable:

FAILED - RETRYING: Verify that the web console is running (5 retries left).
FAILED - RETRYING: Verify that the web console is running (4 retries left).
FAILED - RETRYING: Verify that the web console is running (3 retries left).
FAILED - RETRYING: Verify that the web console is running (2 retries left).
FAILED - RETRYING: Verify that the web console is running (1 retries left).
fatal: [localhost]: FAILED! => {

https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_builds/418/console https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_image_ecosystem/431/console

@spadgett @stevekuznetsov

spadgett commented 6 years ago

It can't pull the image:

Back-off pulling image "openshift/origin-web-console:51e2775"

...

Error from server (BadRequest): container "webconsole" in pod "webconsole-56d6b94669-rtp6w" is waiting to start: trying and failing to pull image
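
(For reference, a couple of generic commands that surface the underlying pull error; the pod name and image tag are taken from the output above, and this is just standard oc/docker usage rather than anything from the job logs:)

```
# Show the pod's events, which include the exact image pull failure
oc describe pod webconsole-56d6b94669-rtp6w -n openshift-web-console

# Try pulling the same tag by hand to see whether it exists in the registry
docker pull openshift/origin-web-console:51e2775
```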

spadgett commented 6 years ago

@stevekuznetsov How are the images built for these jobs?

spadgett commented 6 years ago

/assign @stevekuznetsov

bparees commented 6 years ago

this is consistently breaking our nightly test job.

/cc @wozniakjan

gabemontero commented 6 years ago

breaking the jenkins plugin PR test jobs as well

bparees commented 6 years ago

@spadgett these jobs should be extending the same base job configuration as our other conformance jobs that run extended tests.

bparees commented 6 years ago

(they extend parent: 'common/test_cases/origin_installed_release.yml')

bparees commented 6 years ago

/cc @jwforres @jupierce

stevekuznetsov commented 6 years ago

We started pulling the console image by SHA, not sure where or how... but the build AMI that the jobs were based off of was not building the full ecosystem. I updated that AMI job to build everything in https://github.com/openshift/aos-cd-jobs/pull/1280 and kicked off a build here: https://ci.openshift.redhat.com/jenkins/job/ami_build_origin_int_rhel_build/2525/

The jobs should be functional after that AMI is ready.

bparees commented 6 years ago

thanks @stevekuznetsov !

bparees commented 6 years ago

Our jobs are passing again. @gabemontero, if you're still having issues that you think are related to this, please reopen it, but I think the main issue is resolved.

thanks again @stevekuznetsov

gabemontero commented 6 years ago

Yeah several plugins have had tests pass now ... so we are good with this one.

I was seeing one oddity with the sync plugin, but it seems unrelated, and I'm not sure yet if it is a flake or persistent.

wozniakjan commented 6 years ago

Happened again on Saturday, April 7th.

https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_builds/427/consoleFull#3923830558b6e51eb7608a5981914356

wozniakjan commented 6 years ago

Failed again https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_image_ecosystem/446/consoleFull#28517456858b6e51eb7608a5981914356

TASK [openshift_web_console : Verify that the web console is running] **********
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/start.yml:2
FAILED - RETRYING: Verify that the web console is running (60 retries left).
...
FAILED - RETRYING: Verify that the web console is running (1 retries left).
fatal: [localhost]: FAILED! => {
...
curl: (6) Could not resolve host: webconsole.openshift-web-console.svc; Name or service not
...
TASK [openshift_web_console : Report console errors] ***************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_web_console/tasks/start.yml:51
fatal: [localhost]: FAILED! => {
    "changed": false, 
    "generated_timestamp": "2018-04-13 11:19:11.922200", 
    "msg": "Console install failed."
}

/cc @stevekuznetsov

spadgett commented 6 years ago

This is a different issue. The deployment is successful and the images can be pulled, but for whatever reason, we aren't able to curl the service (even though it exists).

Could not resolve host: webconsole.openshift-web-console.svc; Name or service not known
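
(A minimal way to check this by hand from the master, assuming cluster-admin access; the curl command is the same one the installer runs, the rest is standard oc usage:)

```
# Confirm the service and its endpoints exist
oc get svc,endpoints -n openshift-web-console

# Check whether the hostname resolves through the master's resolver, the same path curl uses
getent hosts webconsole.openshift-web-console.svc

# Re-run the installer's health check manually
curl -k https://webconsole.openshift-web-console.svc/healthz
```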

spadgett commented 6 years ago

@knobunc Any idea why curling the service from the master would fail when there are pods running and ready?

spadgett commented 6 years ago

Best I can tell, this is a networking issue accessing the service. @knobunc who can help me debug?

/assign @knobunc

tibers commented 6 years ago

Having the same problem, and Google brought me here. Using the openshift-ansible installer, everything works until the web console part, which keels over:

FAILED - RETRYING: Verify that the web console is running (1 retries left).
fatal: [myhost]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["curl", "-k", "https://webconsole.openshift-web-console.svc/healthz"], "delta": "0:00:05.518865", "end": "2018-04-23 16:53:53.693624", "msg": "non-zero return code", "rc": 6, "start": "2018-04-23 16:53:48.174759", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0\r  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0curl: (6) Could not resolve host: webconsole.openshift-web-console.svc; Name or service not known", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0", "  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0", "  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0", "  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0", "  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0curl: (6) Could not resolve host: webconsole.openshift-web-console.svc; Name or service not known"], "stdout": "", "stdout_lines": []}

When I visit my openshift webconsole, I get the SSL connection but the webconsole itself never loads.

otmanel31 commented 6 years ago

I have the same error as @wozniakjan when installing OpenShift with the openshift-ansible installer, specifically the 3.9 release:

TASK [openshift_web_console : Verify that the web console is running] ******************************************************************************************************
Wednesday 25 April 2018  16:15:07 +0200 (0:00:00.041)       0:40:48.394 *******
FAILED - RETRYING: Verify that the web console is running (60 retries left).
.......
FAILED - RETRYING: Verify that the web console is running (1 retries left).
fatal: [ xx.xxxxx.xxxx.com]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["curl", "-k", "https://webconsole.openshift-web-console.svc/healthz"], "delta": "0:00:00.066483", "end": "2018-04-25 16:25:36.610975", "msg": "non-zero return code", "rc": 6, "start": "2018-04-25 16:25:36.544492", "stderr": "  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: webconsole.openshift-web-console.svc; Nom ou service inconnu", "stderr_lines": ["  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current", "                                 Dload  Upload   Total   Spent    Left  Speed", "", "  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: webconsole.openshift-web-console.svc; Nom ou service inconnu"], "stdout": "", "stdout_lines": []}
...ignoring
.......
........
TASK [openshift_web_console : Report console errors] ***********************************************************************************************************************
Wednesday 25 April 2018  16:25:55 +0200 (0:00:00.325)       0:51:36.899 *******
          FAILED! => {"changed": false, "msg": "Console install failed."}

On the 3.8 release, I had approximately the same error on the same task: curl: (7) Failed connect to webconsole.openshift-web-console.svc:443; Connexion refusée ("Connection refused"). Here the connection is refused rather than the hostname failing to resolve.

And then:

TASK [openshift_web_console : Report console errors] ***********************************************************************************************************************
Wednesday 25 April 2018  19:26:19 +0200 (0:00:00.283)       0:48:06.670 *******
fatal: [xxx.xxxxxxx.com]: FAILED! => {"changed": false, "msg": "Console install failed."}

otmanel31 commented 6 years ago

Full logs:

```
FAILED - RETRYING: Verify that the web console is running (1 retries left).
fatal: [xxxxxxcom]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["curl", "-k", "https://webconsole.openshift-web-console.svc/healthz"], "delta": "0:00:00.065267", "end": "2018-04-25 19:26:16.214900", "msg": "non-zero return code", "rc": 6, "start": "2018-04-25 19:26:16.149633", "stderr": " % Total % Received % Xferd Average Speed Time Time Time Current\n Dload Upload Total Spent Left Speed\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: webconsole.openshift-web-console.svc; Nom ou service inconnu", "stderr_lines": [" % Total % Received % Xferd Average Speed Time Time Time Current", " Dload Upload Total Spent Left Speed", "", " 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0curl: (6) Could not resolve host: webconsole.openshift-web-console.svc; Nom ou service inconnu"], "stdout": "", "stdout_lines": []}
...ignoring

TASK [openshift_web_console : Check status in the openshift-web-console namespace] *****************************************************************************************
Wednesday 25 April 2018 19:26:16 +0200 (0:10:17.152) 0:48:03.840 *******
changed: [xxxxxxxx.com] => {"changed": true, "cmd": ["/usr/local/bin/oc", "status", "--config=/tmp/console-ansible-viOF80/admin.kubeconfig", "-n", "openshift-web-console"], "delta": "0:00:00.163337", "end": "2018-04-25 19:26:16.646537", "rc": 0, "start": "2018-04-25 19:26:16.483200", "stderr": "", "stderr_lines": [], "stdout": "In project openshift-web-console on server https://xxxxxxxxxx.com:8443\n\nsvc/webconsole - 172.30.111.193:443 -> 8443\n deployment/webconsole deploys openshift/origin-web-console:v3.9.0\n deployment #1 running for 10 minutes - 0/1 pods\n\nView details with 'oc describe /' or list everything with 'oc get all'.", "stdout_lines": ["In project openshift-web-console on server https://xxxxxxx.com:8443", "", "svc/webconsole - 172.30.111.193:443 -> 8443", " deployment/webconsole deploys openshift/origin-web-console:v3.9.0", " deployment #1 running for 10 minutes - 0/1 pods", "", "View details with 'oc describe /' or list everything with 'oc get all'."]}

TASK [openshift_web_console : debug] ***************************************************************************************************************************************
Wednesday 25 April 2018 19:26:16 +0200 (0:00:00.432) 0:48:04.273 *******
ok: [xxxxxxxxx..com] => {
    "msg": [
        "In project openshift-web-console on server https://xxxxxxxxx.com:8443",
        "",
        "svc/webconsole - 172.30.111.193:443 -> 8443",
        " deployment/webconsole deploys openshift/origin-web-console:v3.9.0",
        " deployment #1 running for 10 minutes - 0/1 pods",
        "",
        "View details with 'oc describe /' or list everything with 'oc get all'."
    ]
}

TASK [openshift_web_console : Get pods in the openshift-web-console namespace] *********************************************************************************************
Wednesday 25 April 2018 19:26:16 +0200 (0:00:00.169) 0:48:04.443 *******
changed: [xxxxxxxx.com] => {"changed": true, "cmd": ["/usr/local/bin/oc", "get", "pods", "--config=/tmp/console-ansible-viOF80/admin.kubeconfig", "-n", "openshift-web-console", "-o", "wide"], "delta": "0:00:00.206943", "end": "2018-04-25 19:26:17.291075", "rc": 0, "start": "2018-04-25 19:26:17.084132", "stderr": "", "stderr_lines": [], "stdout": "NAME READY STATUS RESTARTS AGE IP NODE\nwebconsole-7c8555f55d-vjq86 0/1 Pending 0 10m ", "stdout_lines": ["NAME READY STATUS RESTARTS AGE IP NODE", "webconsole-7c8555f55d-vjq86 0/1 Pending 0 10m "]}

TASK [openshift_web_console : debug] ***************************************************************************************************************************************
Wednesday 25 April 2018 19:26:17 +0200 (0:00:00.472) 0:48:04.915 *******
ok: [xxxxxxxxxx.com] => {
    "msg": [
        "NAME READY STATUS RESTARTS AGE IP NODE",
        "webconsole-7c8555f55d-vjq86 0/1 Pending 0 10m "
    ]
}

TASK [openshift_web_console : Get events in the openshift-web-console namespace] *******************************************************************************************
Wednesday 25 April 2018 19:26:17 +0200 (0:00:00.078) 0:48:04.994 *******
changed: [xxxxxxxxx.com] => {"changed": true, "cmd": ["/usr/local/bin/oc", "get", "events", "--config=/tmp/console-ansible-viOF80/admin.kubeconfig", "-n", "openshift-web-console"], "delta": "0:00:00.302392", "end": "2018-04-25 19:26:17.939789", "rc": 0, "start": "2018-04-25 19:26:17.637397", "stderr": "", "stderr_lines": [], "stdout": "LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE\n4m 10m 26 webconsole-7c8555f55d-vjq86.1528bd40e09c301e Pod Warning FailedScheduling default-scheduler 0/1 nodes are available: 1 NodeNotReady, 1 NodeOutOfDisk.\n10m 10m 1 webconsole-7c8555f55d.1528bd40e09ac300 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: webconsole-7c8555f55d-vjq86\n10m 10m 1 webconsole.1528bd4096f80a78 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set webconsole-7c8555f55d to 1", "stdout_lines": ["LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE", "4m 10m 26 webconsole-7c8555f55d-vjq86.1528bd40e09c301e Pod Warning FailedScheduling default-scheduler 0/1 nodes are available: 1 NodeNotReady, 1 NodeOutOfDisk.", "10m 10m 1 webconsole-7c8555f55d.1528bd40e09ac300 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: webconsole-7c8555f55d-vjq86", "10m 10m 1 webconsole.1528bd4096f80a78 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set webconsole-7c8555f55d to 1"]}

TASK [openshift_web_console : debug] ***************************************************************************************************************************************
Wednesday 25 April 2018 19:26:17 +0200 (0:00:00.569) 0:48:05.563 *******
ok: [xxxxxxxxxx.com] => {
    "msg": [
        "LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE",
        "4m 10m 26 webconsole-7c8555f55d-vjq86.1528bd40e09c301e Pod Warning FailedScheduling default-scheduler 0/1 nodes are available: 1 NodeNotReady, 1 NodeOutOfDisk.",
        "10m 10m 1 webconsole-7c8555f55d.1528bd40e09ac300 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: webconsole-7c8555f55d-vjq86",
        "10m 10m 1 webconsole.1528bd4096f80a78 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set webconsole-7c8555f55d to 1"
    ]
}

TASK [openshift_web_console : Get console pod logs] ************************************************************************************************************************
Wednesday 25 April 2018 19:26:18 +0200 (0:00:00.172) 0:48:05.736 *******
changed: [xxxxxxxxxx.com] => {"changed": true, "cmd": ["/usr/local/bin/oc", "logs", "deployment/webconsole", "--tail=50", "--config=/tmp/console-ansible-viOF80/admin.kubeconfig", "-n", "openshift-web-console"], "delta": "0:00:00.190917", "end": "2018-04-25 19:26:18.665667", "rc": 0, "start": "2018-04-25 19:26:18.474750", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}

TASK [openshift_web_console : debug] ***************************************************************************************************************************************
Wednesday 25 April 2018 19:26:18 +0200 (0:00:00.555) 0:48:06.292 *******
ok: [xxxxxxx.com] => {
    "msg": []
}

TASK [openshift_web_console : Remove temp directory] ***********************************************************************************************************************
Wednesday 25 April 2018 19:26:18 +0200 (0:00:00.094) 0:48:06.386 *******
ok: [xxxxxxxxx.com] => {"changed": false, "path": "/tmp/console-ansible-viOF80", "state": "absent"}

TASK [openshift_web_console : Report console errors] ***********************************************************************************************************************
Wednesday 25 April 2018 19:26:19 +0200 (0:00:00.283) 0:48:06.670 *******
fatal: [xxxxxxxxxx]: FAILED! => {"changed": false, "msg": "Console install failed."}

PLAY RECAP *****************************************************************************************************************************************************************
localhost : ok=12 changed=0 unreachable=0 failed=0
xxxxxxx.com : ok=456 changed=80 unreachable=0 failed=1

INSTALLER STATUS ***********************************************************************************************************************************************************
Initialization : Complete (0:00:10)
Health Check : Complete (0:00:01)
etcd Install : Complete (0:00:52)
Master Install : Complete (0:02:54)
Master Additional Install : Complete (0:00:22)
Node Install : Complete (0:00:55)
Hosted Install : Complete (0:32:26)
Web Console Install : In Progress (0:10:27)
This phase can be restarted by running: playbooks/openshift-web-console/config.yml

Wednesday 25 April 2018 19:26:19 +0200 (0:00:00.138) 0:48:06.808 *******
===============================================================================
openshift_hosted : Poll for OpenShift pod deployment success ------------------------------------------------------------------------------------------------------ 640.83s
openshift_hosted : Poll for OpenShift pod deployment success ------------------------------------------------------------------------------------------------------ 637.04s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ----------------------------------------------------------------------------------- 625.76s
openshift_web_console : Verify that the web console is running ---------------------------------------------------------------------------------------------------- 617.15s
openshift_master : Pre-pull master system container image ---------------------------------------------------------------------------------------------------------- 63.36s
etcd : Install or Update Etcd system container package ------------------------------------------------------------------------------------------------------------- 17.27s
openshift_builddefaults : Set builddefaults ------------------------------------------------------------------------------------------------------------------------- 9.98s
openshift_facts ----------------------------------------------------------------------------------------------------------------------------------------------------- 9.85s
openshift_master : Verify API Server -------------------------------------------------------------------------------------------------------------------------------- 8.04s
Verify API Server --------------------------------------------------------------------------------------------------------------------------------------------------- 7.97s
openshift_hosted : Create OpenShift router -------------------------------------------------------------------------------------------------------------------------- 7.47s
openshift_node : Pre-pull node system container image --------------------------------------------------------------------------------------------------------------- 6.90s
openshift_hosted : Ensure OpenShift pod correctly rolls out (best-effort today) ------------------------------------------------------------------------------------- 6.63s
etcd : Pull etcd system container ----------------------------------------------------------------------------------------------------------------------------------- 6.20s
openshift_node : Pre-pull OpenVSwitch system container image -------------------------------------------------------------------------------------------------------- 5.22s
openshift_cli : Copy client binaries/symlinks out of CLI image for use on the host ---------------------------------------------------------------------------------- 4.85s
openshift_manageiq : Configure role/user permissions ---------------------------------------------------------------------------------------------------------------- 4.08s
openshift_cli : Pull CLI Image -------------------------------------------------------------------------------------------------------------------------------------- 3.68s
openshift_buildoverrides : Set buildoverrides config structure ------------------------------------------------------------------------------------------------------ 3.26s
openshift_hosted : Create default projects -------------------------------------------------------------------------------------------------------------------------- 2.92s
```

tibers commented 6 years ago

OK I've tried this 10 different ways and I can consistently duplicate the problem. Fire up that terraform and do this:

main.tf:

provider "aws" {
  version    = "~> 1.11.0"
  region     = "us-east-1"
}

variable "terraform_source_url" {
  default = "https://my.private.repo.at.work"
}

terraform {
  backend "s3" {
     bucket     = "terraform-state-storage"
     encrypt    = true
     key        = "openshift-dev/terraform.tfstate"
     region     = "us-east-1"
  }
}

resource "aws_s3_bucket_object" "terraform_source_url" {
  bucket                 = "terraform-state-storage"
  content                = "${var.terraform_source_url}"
  key                    = "openshift-dev/terraform_source_url.txt"
  server_side_encryption = "AES256"
  tags {
    terraform_source_url  = "${var.terraform_source_url}"
  }
}

openshift.tf - make the appropriate substitutions for your VPC:

module "openshift" {
  source                    = "git::https://github.com/tibers/terraform-aws-openshift.git"
  public_subnet_ids         = ["subnet-something"]
  private_subnet_ids        = ["subnet-somethingelse"]
  vpc_id                    = "vpc-something"
  admin_ssh_key             = "your id_rsa goes here"
  management_net            = "your.cidr.goes.here/24"
  public_domain             = "your.domain.goes.here"

// in house CentOS 7 base image
  app_ami                 = "ami-something"
  infra_ami               = "ami-something"
  master_ami              = "ami-something"
  provisioner_ami         = "ami-something"

// Kosher CentOS 6 - https://wiki.centos.org/Cloud/AWS
//  app_ami                   = "ami-e3fdd999"
//  infra_ami                 = "ami-e3fdd999"
//  master_ami                = "ami-e3fdd999"
//  provisioner_ami           = "ami-e3fdd999"

  // instance types
  provisioner_instance_type = "m5.xlarge"
  master_instance_type      = "m5.2xlarge"
  infra_instance_type       = "m5.xlarge"
  app_instance_type         = "m5.xlarge"

  // sizing
  app_node_count            = "1"
  infra_node_count          = "1"
  master_node_count         = "1"

  // names
  // names must begin with openshift* so the filter works
  provisioner_name          = "openshift_provisioner"
  master_name               = "openshift_master"
  infra_name                = "openshift_infra"
  app_name                  = "openshift_app"

  // spot price
  provisioner_spot_price    = "1.00"
  master_spot_price         = "1.00"
  infra_spot_price          = "1.00"
  app_spot_price            = "1.00"
}

Shell into the provisioner box and run ansible-playbook -i /var/provisioner /openshift-ansible/playbooks/deploy_cluster.yml or check /var/provisioner/provisioner.log.

Unless I am doing something seriously goofy, this all worked in 3.6, but is broken in 3.9.

spadgett commented 6 years ago

@otmanel31 It looks like your error is happening because the node is out of disk space:

0/1 nodes are available: 1 NodeNotReady, 1 NodeOutOfDisk.
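
(The conditions behind that event can be confirmed directly with standard oc commands; the node name placeholder below is whatever oc get nodes reports, not something from the report above:)

```
# NotReady and OutOfDisk show up under the node's Conditions
oc get nodes
oc describe node <node-name>
```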

otmanel31 commented 6 years ago

OK, thank you @spadgett ... I tried it on a virtual machine that meets the memory requirements and it works. But this error persists on the test server (a physical machine) ... :)

spadgett commented 6 years ago

Changing back to P1 for now since the original problem breaking the nightly tests was fixed.

junsaw commented 6 years ago

Hey all,

Are there any updates on this, or any possible workaround? Do we know what the root cause is? Any help would be appreciated.

spadgett commented 6 years ago

@junsaw Are you seeing the exact same error as above? (Could not resolve host: webconsole.openshift-web-console.svc; Name or service not known)

You could set openshift_web_console_install=false if it's blocking you, which skips the console install. The web console playbook can be run at a later time; see the sketch below.
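
(A rough sketch of what that looks like, assuming a standard [OSEv3:vars] inventory section and an RPM install of openshift-ansible; the playbook path below is the one the installer status output prints:)

```
# In the [OSEv3:vars] section of your Ansible inventory:
#   openshift_web_console_install=false

# Later, install just the web console by re-running its playbook:
ansible-playbook -i <your-inventory> /usr/share/ansible/openshift-ansible/playbooks/openshift-web-console/config.yml
```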

ping @knobunc for help debugging this

spadgett commented 6 years ago

I think the fix is just to check pod readiness instead of trying to curl the service from the master. The readiness probe will already check the health endpoint, so I don't see a real benefit to using curl.
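
(A minimal sketch of that kind of readiness check from the CLI, using the deployment name and namespace from the logs above; this is not the actual playbook change, just the idea:)

```
# Succeeds once the webconsole pods report Ready, without resolving the service hostname
oc rollout status deployment/webconsole -n openshift-web-console

# Or inspect the ready replica count directly
oc get deployment webconsole -n openshift-web-console -o jsonpath='{.status.readyReplicas}'
```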

spadgett commented 6 years ago

https://github.com/openshift/openshift-ansible/pull/8274 changes how we verify the console is installed, which will work around problems resolving the service hostname.

junsaw commented 6 years ago

@spadgett that allowed me to bypass the web console install.
Thanks once again.

spadgett commented 6 years ago

@junsaw Thanks for verifying the fix. Can you confirm that the web console is actually working for your cluster?

mbach04 commented 6 years ago

Is this going to receive a change to the source to move away from the curl check? I'm seeing this problem frequently.

spadgett commented 6 years ago

Yeah, we've switched to checking pod readiness instead in master. We're seeing this enough that I think we should backport the change to 3.9.

@sdodson sound OK?

sdodson commented 6 years ago

Yeah that sounds good to me.

spadgett commented 6 years ago

https://github.com/openshift/openshift-ansible/pull/8608