hferentschik opened this issue 8 years ago
@hferentschik I have updated the test so that
`bundle exec vagrant service-manager install-cli openshift --cli-version 1.3.0 --path #{ENV['VAGRANT_HOME']}/oc`
runs successfully, as I found the failure is related to the minimum memory requirement for OpenShift (#420).
However, there is one test case which still needs to be investigated: it occurs when I run `bundle exec vagrant service-manager install-cli openshift`.
We can keep this issue open until we find the reason for the above test case failure. The rest is fine and genuinely working in CI, as far as my investigation goes.
We also found that some wait time (~20s on a dev machine) is required between `vagrant up` and the first OpenShift-related operation, such as `vagrant service-manager status openshift`.
We need to investigate the root cause of this required delay.
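Until that root cause is found, the tests could poll the status instead of hard-coding a sleep. A minimal sketch, assuming a hypothetical helper (the name, timeout and interval are mine, not the project's) and assuming the per-service status output contains "running" once the service is up:

```ruby
# Hypothetical test helper: poll `vagrant service-manager status openshift`
# instead of sleeping a fixed ~20s after `vagrant up`.
require 'timeout'

def wait_for_openshift(timeout_seconds: 60, interval: 5)
  Timeout.timeout(timeout_seconds) do
    loop do
      output = `bundle exec vagrant service-manager status openshift 2>&1`
      # assumption: the status output contains "running" once the service is up
      return true if output.include?('running')
      sleep interval
    end
  end
rescue Timeout::Error
  false
end
```

A fixed sleep would also work, but polling keeps the tests independent of how long the delay actually is on a given machine.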
The CDK OpenShift test also fails after upgrading to the latest CDK box.
For the record, the initial problem was that the tests were not configured to run against CDK. Hence, it looked like they were running and passing, but in reality they got skipped. This was hidden by issue #419: we were using the default pretty formatter without coloring, which did not indicate which tests got run and which got skipped.
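For reference, a minimal sketch of forcing colored cucumber output from the rake task, so skipped scenarios stand out from passed ones; the project's actual Rakefile may be wired differently:

```ruby
# Illustration only: run cucumber with color forced on, so skipped scenarios
# are visually distinct from passed ones in the pretty formatter output.
require 'cucumber/rake/task'

Cucumber::Rake::Task.new(:features) do |t|
  t.cucumber_opts = '--color --format pretty'
end
```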
It seems there is a regression in the OpenShift service startup: the OpenShift status is not immediately "running" after a successful `vagrant up`.
So, with CDK 2.2 and the Vagrant configuration from adb-atomic-developer-bundle, running
`vagrant up; vagrant service-manager status`
I get:
==> default: Docker service configured successfully...
==> default: OpenShift service configured successfully...
==> default: Mounting SSHFS shared folder...
==> default: Mounting folder via SSHFS: /Users/hardy => /Users/hardy
==> default: Checking Mount..
==> default: Folder Successfully Mounted!
==> default: Running provisioner: shell...
default: Running: inline script
==> default: Running provisioner: shell...
default: Running: inline script
==> default:
==> default: Successfully started and provisioned VM with 2 cores and 3072 MB of memory.
==> default: To modify the number of cores and/or available memory set the environment variables
==> default: VM_CPU and/or VM_MEMORY respectively.
==> default:
==> default: You can now access the OpenShift console on: https://10.1.2.2:8443/console
==> default: To use OpenShift CLI, run:
==> default: $ vagrant ssh
==> default: $ oc login
==> default:
==> default: Configured users are (<username>/<password>):
==> default: openshift-dev/devel
==> default: admin/admin
==> default:
==> default: If you have the oc client library on your host, you can also login from your host.
Configured services:
docker - running
openshift - stopped
kubernetes - stopped
Note that the OpenShift service is reported as stopped. Running the status command again a few seconds later:
$ vagrant service-manager status
Configured services:
docker - running
openshift - running
kubernetes - stopped
So the OpenShift provisioning seems to return too early. This is consistent with the behavior we see in the tests.
It does not matter whether I use:
config.vm.provision "shell", run: "always", inline: <<-SHELL
PROXY=#{PROXY} PROXY_USER=#{PROXY_USER} PROXY_PASSWORD=#{PROXY_PASSWORD} /usr/bin/sccli openshift
SHELL
or just
config.servicemanager.services = "openshift"
> So the OpenShift provisioning seems to return too early. This is consistent with the behavior we see in the tests.

And this is only seen with CDK 2.2 but not CDK 2.1?
AFAICT yes
CDK 2.1 behavior:
$ vagrant up; vagrant service-manager status
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'cdkv2.1'...
...
==> default: Copying TLS certificates to /home/budhram/redhat/vagrant-service-manager/.vagrant/machines/default/virtualbox/docker
==> default: Docker service configured successfully...
==> default: OpenShift service configured successfully...
Configured services:
docker - running
openshift - running
kubernetes - stopped
CDK 2.2 with `systemctl start openshift` provisioner:
$ vagrant up; vagrant service-manager status
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'cdkv2'...
.......
==> default: Copying TLS certificates to /home/budhram/redhat/vagrant-service-manager/.vagrant/machines/default/virtualbox/docker
==> default: Docker service configured successfully...
==> default: OpenShift service configured successfully...
==> default: Running provisioner: shell...
default: Running: inline script
==> default: OpenShift started...
Configured services:
docker - running
openshift - running
kubernetes - stopped
## Vagrantfile
Vagrant.configure(2) do |config|
config.vm.box = 'cdkv2'
config.vm.network "private_network", ip: "10.10.10.42"
config.registration.skip = true
config.vm.provider('libvirt') { |v| v.memory = 3072 }
config.vm.provider('virtualbox') { |v| v.memory = 3072 }
config.vm.synced_folder '.', '/vagrant', disabled: true
# explicitly enable and start OpenShift
config.vm.provision "shell", run: "always", inline: <<-SHELL
systemctl start openshift
echo "OpenShift started..."
SHELL
end
> CDK 2.2 with `systemctl start openshift` provisioner:
@budhrg so you are saying that with calling systemctl directly it works? In this case we are dealing with a sccli bug, right?
@budhrg If it works with systemctl we can change the Vagrant config in the respective feature files to use systemctl. At least this is better than a "random" sleep. We can add a comment to the issue in developer-bundle and update the tests once we have a fix there. WDYT?
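A sketch of what that feature-file provisioner could look like, replacing the shell provisioner in the Vagrantfile above; the wait loop is my addition (not from adb-atomic-developer-bundle), using `systemctl is-active` so the provisioner only returns once the unit is actually up:

```ruby
# Sketch only: start OpenShift explicitly and block until systemd reports the
# unit active, so the tests never need an arbitrary sleep.
config.vm.provision "shell", run: "always", inline: <<-SHELL
  systemctl start openshift
  # `systemctl is-active --quiet` exits 0 once the unit is active
  until systemctl is-active --quiet openshift; do
    sleep 2
  done
  echo "OpenShift started..."
SHELL
```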
@budhrg nice digging ;-)
Blocked on projectatomic/adb-utils#194
We don't have to be blocked, right? See https://github.com/projectatomic/vagrant-service-manager/issues/415#issuecomment-256027909
> @budhrg If it works with systemctl we can change the Vagrant config in the respective feature files to use systemctl. At least this is better than a "random" sleep. We can add a comment to the issue in developer-bundle and update the tests once we have a fix there. WDYT?
But don't you think this is diverging from the actual behavior? I feel like we are adding a hack to our tests just to make them pass :smile:
WDYT? @LalatenduMohanty
> But don't you think this is diverging from the actual behavior? I feel like we are adding a hack to our tests just to make them pass
This is for sure better than a sleep. Also, the tests are about service-manager, not about the VM. For our purposes we need a properly provisioned OpenShift; if we can get this via systemctl, so be it. I would also rather do this and have the CDK tests running, as opposed to skipping them completely atm.
@hferentschik Somehow I am now not able to get the `systemctl start openshift` shell provisioner running.
The same is reported by CI too: https://ci.centos.org/job/vagrant-service-manager-budh/20/console. The required changes I made are in https://github.com/budhrg/vagrant-service-manager/commit/5e6a3721ed2ed2cb7c0f071511b0c022ba6497f1.
Locally it is sometimes passing now.
The tests are passing locally as well:
➜ vagrant-service-manager-openshift-investigate git:(adb-openshift-investigate) ✗ be rake features FEATURE=features/cdk-openshift.feature PROVIDER=libvirt BOX=cdk
/Using existing public releaase CDK box (version v. 2.2.0 for x86_64) in /home/budhram/redhat/vagrant-service-manager-openshift-investigate/.boxes
|/home/budhram/.rvm/rubies/ruby-2.1.2/bin/ruby -S bundle exec cucumber features/cdk-openshift.feature
Using the default and html profiles...
.---------------------------------..................................
2 scenarios (1 skipped, 1 passed)
68 steps (33 skipped, 35 passed)
4m49.516s
Don't know what's happening in CI. :confused:
It seems CI does not fail in cases where the tests are wrong, e.g. _When I evaluate and run `bundle exec vagrant service-manager install-cli openshift --cli-version 1.3.0 --path #{ENV['VAGRANT_HOME']}/oc`_ should fail, but did not. We can temporarily change the CI job to build one of our forks, on which we can introduce some obvious test errors. We need to verify that this actually results in test failures.
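One cheap way to do that verification, sketched as a hypothetical step definition added only on the fork (the step name is made up): a step that always raises, so the fork's CI build has to go red if the scenarios are really executed and reported.

```ruby
# Hypothetical, for the fork only: a step that always fails. If the CI job
# stays green with this in place, failures are not being run or reported.
Then(/^the CI failure reporting is verified$/) do
  raise 'intentional failure: this step must turn the CI job red'
end
```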