Delay between `vagrant up` finishing and the OpenShift service reporting as running with CDK 2.2

coolbrg commented 7 years ago

With CDK 2.2:

Using CDK 2.2 with the Vagrant configuration from adb-atomic-developer-bundle and running:

$ vagrant up; vagrant service-manager status

I get:

.............
==> default: Docker service configured successfully...
==> default: OpenShift service configured successfully...
......
==> default: Successfully started and provisioned VM with 2 cores and 3072 MB of memory.
.......
==> default: If you have the oc client library on your host, you can also login from your host.
Configured services:
docker - running
openshift - stopped
kubernetes - stopped

The OpenShift service is reported as stopped. Running the same command a few seconds later gives:

$ vagrant service-manager status
Configured services:
docker - running
openshift - running
kubernetes - stopped

So the OpenShift provisioning seems to return too early. This is consistent with the behavior we see in the tests.

With CDK 2.1:

$ vagrant up; vagrant service-manager status
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'cdkv2.1'...
...
==> default: Copying TLS certificates to /home/budhram/redhat/vagrant-service-manager/.vagrant/machines/default/virtualbox/docker
==> default: Docker service configured successfully...
==> default: OpenShift service configured successfully...
Configured services:
docker - running
openshift - running
kubernetes - stopped

I also observed that CDK 2.2 works fine with a `systemctl start openshift` provisioner:

$ vagrant up; vagrant service-manager status
Bringing machine 'default' up with 'virtualbox' provider...
==> default: Importing base box 'cdkv2'...
....
==> default: Copying TLS certificates to /home/budhram/redhat/vagrant-service-manager/.vagrant/machines/default/virtualbox/docker
==> default: Docker service configured successfully...
==> default: OpenShift service configured successfully...
==> default: Running provisioner: shell...
    default: Running: inline script
==> default: OpenShift started...
Configured services:
docker - running
openshift - running
kubernetes - stopped

## Vagrantfile
Vagrant.configure(2) do |config|
  config.vm.box = 'cdkv2'
  config.vm.network "private_network", ip: "10.10.10.42"
  config.registration.skip = true
  config.vm.provider('libvirt') { |v| v.memory = 3072 }
  config.vm.provider('virtualbox') { |v| v.memory = 3072 }
  config.vm.synced_folder '.', '/vagrant', disabled: true

  # explicitly enable and start OpenShift
  config.vm.provision "shell", run: "always", inline: <<-SHELL
    systemctl start openshift
    echo "OpenShift started..."
  SHELL
end

praveenkumar commented 7 years ago

I am able to reproduce this issue but am still not sure why it happens. To check the behavior of the sccli script, I added a flag that returns the exit value, and it was as expected. Behind the scenes, sccli runs a system command using the subprocess module with communicate, which according to the docs is supposed to wait until the process completes and return a tuple: https://docs.python.org/2/library/subprocess.html#subprocess.Popen.communicate

==> default: Running provisioner: shell...
    default: Running: inline script
==> default: 0
==> default: Running provisioner: shell...
    default: Running: inline script

Here we can see that the return value from the inline script is 0. Is this a blocker?
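For reference, here is a minimal sketch of what `communicate()` guarantees. This is not the sccli code: `run_command` and the stand-in start command are hypothetical names used only for illustration. `communicate()` blocks until the child process itself exits, so an exit value of 0 only says the start command finished. It does not say that whatever the command asked to be started has reached a running state.

```python
import subprocess
import sys

def run_command(cmd):
    """Run cmd to completion and return (exit_code, stdout_text, stderr_text)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Blocks until the child exits and both pipes reach end-of-file.
    out, err = proc.communicate()
    return proc.returncode, out.decode().strip(), err.decode().strip()

# Hypothetical stand-in for a start command that only requests a start
# and then exits immediately with status 0.
fake_start = [sys.executable, "-c", "print('start requested')"]
code, out, err = run_command(fake_start)
```

If the command behind sccli returns as soon as the start has been requested, this would be consistent with an exit value of 0 while the status still lags behind.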

coolbrg commented 7 years ago

@praveenkumar Why is this not the case with CDK 2.1? What are the changes in CDK 2.2 around it?

praveenkumar commented 7 years ago

> What are the changes in CDK 2.2 around it?

The only change is that the Vagrantfile now uses sccli instead of systemctl for the openshift service. We did this because we now need to pass shell variables before starting the service, such as proxy settings and a specific OSE version.

praveenkumar commented 7 years ago

@budhrg BTW, how do you folks do the same testing with ADB, given that its Vagrantfile always uses sccli?

coolbrg commented 7 years ago

@praveenkumar We have now added a sleep of 10 seconds to pass our CI. See https://github.com/projectatomic/vagrant-service-manager/blob/master/features/adb-openshift.feature#L36

Locally I sometimes need to give it 20 seconds.
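A fixed sleep has to be tuned to the slowest machine, which is why 10 seconds passes CI but not every local run. One possible alternative is to poll with a deadline. The sketch below is only an illustration: `wait_until`, its parameters, and `fake_check` are hypothetical and not part of vagrant-service-manager or its feature files.

```python
import time

def wait_until(check, timeout=60.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or `timeout` seconds elapse.

    Returns True if check() succeeded in time, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Illustration with a fake check that succeeds on its third call.
calls = {"n": 0}

def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3

ready = wait_until(fake_check, timeout=5.0, interval=0.01)
```

In a real test the check would ask whether the openshift service reports as running, and the timeout would be an upper bound rather than a delay that is always paid in full.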

coolbrg commented 7 years ago

> Only change we have is using sccli instead of systemctl for openshift service in the Vagrantfile and we did this because we need to pass shell variable before start service now, like proxy stuff and OSE specific version.

Then I guess it is an sccli issue: it does not behave the same way as systemctl, as you mentioned in your comment above.

praveenkumar commented 7 years ago

Another finding from the meeting: the service stays in the activating state for some time before it runs.

[root@rhel-cdk vagrant]# sccli openshift start
[root@rhel-cdk vagrant]# systemctl is-active openshift
activating
[root@rhel-cdk vagrant]# echo $?
3 => activating state return code. 
[root@rhel-cdk vagrant]# systemctl status openshift
● openshift.service - Docker Application Container for OpenShift
   Loaded: loaded (/usr/lib/systemd/system/openshift.service; disabled; vendor preset: disabled)
   Active: activating (start-post) since Wed 2016-10-26 05:47:32 EDT; 2s ago
     Docs: https://docs.openshift.org/
  Process: 16966 ExecStop=/usr/bin/sh -c /opt/adb/openshift/openshift_stop (code=exited, status=0/SUCCESS)
  Process: 17099 ExecStartPre=/usr/bin/docker rm openshift (code=exited, status=0/SUCCESS)
  Process: 17092 ExecStartPre=/usr/bin/docker stop openshift (code=exited, status=0/SUCCESS)
 Main PID: 17106 (sh);         : 17107 (sh)
   Memory: 8.5M
   CGroup: /system.slice/openshift.service
           ├─17106 /usr/bin/sh /opt/adb/openshift/openshift
           ├─17143 /usr/bin/docker-current run --name openshift --privileged --net=host --pid=host -w /var/lib/openshift -e KUBECONFIG=/var/lib/openshift/openshift.local....
           └─control
             ├─17107 /usr/bin/sh /opt/adb/openshift/openshift_provision
             └─17182 sleep 1
[root@rhel-cdk vagrant]# systemctl is-active openshift
active
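To act on this from a script, the `is-active` result can be read directly: the command prints the unit state and exits with 0 only when the unit is active (the transcript above shows 3 while activating). A minimal sketch, assuming a hypothetical helper `unit_state`. The `fake_run` stand-in exists only so the example does not need a real systemd host.

```python
import subprocess
from types import SimpleNamespace

def unit_state(unit, run=subprocess.run):
    """Return (state, is_ready) for a systemd unit via `systemctl is-active`."""
    result = run(["systemctl", "is-active", unit],
                 stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                 universal_newlines=True)
    state = result.stdout.strip()
    # Ready only when the exit code is 0 and the printed state is "active".
    return state, result.returncode == 0 and state == "active"

def fake_run(state, code):
    """Hypothetical stand-in for subprocess.run that replays a fixed result."""
    def _run(cmd, **kwargs):
        return SimpleNamespace(stdout=state + "\n", returncode=code)
    return _run

# Replays the two results seen in the transcript above.
activating = unit_state("openshift", run=fake_run("activating", 3))
active = unit_state("openshift", run=fake_run("active", 0))
```

Treating anything other than `active` with exit code 0 as "not ready yet" keeps the activating window from being reported as a running service.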
