gshipley / installcentos


Provisioning fails on clean CentOS7 #109

Open judavi opened 5 years ago

judavi commented 5 years ago

OpenShift version: 3.11
Ansible version: 2.6.5

The installation completes successfully, but the K8s nodes get stuck in Pending.

[Screenshot, 2018-10-23 at 3:12:04 PM]

After some research, it looks like the problem is related to the network plugin. Output of kubectl describe nodes:

[Screenshot, 2018-10-23 at 3:12:40 PM]

But I'm not really sure what I need to change on the network to make it work. Thanks!

judavi commented 5 years ago

I wonder if this is related to https://github.com/openshift/openshift-ansible/issues/7967

DeanKamali commented 5 years ago

I have the same issue. It went smoothly in the demo video, yet on vanilla CentOS 7 it hangs in the Pending state.

DeanKamali commented 5 years ago

@judavi How much memory / CPU does your VM have?

judavi commented 5 years ago

4 cores, 16 GB of RAM. I think that’s not the issue. As you can see in the latest screenshot from the issue report, the nodes are reporting plenty of resources to work with.

marekjelen commented 5 years ago

I have tried to re-provision the cluster and everything worked smoothly. We would need more information to be able to help :(

judavi commented 5 years ago

@marekjelen I have a clue: what happens if you run all this in a VM on Azure? Will it still work?

marekjelen commented 5 years ago

It should. The CentOS image on Azure was not provided by CentOS but by some company; I am not sure what the status is today, or whether the image is clean or tweaked. I am validating on DigitalOcean. Any CentOS machine with internet access should work, but in some environments you may need to set the env variables explicitly, as the detection might not produce the wanted configuration.

judavi commented 5 years ago

Cool, so I will upload a CentOS image and then try again (and I will share my findings).

judavi commented 5 years ago

@marekjelen you mean these ENV variables?

$ export DOMAIN=.nip.io
$ export USERNAME=
$ export PASSWORD=password

judavi commented 5 years ago

Also, I'm wondering what the IP value should be when you're running the script on a cloud provider. The default value?

DeanKamali commented 5 years ago

@judavi I downloaded a fresh copy of CentOS 7.5 from centos.org onto a VM with 32 GB RAM and it worked smoothly. I failed to get it working on DigitalOcean or Vultr on VMs with 8 GB RAM.

As for IP, I have created a DNS record to point to my VM's IP

I have modified install-openshift.sh and changed the number of drives from 200 to 5. I have also modified vol.yaml and changed the drive size from 500G to 1G.

I had issues with metrics and logging, so I have turned them off:

export METRICS="False" LOGGING="False"

I think VM resources have a great impact on the outcome.
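A rough sketch of how that could be checked before running the installer. The 16 GB threshold and the addons_enabled helper are hypothetical, chosen only for illustration, and the "True" value is assumed to be the counterpart of the "False" shown above:

```shell
# Hypothetical threshold for illustration only; it is not taken from the installer.
MIN_MEM_GB=16

# Echo "True" when the VM has at least the threshold, otherwise "False".
# $1: total RAM in GB, $2: threshold in GB
addons_enabled() {
  if [ "$1" -ge "$2" ]; then echo "True"; else echo "False"; fi
}

# Read total memory from /proc/meminfo (Linux); fall back to 0 if unavailable.
# Integer division rounds down, so a "16 GB" VM usually reports 15 here.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_gb=$(( ${mem_kb:-0} / 1024 / 1024 ))

# Turn metrics and logging off together on small machines.
METRICS=$(addons_enabled "$mem_gb" "$MIN_MEM_GB")
LOGGING="$METRICS"
export METRICS LOGGING
echo "RAM: ${mem_gb} GB, METRICS=${METRICS}, LOGGING=${LOGGING}"
```

With these exported, install-openshift.sh would be started from the same shell so that it inherits the values.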

marekjelen commented 5 years ago

@judavi yes, those. In case you are running on a cloud provider, the IP should be the one reachable from the outside world (the public IP), which the script should detect correctly. It can be ephemeral or persistent; that does not matter.

judavi commented 5 years ago

@marekjelen I'm wondering if that could be part of the issue, because the script is using the private IP instead of the public IP for that value. Btw, the domain is working correctly:

[Screenshot, 2018-10-25 at 6:45:37 AM]

marekjelen commented 5 years ago

@DeanKamali in case you are running on DO, you can check the validation which has a Terraform automation for running on top of DO. That's what I use to test the script.

If you have your own DNS records, you need to set up the DOMAIN correctly.

The PVs are "virtual": they simply tell OpenShift that it can use these 200 directories and that each can hold up to 500G. Your system does not have to actually provide these resources. We simply overcommit here, so your system can be as small as the tiniest DO VM.
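To illustrate, here is a minimal sketch that prints hostPath PersistentVolume manifests. It is not the repo's vol.yaml; the pv_manifest helper, the vol name prefix, and the /mnt/data base directory are all hypothetical. The declared capacity is only used to match claims, and nothing is reserved on disk for it:

```shell
# Print one hostPath PersistentVolume manifest to stdout.
# $1: volume index, $2: declared capacity, $3: base directory (hypothetical)
pv_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: vol$1
spec:
  capacity:
    storage: $2
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: $3/vol$1
EOF
}

# Declare three volumes of 500Gi each; the host only needs the directories,
# not 1.5 TiB of free space.
for i in 1 2 3; do
  pv_manifest "$i" "500Gi" "/mnt/data"
  echo "---"
done
```

The printed manifests could be piped to oc create -f - on a running cluster, after creating the matching directories on the host.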

Yes, METRICS and LOGGING use the resources.

marekjelen commented 5 years ago

@judavi yes, that might be a problem. Can you override the IP to your public IP?
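A minimal sketch of such an override, assuming the script reads the address from an IP environment variable (the variable name is an assumption based on this thread, and 203.0.113.10 is a documentation-range placeholder, not a real host):

```shell
# Placeholder address from the documentation range; replace it with the
# VM's real public IP as shown in the cloud provider's console.
export IP="203.0.113.10"

# nip.io resolves <ip>.nip.io back to <ip>, so no DNS record is needed.
export DOMAIN="${IP}.nip.io"

echo "IP=${IP} DOMAIN=${DOMAIN}"
```

Run install-openshift.sh from the same shell afterwards so that it picks up the exported values instead of the detected ones.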

judavi commented 5 years ago

Sure, testing right now using the public IP