openshift-evangelists / oc-cluster-wrapper

oc cluster up bash wrapper
Apache License 2.0

can't start an existing profile when my host IP addr changes #57

Open rafaeltuelho opened 7 years ago

rafaeltuelho commented 7 years ago

When my laptop (the host) gets a different IP address and I try to start an existing saved profile, I get the following error:

./oc-cluster up origin-141-demo-metrics-fis2                                                                                       
Performing some customization for platform linux
Using Docker0 (172.17.42.1) ip as external cluster and router address
[INFO] Running a previously created cluster
oc cluster up --public-hostname 172.17.42.1.xip.io --routing-suffix apps.172.17.42.1.xip.io --host-data-dir /home/rsoares/.oc/profiles/origin-141-demo-metrics-fis2/data --host-config-dir /home/rsoares/.oc/profiles/origin-141-demo-metrics-fis2/config --use-existing-config -e TZ=BRT --metrics=true --routing-suffix=172.17.42.1.xip.io 
-- Checking OpenShift client ... OK
-- Checking Docker client ... OK
-- Checking Docker version ... OK
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.4.1 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... 
   WARNING: Binding DNS on port 8053 instead of 53, which may not be resolvable from all clients.
-- Checking type of volume mount ... 
   Using nsenter mounter for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ... 
   Using x.x.x.x as the server IP
-- Starting OpenShift container ... FAIL
   Error: Docker run error rc=2
   Details:
     Image: openshift/origin:v1.4.1
     Entrypoint: [/bin/bash]
     Command: [-c for name in x.x.x.x rsoares; do ls /var/lib/origin/openshift.local.config/node-$name &> /dev/null && echo $name && break; done]
[ERROR] Cluster has not started correctly. Profile configuration will be preserved

I tried renaming these directories manually, without success:

 ls -la /var/lib/origin/openshift.local.config/
total 4
drwxr-xr-x. 5 root root   65 fev 13 17:14 .
drwxr-xr-x. 4 root root   67 jan 19 10:07 ..
drwxr-xr-x. 2 root root 4096 fev  1 21:44 master
drwxr-xr-x. 2 root root  209 jan 19 10:07 node-x.x.x.x
drwxr-xr-x. 2 root root  209 fev  1 21:32 node-rsoares
for name in x.x.x.x rsoares; do ls /var/lib/origin/openshift.local.config/node-$name &> /dev/null && echo $name && break; done

x.x.x.x
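The rc=2 in the log is consistent with that loop finding no match: a for loop's exit status is that of the last command it ran, and GNU ls exits with status 2 when it cannot access a path. Below is a minimal sketch of the same lookup as a hypothetical find_node_dir function (using [ -d ] instead of ls), so it can be pointed at any config directory. That matters because the container presumably sees the profile's --host-config-dir at /var/lib/origin/openshift.local.config, so running the loop on the host against /var/lib/origin may check a different directory than the one the container used. That mount mapping is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of the node-name lookup that "oc cluster up" ran in the container
# (the "Command:" line in the log above).
# Usage: find_node_dir CONFIG_DIR NAME...
# Prints the first NAME for which CONFIG_DIR/node-NAME is a directory and
# returns 0; returns 2 (like the failing "ls") when no candidate exists.
find_node_dir() {
  local config_dir="$1"
  shift
  local name
  for name in "$@"; do
    if [ -d "$config_dir/node-$name" ]; then
      printf '%s\n' "$name"
      return 0
    fi
  done
  return 2
}
```

For example, find_node_dir /home/rsoares/.oc/profiles/origin-141-demo-metrics-fis2/config x.x.x.x rsoares (x.x.x.x being the masked IP) prints the node name the container should have found, or returns 2 if there is none.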
jorgemoralespou commented 7 years ago

@rafaeltuelho It seems I have a bug: the wrapper already sets the routing-suffix, and you've also passed it as an argument, so it gets set twice. That is probably causing an error in the underlying oc cluster.

cc @csrwng Could this be a problem? Why is he getting x.x.x.x as the server IP?

jorgemoralespou commented 7 years ago

@rafaeltuelho what oc client version are you using?

rafaeltuelho commented 7 years ago

@jorgemoralespou, at the moment I'm using Origin 1.4.1:

 oc version
oc v1.4.1+3f9807a
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.17.42.1.xip.io:8443
openshift v1.4.1+3f9807a
kubernetes v1.4.0+776c994

The x.x.x.x above just masks my host's real IP address :-)

jorgemoralespou commented 7 years ago

@rafaeltuelho there's an issue open against oc cluster to keep a consistent node name across restarts when roaming, instead of deriving it from the node IP, which can change and lead to these problems.

@csrwng do you know if this can be customized via an environment variable or in some other way?

rafaeltuelho commented 7 years ago

Hmm, this is critical for the oc-cluster-wrapper tool. Its main advantage is the ability to reproduce a setup, but with this issue a saved profile can't be reused once the host IP address changes.

jorgemoralespou commented 7 years ago

I'm traveling at the moment but will look into it once I'm back. I haven't experienced this before and there are no other reports of it, so I wonder whether some other circumstance is involved. I'll definitely look more carefully soon. My IP doesn't change even when I roam, because I use a 10.2.2.2 alias on my loopback interface. This is more consistent and less troublesome.
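A minimal sketch of such a loopback alias on Linux, assuming the iproute2 ip command and sudo are available. The function name add_loopback_alias and the APPLY switch are hypothetical; by default the function only prints the command it would run. An alias added this way lasts until reboot; making it persistent depends on your distribution's network configuration.

```shell
#!/usr/bin/env bash
# Add a stable IPv4 alias to the Linux loopback interface.
# Usage: add_loopback_alias [IP]   (default 10.2.2.2, the address from this thread)
# Dry run unless APPLY=1 is set in the environment.
add_loopback_alias() {
  local ip="${1:-10.2.2.2}"
  # Build the command as the function's positional parameters.
  set -- ip addr add "$ip/32" dev lo
  if [ "${APPLY:-0}" = "1" ]; then
    sudo "$@"
  else
    echo "would run: sudo $*"
  fi
}
```

Running add_loopback_alias with no arguments prints "would run: sudo ip addr add 10.2.2.2/32 dev lo"; APPLY=1 add_loopback_alias actually adds it.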

rafaeltuelho commented 7 years ago

@jorgemoralespou, how can I tell oc-cluster to use a specific alias IP or the loopback to overcome this issue?

I thought passing --public-hostname 172.17.42.1.xip.io and --routing-suffix=172.17.42.1.xip.io would be sufficient. Wouldn't it?

jorgemoralespou commented 7 years ago

I just use --public-hostname=x.y.z and --routing-suffix=apps.lcup, and I run dnsmasq locally so that .apps.lcup names resolve to x.y.z. I don't like relying on xip.io; it's handy, but not for my day-to-day work. I should probably blog about how I do it, since it would be useful for people. I'll try to do that soon.

In the meantime, you can still use xip.io for --routing-suffix, but for --public-hostname use the loopback IP instead.
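The setup described above can be sketched as follows, using 10.2.2.2 in place of x.y.z and apps.lcup as the routing suffix. cluster_dns_setup is a hypothetical helper that only prints text: a dnsmasq address= line, which makes dnsmasq answer the given domain and every name under it with that IP, and the matching flags for plain oc cluster up. How the wrapper passes these flags through is not covered here.

```shell
#!/usr/bin/env bash
# Print a dnsmasq wildcard entry and matching "oc cluster up" flags for a
# stable IP (e.g. a loopback alias) and a routing suffix. Runs nothing.
# Usage: cluster_dns_setup IP SUFFIX
cluster_dns_setup() {
  local ip="$1" suffix="$2"
  # dnsmasq: resolve SUFFIX and all of its subdomains to IP.
  printf 'address=/%s/%s\n' "$suffix" "$ip"
  # oc cluster up: pin the public hostname to the stable IP.
  printf 'oc cluster up --public-hostname=%s --routing-suffix=%s\n' "$ip" "$suffix"
}
```

The first line of cluster_dns_setup 10.2.2.2 apps.lcup goes into a dnsmasq configuration file (for example one under /etc/dnsmasq.d/, if your dnsmasq reads that directory), followed by a dnsmasq restart.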

rafaeltuelho commented 7 years ago

@jorgemoralespou, thanks for your help!

It only works if I use a loopback IP alias (e.g., 10.2.2.2) as you suggested. I tried --public-hostname=127.0.0.1, but then the containers can't resolve external addresses (e.g., github.com), as described in [1], even after configuring my local Fedora firewalld to accept traffic on ports 8443, 53, and 5053.

Anyway, it worked fine using the IP alias 10.2.2.2.

UPDATE: I had forgotten to add a nameserver entry to my /etc/resolv.conf pointing to my local dnsmasq.

Also, can you share your local dnsmasq config? I tried the following, but it can't resolve my routes:

address=/apps.ocp.localdomain/10.2.2.2

I also tried this:

server=/apps.ocp.localdomain/10.2.2.2#8053

[1] https://github.com/openshift/origin/issues/10139
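Per the UPDATE, the missing piece was a nameserver line in /etc/resolv.conf for the local dnsmasq. A small check for that, assuming dnsmasq listens on 127.0.0.1 (adjust if yours binds elsewhere); has_nameserver is a hypothetical name. For reference, in dnsmasq address=/domain/ip returns that fixed IP for the domain and its subdomains, while server=/domain/ip#port forwards queries for the domain to another DNS server on that port.

```shell
#!/usr/bin/env bash
# Return 0 if a resolv.conf-style file lists ADDRESS as a nameserver.
# Usage: has_nameserver [FILE] [ADDRESS]
#   defaults: FILE=/etc/resolv.conf, ADDRESS=127.0.0.1
has_nameserver() {
  local file="${1:-/etc/resolv.conf}" addr="${2:-127.0.0.1}"
  # Compare the second field exactly, so 127.0.0.10 does not count as 127.0.0.1.
  awk -v want="$addr" '$1 == "nameserver" && $2 == want { found = 1 } END { exit !found }' "$file"
}
```

For example, has_nameserver || echo "add 'nameserver 127.0.0.1' to /etc/resolv.conf".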

wulliam commented 7 years ago

I have the same issue. I set up a server with IP 10.25.0.91, then needed to move to another server, so I restored the OS backup onto a server with IP 10.27.232.232. I added an IP alias with "ifconfig lo:0 10.25.0.91", but I still can't start my cluster.

jorgemoralespou commented 7 years ago

@wulliam This is not supported by "oc cluster", which is the underlying technology. I think 3.6 brings a fix for this. Could you try that version (I think the latest is v3.6.0-alpha.2) and go through that use case?

If it doesn't work, please open a bug there: https://github.com/openshift/origin/issues

wulliam commented 7 years ago

@jorgemoralespou Thanks for your help! I finally got my cluster started with all my previous work back. To help anyone who hits the same issue, here is what I did:

  1. Go to the --host-config-dir ("/root/.oc/profiles/landaojia/config") and replace the IP address in the config files with:
       grep -rlF '10.25.0.91' * | xargs -I {} sed -i 's/10\.25\.0\.91/10.27.232.232/g' {}
     Then rename the folder "node-10.25.0.91" to "node-10.27.232.232".

  2. Run "oc-cluster up sunwayxiyi" to create a new profile, and use its newly generated key, crt, and kubeconfig files to overwrite the ones in the landaojia profile (i.e., your own profile). Example bash script to overwrite the OpenShift Origin key, crt, and kubeconfig files.

  3. Edit "/root/.oc/profiles/landaojia/run" and replace "oc cluster up --version v1.5.0" with "oc cluster up --version v3.6.0-alpha.2".
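Steps 1 and 3 can be scripted roughly as follows. This is a sketch, not a tested migration tool: migrate_profile and escape_dots are hypothetical names, and the profile layout (PROFILE/config as the --host-config-dir, PROFILE/run holding the saved oc cluster up command) is assumed from this comment. It escapes the dots so sed matches them literally, uses grep -I to skip binary files, and never touches PROFILE/data, which holds etcd's binary files. Step 2 stays manual, and backing up the profile first is wise.

```shell
#!/usr/bin/env bash
# Sketch of steps 1 and 3 for one profile directory (GNU sed -i assumed).
# Arguments are assumed to contain no "/", "&" or regex metacharacters
# other than ".", which escape_dots handles.

# Turn every "." into "\." so sed treats it literally.
escape_dots() {
  printf '%s' "$1" | sed 's/\./\\./g'
}

# Usage: migrate_profile PROFILE OLD_IP NEW_IP OLD_VERSION NEW_VERSION
migrate_profile() {
  local profile="$1" old_ip="$2" new_ip="$3" old_ver="$4" new_ver="$5"
  local config="$profile/config" old_re ver_re f
  old_re=$(escape_dots "$old_ip")
  ver_re=$(escape_dots "$old_ver")

  # Step 1a: rewrite the old IP in text files only (-I skips binaries).
  grep -rlIF "$old_ip" "$config" | while IFS= read -r f; do
    sed -i "s/$old_re/$new_ip/g" "$f"
  done

  # Step 1b: rename the node config directory.
  if [ -d "$config/node-$old_ip" ]; then
    mv "$config/node-$old_ip" "$config/node-$new_ip"
  fi

  # Step 3: point the saved "oc cluster up" command at the new version.
  sed -i "s/--version $ver_re/--version $new_ver/" "$profile/run"
}
```

For the case above: migrate_profile /root/.oc/profiles/landaojia 10.25.0.91 10.27.232.232 v1.5.0 v3.6.0-alpha.2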

Note: If you skip step 2 and don't replace the key, crt, and kubeconfig files, OpenShift fails to start with this error: 2017-06-15 01:41:32.403931 I | etcdserver/api/v3rpc: Failed to dial 10.27.232.232:4001: connection error: desc = "transport: x509: certificate is valid for 10.25.0.91, 114.55.172.6, 127.0.0.1, 172.30.0.1, 192.168.0.1, not 10.27.232.232"; please retry.

If in step 3 you use "v1.5.0-alpha.3" instead of "v3.6.0-alpha.2", OpenShift starts, but your OpenShift projects are empty and you lose them.

If you also replace the IP address in the etcd binary files under the --host-data-dir ("/root/.oc/profiles/landaojia/data"), those files get corrupted and OpenShift can't start.
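The x509 error in the first note can be anticipated before starting the cluster by checking which IP addresses a certificate's Subject Alternative Name lists. san_has_ip and cert_has_ip are hypothetical helpers; the parsing assumes the "IP Address:" format printed by openssl x509 -noout -text, and which certificate files to inspect (for example the .crt files under the profile's config/master directory) is an assumption.

```shell
#!/usr/bin/env bash
# Return 0 if the certificate text on stdin (as printed by
# "openssl x509 -noout -text") lists IP among its Subject Alternative Names.
# Usage: ... | san_has_ip IP
san_has_ip() {
  local ip_re
  # Escape dots, and require "," or end of line after the IP so that
  # 10.25.0.9 does not match 10.25.0.91.
  ip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
  grep -qE "IP Address:${ip_re}(,|[[:space:]]*\$)"
}

# Usage: cert_has_ip CERT_FILE IP   (requires the openssl CLI)
cert_has_ip() {
  openssl x509 -in "$1" -noout -text | san_has_ip "$2"
}
```

For example: for c in /root/.oc/profiles/landaojia/config/master/*.crt; do cert_has_ip "$c" 10.27.232.232 || echo "$c does not list 10.27.232.232"; done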