rancher / quickstart


Vagrant Rancher stuck on curl #178

Closed rahendatri closed 2 years ago

rahendatri commented 3 years ago

Hello, I wanted to test Rancher on my local machine to start exploring it. My environment is: Vagrant 2.2.18, VirtualBox 6.1.26 r145957, and Cygwin on Windows 10.

To start a Rancher dev environment, I followed the instructions given here: https://rancher.com/docs/rancher/v2.5/en/quick-start-guide/deployment/quickstart-vagrant/

However, the startup process seems to be stuck on these lines: https://github.com/rancher/quickstart/blob/master/vagrant/scripts/configure_rancher_server.sh#L20-L23 The following output repeats continuously, and I don't know how to resolve it.

...
server-01: + docker run --rm --net=host appropriate/curl -sLk https://127.0.0.1/ping
server-01: + sleep 5
server-01: + true
...

Clearly, the Rancher server is not reachable (it responds on neither port 80 nor port 443). When I logged into the server node and ran docker ps -a, I got the following output:

[rancher@server-01 ~]$ docker ps -a
CONTAINER ID        IMAGE                    COMMAND             CREATED             STATUS              PORTS                                      NAMES
e0b770daa3c3        rancher/rancher:v2.5.7   "entrypoint.sh"     23 minutes ago      Up About a minute   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   affectionate_shtern

Something seems to be wrong inside the Rancher container, but I cannot figure out what.

Could somebody help me with this, please?

I am a newbie in the Rancher world and would really appreciate some help.

Thanks in advance! Rahenda

superseb commented 3 years ago

Running the command without the silent flag (-s) at least shows the connection error (docker run --rm --net=host appropriate/curl -Lk https://127.0.0.1/ping). To find out why it cannot connect, check or supply the logs from the created container (docker logs e0b770daa3c3 2>&1 or docker logs affectionate_shtern 2>&1).
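The wait loop shown in the trace appears to retry indefinitely, so a real failure looks the same as a slow start. As a minimal sketch (not the quickstart's actual code), a bounded retry could give up after a deadline so the container logs can be inspected. `wait_until` and `ping_rancher` are hypothetical helper names; the curl command is the one from this thread with `-s` dropped and curl's `-f` flag added so that HTTP errors count as failures.

```shell
#!/bin/sh
# wait_until (hypothetical helper): retry a command until it succeeds
# or until the given number of seconds has elapsed.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  start=$(date +%s)
  until "$@"; do
    now=$(date +%s)
    if [ $((now - start)) -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# ping_rancher (hypothetical helper): the ping check from this thread,
# without -s so connection errors are printed, plus -f so that an HTTP
# error status makes curl exit non-zero.
ping_rancher() {
  docker run --rm --net=host appropriate/curl -fLk https://127.0.0.1/ping
}
```

For example, `wait_until 600 5 ping_rancher || docker logs affectionate_shtern 2>&1` would stop after ten minutes and print the container logs. The 600-second limit is an arbitrary choice and would need tuning for slower hosts.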

rahendatri commented 3 years ago

Hello, thank you for your response. I have already solved the problem: I updated the config.yaml file to allocate more CPU and memory (2 CPUs instead of 1, and 2048 MB instead of 1500 MB), which made things run much more smoothly. With the original config, the curl loop took about 30 minutes, so I assumed something was wrong with the initialisation and kept deleting and recreating the environment until I figured out that it was a resource problem rather than a networking one.

I would suggest a PR to allocate more CPU and memory by default for a smoother experience.

stefanotorresi commented 2 years ago

I'm also experiencing this issue.

The node installation script infinitely loops with the following output:

    node-01: + true
    node-01: ++ docker run --rm appropriate/curl -sLk -H 'Authorization: Bearer ' 'https://172.22.101.101/v3/clusters?name=quickstart'
    node-01: ++ docker run --rm -i stedolan/jq -r '.data[].id'
    node-01: jq: error (at <stdin>:1): Cannot iterate over null (null)
    node-01: + CLUSTERID=
    node-01: + '[' -n '' ']'
    node-01: + sleep 5

Increasing node resources didn't help.
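In the trace above, the Authorization header is sent as `Bearer ` with nothing after it, which suggests the token variable was empty when the loop started, and jq fails because the response has no `data` array. A minimal sketch of a guard, assuming the token comes from an earlier login step (`require_token` and `get_cluster_id` are hypothetical names, not functions from the quickstart scripts):

```shell
#!/bin/sh
# require_token (hypothetical helper): fail fast when the API token is empty
# instead of polling the API with a blank "Authorization: Bearer " header.
require_token() {
  if [ -z "$1" ]; then
    echo "API token is empty; check the login step against the Rancher server" >&2
    return 1
  fi
}

# get_cluster_id (hypothetical helper): same curl and jq containers as in the
# trace above. The "?" after ".data[]" makes jq print nothing instead of
# raising "Cannot iterate over null" when the response has no data array.
# Usage: get_cluster_id <server_ip> <token>
get_cluster_id() {
  server=$1
  token=$2
  require_token "$token" || return 1
  docker run --rm appropriate/curl -sLk \
    -H "Authorization: Bearer $token" \
    "https://$server/v3/clusters?name=quickstart" |
    docker run --rm -i stedolan/jq -r '.data[]?.id'
}
```

With a guard like this, an empty token stops the script with a clear message rather than looping on `sleep 5`. It does not explain why the token is empty; that would still need the server-side logs.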

nfaction commented 2 years ago

I am having the same issue here. I increased both boxes to 2 cores and 4 GB RAM each. After that, I was finally able to log in to the dashboard at https://192.168.56.101/. However, after drilling down into the setup, specifically the local cluster creation, I see a never-ending list of logs showing that the pod helm-operation-XXXX fails. Here's what I'm seeing in this pod (FailedCreatePodSandBox):

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "rancher/pause:3.1": failed to pull image "rancher/pause:3.1": failed to pull and unpack image "docker.io/rancher/pause:3.1": failed to resolve reference "docker.io/rancher/pause:3.1": failed to do request: Head "https://registry-1.docker.io/v2/rancher/pause/manifests/3.1": dial tcp: lookup registry-1.docker.io: no such host

In the vagrant up shell, I get an infinite loop of this:

    node-01: + true
    node-01: ++ docker run --rm appropriate/curl -sLk -H 'Authorization: Bearer ' 'https://192.168.56.101/v3/clusters?name=quickstart'
    node-01: ++ docker run --rm -i stedolan/jq -r '.data[].id'
    node-01: jq: error (at <stdin>:1): Cannot iterate over null (null)
    node-01: + CLUSTERID=
    node-01: + '[' -n '' ']'
    node-01: + sleep 5

Here's the vagrant status:

Config: {"admin_password"=>"admin", "rancher_version"=>"v2.6.0", "ROS_version"=>"1.5.1", "k8s_version"=>"v1.20.10-rancher1-1", "server"=>{"cpus"=>4, "memory"=>4096}, "node"=>{"count"=>1, "cpus"=>4, "memory"=>4096}, "ip"=>{"master"=>"192.168.56.100", "server"=>"192.168.56.101", "node"=>"192.168.56.111"}, "linked_clones"=>true, "net"=>{"private_nic_type"=>"82545EM", "network_type"=>"private_network"}}

Current machine states:

server-01                 running (virtualbox)
node-01                   running (virtualbox)

Here's the current box:

$ vagrant box list
chrisurwin/RancherOS (virtualbox, 1.5.1)

I have torn everything down, deleted the box manually, and done a fresh clone to rule out environmental issues.

For now, I'm giving up on this since it's completely non-functional and not a "quickstart" as advertised. By the way, https://rancher.com/docs/rancher/v2.5/en/quick-start-guide/deployment/quickstart-vagrant/ contains outdated documentation: it still points to the old cluster addresses in the 172 block rather than the 192 block.

bashofmann commented 2 years ago

The Vagrant config has now been updated to request more resources. This should make cluster creation and installation more stable. Of course, it still depends on how many resources are available on your system and on what else is running in parallel and may be slowing things down. The documentation has also been updated to point to the new IPs.

Please reopen or create a new issue if the problem persists.

Oussama-Goumghar commented 2 years ago

The solution rahendatri described above (updating config.yaml to allocate 2 CPUs instead of 1 and 2048 MB of memory instead of 1500 MB) solved the problem for me.