scaleway / docker-machine-driver-scaleway

:whale: Scaleway driver for Docker Machine
MIT License

docker-machine hangs on server creation #81

Closed optimum-web closed 6 years ago

optimum-web commented 6 years ago

I was able to create an instance at Scaleway with docker-machine, and I see it in the "Running" state in the dashboard. However, "docker-machine ls" shows:

NAME                   ACTIVE   DRIVER           STATE     URL   SWARM   DOCKER    ERRORS
sample-scaleway4.com            scaleway(VC1S)   Timeout

However, docker-machine status shows the "Running" state. Any hints?

remyleone commented 6 years ago

Could you provide me with information about your Scaleway instance through a ticket? https://cloud.scaleway.com/#/tickets It looks like a problem with your instance rather than with docker-machine-driver-scaleway.

optimum-web commented 6 years ago

https://cloud.scaleway.com/#/tickets/ZAIL-0813-NFBY

remyleone commented 6 years ago

Normally the issue should be fixed: https://status.online.net/index.php?do=details&task_id=1229 Could you try again?

optimum-web commented 6 years ago

Sorry, it still hangs.

optimum-web commented 6 years ago

The web console in the dashboard shows an empty screen, but the machine was created.

optimum-web commented 6 years ago

Running pre-create checks...
Creating machine...
(scalewaytest10.com) Creating SSH key...
(scalewaytest10.com) Creating server...
(scalewaytest10.com) Starting server...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
.....

It has been hanging for more than 20 minutes already.

remyleone commented 6 years ago

Have you tried to connect manually with the SSH command presented in the web console? Is your machine running normally? I'm trying to bisect whether the problem comes from the Scaleway servers not provisioning your machine or your docker-machine not connecting successfully to the instance created.
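The manual SSH check suggested here can also be scripted so that it gives up after a bounded number of tries instead of hanging. The following is a minimal sketch, not part of docker-machine or the driver: `retry` is a hypothetical helper name, the attempt count and delay are arbitrary, and the address in the usage comment is the one quoted elsewhere in this thread. `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
# retry ATTEMPTS DELAY CMD [ARGS...]
# Hypothetical helper: runs CMD until it succeeds or ATTEMPTS tries are used up.
# Returns 0 on the first success, 1 if every attempt failed.
retry() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && exit 0
    i=$((i + 1))
    # Sleep between attempts, but not after the last one.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  exit 1
)

# Example use (not executed here): 12 tries, 10 seconds apart, 5 second connect timeout.
# retry 12 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@51.15.250.55 true
```

A non-zero result only shows that SSH never became reachable from the client side; it does not say whether the instance failed to boot or whether port 22 is blocked.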

optimum-web commented 6 years ago

Yes, I tried but got a timeout. In the web dashboard I see it in the "Running" state. The web console shows nothing.

https://imgur.com/a/a62KBZr

https://imgur.com/VaG59lu

optimum-web commented 6 years ago

docker-machine ls
scalewaytest9.com    -   scaleway(VC1S)   Error     Unknown   "a1540e49-bb94-4ad4-81d5-62c2270d89f2" not found
scalewaytest10.com       scaleway(VC1S)   Timeout
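When several test machines pile up, as in the listing above, it can help to print only those that are not in the Running state. The sketch below assumes the `--format` and `--timeout` options of `docker-machine ls` and its `{{.Name}}` and `{{.State}}` placeholders; `not_running` is a hypothetical helper name that only filters text read from standard input.

```shell
# not_running: reads "NAME STATE" pairs (one per line) on stdin and prints
# "NAME: STATE" for every machine whose state is not Running.
not_running() {
  awk '$2 != "Running" { print $1 ": " $2 }'
}

# Example use (not executed here); -t raises the per-machine status timeout in seconds.
# docker-machine ls -t 20 --format '{{.Name}} {{.State}}' | not_running
```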

remyleone commented 6 years ago

Could you try to connect with the following command:

ssh root@51.15.250.55

If this does not work, I would suggest opening a ticket.

On Sat, Apr 21, 2018 at 15:52, Optimum notifications@github.com wrote:

docker-machine ls scalewaytest9.com - scaleway(VC1S) Error Unknown "a1540e49-bb94-4ad4-81d5-62c2270d89f2" not found scalewaytest10.com scaleway(VC1S) Timeout


optimum-web commented 6 years ago

Does not work.

ssh root@51.15.250.55
ssh: connect to host 51.15.250.55 port 22: Connection refused

docker-machine ssh scalewaytest9.com
"a1540e49-bb94-4ad4-81d5-62c2270d89f2" not found

root@fb221082681f:/app# docker-machine ssh scalewaytest10.com
exit status 255

remyleone commented 6 years ago

I would suggest opening a ticket. Feel free to contact me offline rleone@online.net

On Sat, Apr 21, 2018 at 15:59, Optimum notifications@github.com wrote:

Does not work. ssh root@51.15.250.55 ssh: connect to host 51.15.250.55 port 22: Connection refused

docker-machine ssh scalewaytest9.com "a1540e49-bb94-4ad4-81d5-62c2270d89f2" not found

root@fb221082681f:/app# docker-machine ssh scalewaytest10.com exit status 255


optimum-web commented 6 years ago

Can you reopen the current issue? I don't have enough privileges to do it.

antapos commented 6 years ago

I have the same problem. I opened a support ticket at Scaleway and got the answer shown below, which I cannot relate to any extra options needed in docker-machine create -d scaleway:

Do you have the same issue with the rescue mode, because your server is currently working properly?
https://www.scaleway.com/docs/perform-rescue-action-on-my-server/

remyleone commented 6 years ago

I cannot reproduce the problem. If you delete all your docker-machine instances and start from scratch, are you able to reproduce the problem?

I created a machine using the following command:

$ docker-machine create -d scaleway pouet
Running pre-create checks...
Creating machine...
(pouet) Creating SSH key...
(pouet) Creating server...
(pouet) Starting server...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env pouet

In my environment I have SCALEWAY_TOKEN, SCW_ACCESS_KEY and SCALEWAY_ORGANIZATION defined.
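Since the working setup relies on these environment variables, a quick pre-flight check before running `docker-machine create` can rule out an empty credential. This is a small sketch only: `missing_vars` is a hypothetical helper, and the variable names are simply the ones listed in this comment; the thread does not confirm which of them the driver actually reads.

```shell
# missing_vars NAME...
# Hypothetical helper: prints the name of every listed variable that is unset or empty.
missing_vars() (
  for name in "$@"; do
    # Indirect lookup of the variable called $name, tolerant of unset variables.
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
)

# Example use (not executed here):
# missing_vars SCALEWAY_TOKEN SCW_ACCESS_KEY SCALEWAY_ORGANIZATION
```

Empty output means every listed variable has a value; it says nothing about whether the values are valid.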

optimum-web commented 6 years ago

Could the image I used in the command be wrong? Are there possibly damaged images?
I used image="e20532c4-1fa0-4c97-992f-436b8d372c07", which is the public Ubuntu Xenial x86_64 image. Where can I get the list of valid images?

remyleone commented 6 years ago

You can get the list of all Scaleway images for your AZ with the following curl commands:

$ curl -H "X-Auth-Token: token-id" https://cp-par1.scaleway.com/images 
$ curl -H "X-Auth-Token: token-id" https://cp-ams1.scaleway.com/images 
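The two curl commands differ only in the region part of the host name, so they can be wrapped in a small helper. The sketch below assumes the API token is stored in a `SCALEWAY_TOKEN` environment variable (the commands above use the placeholder `token-id`); `images_url` and `list_images` are hypothetical names, and the endpoint pattern is taken directly from the two URLs above.

```shell
# images_url REGION
# Builds the image-list endpoint for a region, following the URL pattern above.
images_url() {
  printf 'https://cp-%s.scaleway.com/images\n' "$1"
}

# list_images REGION
# Fetches the raw image list for a region.
# Assumption: SCALEWAY_TOKEN holds the API token.
list_images() {
  curl -fsS -H "X-Auth-Token: $SCALEWAY_TOKEN" "$(images_url "$1")"
}

# Example use (not executed here):
# list_images par1
```

With `-f`, curl exits with a non-zero status on an HTTP error such as a rejected token, which makes a failed lookup easier to notice in a script.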
optimum-web commented 6 years ago

I suppose it should be the https://cp-ams1.scaleway.com/images resource, because /servers gives an empty list.

optimum-web commented 6 years ago

Anyway, try using --scaleway-image="e20532c4-1fa0-4c97-992f-436b8d372c07" and you will be able to reproduce it.

remyleone commented 6 years ago

Still cannot reproduce:

$ docker-machine create -d scaleway --scaleway-image "e20532c4-1fa0-4c97-992f-436b8d372c07" --scaleway-region "par1" default

Running pre-create checks...
Creating machine...
(default) Creating SSH key...
(default) Creating server...
(default) Starting server...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with ubuntu(systemd)...
Installing Docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env default

antapos commented 6 years ago

I installed docker-machine-driver-scaleway using

# install latest (git) version of docker-machine-driver-scaleway
$ brew tap scaleway/scaleway
$ brew install scaleway/scaleway/docker-machine-driver-scaleway --HEAD

I created the machine using the following command:

    docker-machine create -d scaleway \
      --scaleway-organization=$SCALEWAY_ACCESS_KEY \
      --scaleway-token=$SCALEWAY_TOKEN \
      --scaleway-name="$MACHINE_NAME" \
      --scaleway-region=$SCALEWAY_REGION \
      --scaleway-commercial-type=$SCALEWAY_COMMERCIAL_TYPE \
      --scaleway-image=$SCALEWAY_IMAGE \
      $MACHINE_NAME

where SCALEWAY_REGION was tested with am1, par1, SCALEWAY_COMMERCIAL_TYPE as VC1S and SCALEWAY_IMAGE was tested with docker and ubuntu-xenial
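Because the command takes the region from a variable, a typo in that variable is easy to miss. The sketch below rejects any value other than the two regions whose endpoints appear in this thread (par1 and ams1); `valid_region` is a hypothetical helper, and the list is an assumption based only on those endpoints, not an authoritative list of Scaleway regions.

```shell
# valid_region REGION
# Returns 0 only for the region names that appear in this thread's API endpoints.
valid_region() {
  case "$1" in
    par1|ams1) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use (not executed here):
# valid_region "$SCALEWAY_REGION" || echo "unexpected region: $SCALEWAY_REGION" >&2
```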

antapos commented 6 years ago

The problem was due to using docker as default boot image and was resolved after changing it to mainline from the Scaleway console

antapos commented 6 years ago

Today I tried it again on another MacBook, directly from a bash terminal, and it worked without a problem.

Yesterday I tried it on another MacBook from a shell script and got the following output on the Scaleway console:

>>> Fetching kernel modules...
chroot: can't execute '/usr/local/sbin/scw-sync-kernel-modules': No such file or directory
chroot: can't execute '/usr/local/sbin/oc-sync-kernel-modules': No such file or directory
>>> Generating machine-id...
chroot: can't execute '/usr/local/sbin/scw-gen-machine-id': No such file or directory
chroot: can't execute '/usr/local/sbin/oc-gen-machine-id': No such file or directory
>>> /sbin/init may be broken
>>> make sure /sbin/init is an executable and not an absolute symlink
>>> Switching to linux...
End of Scaleway' initrd
           _ _       _      _          _ _
___ _ _ _|_| |_ ___| |_   | |_ ___   | |_|___ _ _ _ _
|_ -| | | | |  _|  _|   |  |  _| . |  | | |   | | |_'_|
|___|_____|_|_| |___|_|_|  |_| |___|  |_|_|_|_|___|_,_|

BusyBox v1.22.1 (Ubuntu 1:1.22.0-15ubuntu1) multi-call binary.

Usage: switch_root [-c /dev/console] NEW_ROOT NEW_INIT [ARGS]

Free initramfs and switch to another root fs:
chroot to NEW_ROOT, delete all in /, move NEW_ROOT to /,
execute NEW_INIT. PID must be 1. NEW_ROOT must be a mountpoint.

        -c DEV  Reopen stdio to DEV after switch

[   14.762473] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[   14.762473] 
[   14.764225] Kernel Offset: disabled
[   14.764828] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[   14.764828]

remyleone commented 6 years ago

@antapos @optimum-web Do you still currently experience problems? Can this issue be closed?

optimum-web commented 6 years ago

I just tried to create one once again. I used the same command as @antapos posted above and got the same result. Options:
['docker-machine', 'create', '--driver=scaleway', '--scaleway-name', 'sample-scaleway11.com', '--scaleway-organization', '***', '--scaleway-token', '***', '--scaleway-image', 'e20532c4-1fa0-4c97-992f-436b8d372c07', '--scaleway-commercial-type', 'C2S', 'sample-scaleway11.com']

NAME                    ACTIVE   DRIVER          STATE     URL   SWARM   DOCKER    ERRORS
sample-scaleway11.com            scaleway(C2S)   Timeout

Secondly, I received this: Error in driver during machine creation: No such image: ubuntu-xenial

antapos commented 6 years ago

@sieben: I will try it again tonight on my other laptop which had the problem and let you know.

optimum-web commented 6 years ago

One more thing I found out: I don't have any SSH key set in my account. I'm not sure whether this could be a reason. As I understand it, docker-machine should generate them.

optimum-web commented 6 years ago

I tried additionally "docker" image and received 1: Error creating machine: Error in driver during machine creation: Too many candidates for docker (2)

remyleone commented 6 years ago

Indeed, docker-machine generates SSH keys.

I used this command:

$ docker-machine create --driver scaleway --scaleway-name sample-scaleway11.com --scaleway-image e20532c4-1fa0-4c97-992f-436b8d372c07 --scaleway-commercial-type C2S sample-scaleway11.com

The server is created successfully, but at the moment it doesn't seem to answer pings. Could you try again with a VC1S or another virtualized instance?

optimum-web commented 6 years ago

Guys, I think there is a problem with the image list provided by the Scaleway API. I cannot agree to close this issue until I can create a server and connect to it.

optimum-web commented 6 years ago

Finally it worked! I changed the commercial type to VC1S.

optimum-web commented 6 years ago

The worst thing is that I still don't understand the reason for all the previous failures.

remyleone commented 6 years ago

docker-machine only sends simple HTTPS requests to our APIs. I think the bug most likely comes from different code paths between bare-metal instances and virtual instances. I will keep investigating and try to reproduce your bug. Is it fine to close the issue now?

On Mon, Apr 23, 2018 at 20:03, Optimum notifications@github.com wrote:

The worst thing is that I still don't understand the reason of all previous fails.


optimum-web commented 6 years ago

Yes, sure

antapos commented 6 years ago

The problem persists. The strange thing is that executing the exact same docker-machine create command sometimes fails and sometimes works!

remyleone commented 6 years ago

@antapos What are the typical stack traces that you get? Do you see a wide variety of errors?

antapos commented 6 years ago

@sieben No, it is always the same error which appears during provisioning of the machine and is shown at: https://github.com/scaleway/docker-machine-driver-scaleway/issues/81#issuecomment-383614365

The problem, whenever it appears, is solved by changing the bootscript from docker to mainline, as shown in the following image, and rebooting the cloud server:

[Screenshot: 2018-04-24 11 14 34]

antapos commented 6 years ago

@sieben Another way to fix the problem is to turn ON the 'ENABLE LOCAL BOOT':

[Screenshot: 2018-04-24 11 24 25]

optimum-web commented 6 years ago

@antapos Is it possible to set "ENABLE LOCAL BOOT" to "ON" through the API?

optimum-web commented 6 years ago

I am still getting weird errors like "no such boot script Docker".