miguel2488 opened this issue 5 years ago
Having the same problems here. @miguel2488 have you had any luck with a fix?
Nope, nothing new here. I wasn't able to fix it since I don't have a clue where the problem is coming from. Instead of using Datalab, I resigned myself to running Jupyter notebooks on the machine. I'm totally blind on this, and from what I've seen so far, no one seems to care about this thread. I wish you better luck.
I had the same problem, and observed that the container running Jupyter on the VM took ~5 minutes to start up. My workaround was:
1. datalab create ... --ssh-log-level=debug
2. Wait for the "Connection refused" messages to begin
3. gcloud compute ssh ..., then run docker ps every 1-2 minutes until the logger and datalab containers appear
4. datalab connect ...
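That docker ps polling can be scripted. A minimal sketch; the container name, 300-second timeout, and 10-second interval are my own choices, not values from this thread:

```shell
#!/bin/sh
# Run inside the VM (after `gcloud compute ssh ...`): poll `docker ps`
# until a container with the given name shows up, or give up after a timeout.
wait_for_container() {
    name="$1"
    timeout="${2:-300}"   # overall seconds to wait (assumed default)
    interval=10           # seconds between polls
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if docker ps --format '{{.Names}}' | grep -qx "$name"; then
            echo "container '$name' is up"
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    echo "container '$name' did not appear within ${timeout}s" >&2
    return 1
}

# Example: wait_for_container datalab 300 && exit   # then run `datalab connect ...` locally
```

Once both the logger and datalab containers are up, running datalab connect ... from the local machine worked as described.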
Then I was able to use datalab in the normal way.

Hello hacktuarial,
I have the same issue and tried your solution, but the datalab container never appears for me.
Did you simply run gcloud compute ssh (name of instance)?
Thanks for your help !
Yes, that's what I ran. Can you post a sample of your ssh logs? It sounds like the problem may be with the datalab create command.
I had been using a datalab connect ... command until now, and just tried datalab create ... instead.
It actually works exactly as you said: the logger and datalab containers appeared!
It may have something to do with the way I created my instance at the beginning; I used datalab beta create-gpu datalab-instance-name at the time.
Anyway, I am now able to use Datalab ! Thanks :)
It seems that when creating an instance with a GPU, the same problem appears, but this solution does not apply.
It has now been an hour since I created it, and docker ps only shows the logger container, no datalab container.
I have a similar problem to @MichaelTheBrute's.
I tried to launch an instance of Datalab with the command below.
$ datalab beta create-gpu --machine-type n1-standard-4 --zone us-west1-b --accelerator-type nvidia-tesla-k80 --accelerator-count 1 datalab-instance
By accepting below, you will download and install the
following third-party software onto your managed GCE instances:
NVidia GPU Driver: NVIDIA-Linux-x86_64-390.46
Do you accept (y/N)?: y
Creating the disk datalab-instance-pd
Creating the instance datalab-instance
Due to GPU Driver installation, please note that Datalab GPU instances take significantly longer to startup compared to non-GPU instances.
Created [https://www.googleapis.com/compute/beta/projects/xxxxxxxx/zones/us-west1-b/instances/datalab-instance].
Connecting to datalab-instance.
This will create an SSH tunnel and may prompt you to create an rsa key pair. To manage these keys, see https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys
Waiting for Datalab to be reachable at http://localhost:8081/
However, there was no response after more than 30 minutes. I had seen information that it takes about 15 minutes, but this seemed far too long.
I made an ssh connection to the instance and started investigating.
As discussed before, I also ran the docker ps command.
datalab@datalab-instance ~ $ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4994361cf048 gcr.io/google-containers/fluentd-gcp:2.0.17 "/bin/sh -c '/run.sh…" 19 minutes ago Up 19 minutes 80/tcp logger
The datalab container was not running.
However, when I ran the same command a few minutes later, I saw the gcr.io/cloud-datalab/datalab-gpu:latest image only once. (I forgot to take notes.)
Since then, I have never been able to see the container.
Since the CPU version worked correctly, I suspected the GPU might not have been set up correctly. The GPU setup seems to be done in the startup script, so I checked whether the script finished successfully.
datalab@datalab-instance ~ $ systemctl status google-startup-scripts.service
● google-startup-scripts.service - Google Compute Engine Startup Scripts
Loaded: loaded (/usr/lib/systemd/system/google-startup-scripts.service; disabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2020-02-11 07:27:30 UTC; 34min ago
Main PID: 421 (code=exited, status=0/SUCCESS)
CPU: 881ms
I checked the log with the journalctl command, but it seemed to have finished successfully.
In the process, I noticed that wait-for-startup-script.service did not finish properly.
datalab@datalab-instance ~ $ systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● wait-for-startup-script.service loaded failed failed Wait for the startup script to setup required directories
datalab@datalab-instance ~ $ sudo journalctl -u wait-for-startup-script.service
-- Logs begin at Tue 2020-02-11 06:59:19 UTC, end at Tue 2020-02-11 08:05:27 UTC. --
Feb 11 06:59:34 datalab-instance systemd[1]: Starting Wait for the startup script to setup required directories...
Feb 11 06:59:34 datalab-instance docker-credential-gcr[768]: ERROR: Unable to save docker config: mkdir /root/.docker: read-only file system
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Control process exited, code=exited status=1
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Failed with result 'exit-code'.
Feb 11 06:59:34 datalab-instance systemd[1]: Failed to start Wait for the startup script to setup required directories.
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Consumed 82ms CPU time
Feb 11 06:59:34 datalab-instance systemd[1]: Starting Wait for the startup script to setup required directories...
Feb 11 06:59:34 datalab-instance docker-credential-gcr[792]: ERROR: Unable to save docker config: mkdir /root/.docker: read-only file system
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Control process exited, code=exited status=1
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Failed with result 'exit-code'.
Feb 11 06:59:34 datalab-instance systemd[1]: Failed to start Wait for the startup script to setup required directories.
Feb 11 06:59:34 datalab-instance systemd[1]: wait-for-startup-script.service: Consumed 94ms CPU time
You can see that an error occurred in docker-credential-gcr: it could not create /root/.docker because the filesystem is read-only. I don't understand what this means for the startup script, but I hope it helps.
I will continue to investigate.
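For what it's worth, the "mkdir /root/.docker: read-only file system" error suggests docker-credential-gcr is trying to write its Docker config to a location the OS mounts read-only. A hypothetical workaround sketch, assuming docker-credential-gcr honors the standard DOCKER_CONFIG variable; the path below is my own choice, not something from this thread:

```shell
#!/bin/sh
# Hypothetical workaround: point the Docker client config at a writable
# directory so docker-credential-gcr can create its config file there
# instead of under the read-only /root/.docker.
export DOCKER_CONFIG=/tmp/docker-config   # assumed writable path
mkdir -p "$DOCKER_CONFIG"

# Re-run the credential helper setup only if the binary is present.
if command -v docker-credential-gcr >/dev/null 2>&1; then
    docker-credential-gcr configure-docker
fi
```

Whether the startup script would then proceed normally is untested; this only sidesteps the failing mkdir.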
May be related to this Pull Request. https://github.com/googledatalab/datalab/pull/2147
Hi,
I've been working on this for days and have read a lot on Google about this issue, but I couldn't find anything to help me solve it.
The case is that I have created a datalab instance via the gcloud shell like this:
datalab create --image-name c2-deeplearning-tf-1-13-cu100-20190227 --disk-size-gb 100 --machine-type n1-standard-8 my-instance --network-name my-net-01 --zone europe-west1-b
It all works fine: I'm asked to create a passphrase, the RSA keys are propagated, and then I get this message of death:
Waiting for Datalab to be reachable at http://localhost:8081/
I can SSH to the VM instance using the button to the right, or using gcloud compute ssh instance. No problems with that.
Running the datalab connect command and passing --ssh-log-level=debug, I got thousands of debug messages: it walks through all the ports trying to connect to port 8081 but never succeeds. Finally, after a long wait, I get this message:
connection closed
attempting to reconnect
and the whole process starts again from the beginning.
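One way to narrow this down is to check, from inside the VM, whether anything is actually listening on the port the tunnel targets. A small sketch using bash's /dev/tcp; the 8080 port is an assumption on my part (the Datalab container conventionally serves there, with datalab connect tunneling local 8081 to it):

```shell
#!/bin/bash
# port_open HOST PORT -> exit 0 if a TCP connection succeeds (bash /dev/tcp).
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Run inside the VM: if nothing listens on 8080, the SSH tunnel has
# nothing to forward to, which would match the endless "Connection refused".
if port_open 127.0.0.1 8080; then
    echo "something is listening on 8080"
else
    echo "nothing listening on 8080; check 'sudo docker ps' for the datalab container"
fi
```

If the port is closed, the firewall rules are likely not the problem and the Datalab container itself never came up.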
This is a screenshot of my firewall rules:
I think everything is OK here. What am I missing? Where's the problem? Can someone help, please? I've been stuck here for over a week now; any help will be much appreciated.
Thank you very much in advance.