googledatalab / datalab

Interactive tools and developer experiences for Big Data on Google Cloud Platform.
Apache License 2.0

No automatic auth to gcp services when using custom datalab image #1922

Open charlesverdad opened 6 years ago

charlesverdad commented 6 years ago

I built my own Datalab image from a Dockerfile of the form below and pushed it to my project's Container Registry as asia.gcr.io/<proj>/my-datalab:latest. Note that I did not modify the base image's CMD or ENTRYPOINT, if it defines them.

FROM gcr.io/cloud-datalab/datalab:latest
# COPY lots of my project's code (python)
# pip install libraries
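
Filled out, such a Dockerfile might look like the sketch below; the paths and requirements file are placeholders, not from the original report:

```dockerfile
# Hypothetical sketch; paths and package lists are placeholders.
FROM gcr.io/cloud-datalab/datalab:latest

# Copy the project's Python code into the image.
COPY ./src /content/my-project

# Install extra libraries. Pinning a compatible oauth2client (>= 4.0)
# may help avoid the _JWTAccessCredentials error described under Problem 2.
RUN pip install -r /content/my-project/requirements.txt "oauth2client>=4.0.0"
```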

I then ran the following command to create a datalab instance:

datalab create \
    --image-name asia.gcr.io/<proj>/my-datalab:latest \
    --disk-size-gb 60 \
    --machine-type n1-standard-4 \
    --zone asia-east1-a \
    --network-name my-net \
    my-datalab-charles

Problem 1 (SOLVED)

localhost:8081 wasn't starting up, so I SSH'd into my instance and found through journalctl that Docker couldn't pull the image because it had no repository access. I then checked /etc/systemd/system/docker.service, learned about the docker-credential-gcr utility, and ran:

docker-credential-gcr gcr-login

I followed the link it printed, generated an authorization code, entered it back, and ran the docker pull manually - it worked. But shouldn't this happen automatically? I recall that it does when I don't pass --image-name to datalab create.

Problem 2: Inside the Datalab notebook, I can't run %sql or %bq; it just says the line magic function is not found. I tried to run import google.datalab and got this:

AttributeErrorTraceback (most recent call last)
<ipython-input-10-db74b58b7e51> in <module>()
----> 1 import google.datalab

/usr/local/lib/python2.7/dist-packages/google/datalab/__init__.py in <module>()
     11 # the License.
     12 
---> 13 from google.datalab._context import Context
     14 from google.datalab._job import Job, JobError
     15 

/usr/local/lib/python2.7/dist-packages/google/datalab/_context.py in <module>()
     18 from builtins import object
     19 
---> 20 from google.datalab.utils import _utils as du
     21 
     22 

/usr/local/lib/python2.7/dist-packages/google/datalab/utils/__init__.py in <module>()
     20 from ._lambda_job import LambdaJob
     21 from ._dataflow_job import DataflowJob
---> 22 from ._utils import print_exception_with_last_stack, get_item, compare_datetimes, \
     23     pick_unused_port, is_http_running_on, gcs_copy_file, python_portable_string
     24 

/usr/local/lib/python2.7/dist-packages/google/datalab/utils/_utils.py in <module>()
     34 import google.auth.exceptions
     35 import google.auth.credentials
---> 36 import google.auth._oauth2client
     37 
     38 

/usr/local/lib/python2.7/dist-packages/google/auth/_oauth2client.py in <module>()
    126         _convert_service_account_credentials,
    127     oauth2client.service_account._JWTAccessCredentials:
--> 128         _convert_service_account_credentials,
    129     oauth2client.contrib.gce.AppAssertionCredentials:
    130         _convert_gce_app_assertion_credentials,

AttributeError: 'module' object has no attribute '_JWTAccessCredentials'
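
The traceback bottoms out in google.auth's oauth2client shim, which suggests the oauth2client installed in the custom image is too old (or was downgraded by one of the pip installs) to expose _JWTAccessCredentials. A small diagnostic sketch, assuming nothing beyond the standard library plus an optional oauth2client install:

```python
# Diagnostic sketch (not from the original report): check whether the
# installed oauth2client is new enough for google.auth's shim, which
# expects oauth2client.service_account._JWTAccessCredentials (4.x API).

def oauth2client_is_compatible():
    """Return True if oauth2client exposes _JWTAccessCredentials."""
    try:
        import oauth2client.service_account as sa
    except ImportError:
        # Not installed at all; google.auth itself would still work.
        return False
    return hasattr(sa, "_JWTAccessCredentials")

if __name__ == "__main__":
    if oauth2client_is_compatible():
        print("oauth2client looks compatible with google.auth")
    else:
        print("oauth2client missing or too old; try: pip install -U oauth2client")
```

Running this inside the notebook (or via docker exec into the container) should tell you whether Problem 2 is a version-skew issue rather than an auth issue.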

Problem 3: The backup scripts appear to be failing. In journalctl, I see these warnings:

Jan 19 07:17:35 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag hourly failed with code: 1","time":"2018-01-19T07:17:35.838Z","v":0}
Jan 19 07:17:36 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag daily failed with code: 1","time":"2018-01-19T07:17:36.234Z","v":0}
Jan 19 07:17:36 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag weekly failed with code: 1","time":"2018-01-19T07:17:36.309Z","v":0}
Jan 19 07:27:35 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag daily failed with code: 1","time":"2018-01-19T07:27:35.936Z","v":0}
Jan 19 07:27:36 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag weekly failed with code: 1","time":"2018-01-19T07:27:36.036Z","v":0}
Jan 19 07:27:36 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag hourly failed with code: 1","time":"2018-01-19T07:27:36.405Z","v":0}
Jan 19 07:37:35 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag weekly failed with code: 1","time":"2018-01-19T07:37:35.495Z","v":0}
Jan 19 07:37:36 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag daily failed with code: 1","time":"2018-01-19T07:37:36.507Z","v":0}
Jan 19 07:37:36 my-datalab-charles docker[1753]: {"name":"app","hostname":"my-datalab-charles","pid":106,"level":50,"msg":"WARNING: Backup script with tag hourly failed with code: 1","time":"2018-01-19T07:37:36.605Z","v":0}
charlesverdad commented 6 years ago

This might have something to do with Problem 1:

Jan 19 07:00:27 my-datalab-charles docker[1278]: Unable to find image 'asia.gcr.io/<proj>/my-datalab:latest' locally
Jan 19 07:00:27 my-datalab-charles docker[1278]: /usr/bin/docker: Error response from daemon: repository asia.gcr.io/<proj>/my-datalab not found: does not exist or no pull access.
Jan 19 07:00:27 my-datalab-charles docker[1278]: See '/usr/bin/docker run --help'.
Jan 19 07:00:27 my-datalab-charles systemd[1]: datalab.service: Main process exited, code=exited, status=125/n/a
Jan 19 07:00:27 my-datalab-charles systemd[1]: datalab.service: Unit entered failed state.
Jan 19 07:00:27 my-datalab-charles systemd[1]: datalab.service: Failed with result 'exit-code'.
Jan 19 07:00:29 my-datalab-charles systemd[1]: datalab.service: Service hold-off time over, scheduling restart.
Jan 19 07:00:29 my-datalab-charles systemd[1]: Stopped datalab docker container.
Jan 19 07:00:29 my-datalab-charles systemd[1]: datalab.service: Start request repeated too quickly.
Jan 19 07:00:29 my-datalab-charles systemd[1]: Failed to start datalab docker container.
Jan 19 07:00:29 my-datalab-charles systemd[1]: datalab.service: Unit entered failed state.
Jan 19 07:00:29 my-datalab-charles systemd[1]: datalab.service: Failed with result 'exit-code'.
Jan 19 07:07:21 my-datalab-charles systemd[1]: [/etc/systemd/system/datalab.service:8] Executable path is not absolute, ignoring: docker-credential-gcr configure-docker

UPDATE: Problem 1 is solved by making the ExecStartPre command an absolute path. I will file an MR and update this issue if that also solves the other two problems. UPDATE 2: It looks like problem 1, at least, is a duplicate of https://github.com/googledatalab/datalab/pull/1911, but problems 2 and 3 still persist.
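
For reference, the fix amounts to giving ExecStartPre the helper's full path in /etc/systemd/system/datalab.service (systemd ignores non-absolute executable paths, per the log line above). A sketch, assuming docker-credential-gcr lives at /usr/bin on the VM image:

```ini
# /etc/systemd/system/datalab.service (excerpt; the /usr/bin path is an assumption)
[Service]
ExecStartPre=/usr/bin/docker-credential-gcr configure-docker
```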

chmeyers commented 6 years ago

For #3: Can you check /datalab/.backup_log.txt? That's where the backup script sends its logs: https://github.com/googledatalab/datalab/blob/dcedb1ef801ef0be4571c2afa7c7ffcae7eb28c4/sources/web/datalab/backupUtility.ts#L52

harmon commented 6 years ago

The issue of pulling a private Docker image has been fixed by me and merged by the maintainers (thanks, guys!). We just need a new release of the datalab CLI to be built and released:

https://github.com/googledatalab/datalab/pull/1911

chmeyers commented 6 years ago

#1911 went out with yesterday's release of gcloud 186.0.0, so just run "gcloud components update" and you should get it.

harmon commented 6 years ago

@chmeyers Oh, thanks! I hadn't checked today :)

harmon commented 6 years ago

@chmeyers I didn't see a new tag in this repo, which I thought was used to publish new versions to the gcloud CLI, but maybe you have a different internal release process that isn't documented. Thanks for the update!

chmeyers commented 6 years ago

CLIs get a tracking issue labeled with "cli-release", in this case #1921. The actual release process is internal to Google as it has to get bundled with the rest of gcloud and go through a suite of tests.

harmon commented 6 years ago

Awesome, thanks!