Open wcwong opened 2 years ago
This is on our radar, apologies for the friction. We don't support auth.authenticate_user() today for a few reasons, we're tracking a fix at b/207007587
how do i connect to colab with private GCE server.
@cperry-goog Any updates on this? I launched an A100 instance with Google Colab VM specifically to use my Colab Notebook I was using on the Colab Pro + account I was paying for, but on beefier hardware, but can't connect to Drive so it's useless.
Is there a workaround? It would be really handy if results from a Colab Notebook could be saved to Drive.
Keep in mind that a custom GCE VM will be accessible to all users who have access to VMs within that project. Because of this you need to be careful about putting credentials on the VM- they will be accessible to everyone with access to that VM.
Because the Colab service cannot guarantee the VM is only accessible to a single user we are not allowed to provide credentials to it.
An alternative is to use something such as https://github.com/astrada/google-drive-ocamlfuse
@blois - I'm a little surprised that anyone who has access to the project has access to notebook data by default. My assumption was that the environment ran in its own container with each different user connecting being given their own containers and their own container local storage. Isn't that how it works in the hosted environment?
I guess that's mostly the crux of my confusion. If there are sufficient environmental protections in place for the hosted environment, why isn't a project security boundary considered equivalent? How is this different than any other project level security boundaries in GCP?
Specifically, doesn't the workaround you describe also put the credentials on the VM? And with FUSE can't anyone in the project, by default, ssh to the VM, then sudo su to the user and have access to the FUSE drive? So this doesn't materially change the security posture?
Because the Colab service cannot guarantee the VM is only accessible to a single user we are not allowed to provide credentials to it.
It's not necessarily a Gmail login. It can be a service account (the one the VM has access to). Why not support it for better UX?
This is on our radar, apologies for the friction. We don't support auth.authenticate_user() today for a few reasons, we're tracking a fix at b/207007587
You can add in your policy to comply users with agreement and allow google.colab.auth for those who use custom GCE VM runtime. Also it would be nice if it's available in google cloud function na kub.
Has this been resolved? Colab Pro+ only ever gives me P100s, so I upgraded to an A100 with GCE VMs, but now I can't access any of my Google Drive files.
Was using the ocamlfuse solution to access my Drive, but that has just stopped working too. Have to look for an alternative solution, again. I hope this issue gets addressed; using Drive for data storage was quite convenient for smaller personal and research projects.
Any updates on this? Trying to connect a custom GCE VM, but it is an unsupported environment
https://github.com/googlecolab/colabtools/issues/2533#issuecomment-1018080844 is still the current status.
If we're connecting to the custom GCE VM through a locally-hosted runtime (via port-forwarding), there's no way to install ocamlfuse, since terminal functionality is disabled.
What's the point of using Colab if we can't use beefier hardware? Any recommendations for alternative services?
It's September. This still hasn't been resolved? Very disappointed... we just upgraded for the same reasons and got caught by this bug.
Hey everyone, I'm just as confused and annoyed at the lack of Google Drive integration with GCE. I hope we find a fix soon.
I think I know why ocamlfuse is failing now. I haven't found a fix yet, but as soon as I figure out alternative credentials for ocamlfuse, I'll post my results here.
Out-of-band (OOB) error: https://developers.google.com/identity/protocols/oauth2/resources/oob-migration
From "Using OAuth 2.0 to Access Google APIs": "On February 16 2022, we announced plans to make Google OAuth interactions safer by using more secure OAuth flows. This guide helps you to understand the necessary changes and steps to successfully migrate from the OAuth out-of-band (OOB) flow to supported alternatives. This effort is a protective measure against phishing and app impersonation attacks during interactions with Google's OAuth 2.0 authorization endpoints."
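For anyone checking whether their own ocamlfuse setup is affected: the authorization URL printed by the headless flow carries its redirect target in the `redirect_uri` query parameter, and the deprecated OOB flow uses the value `urn:ietf:wg:oauth:2.0:oob`. A minimal sketch of that check in Python follows. The function name `uses_oob_redirect` is made up for illustration, and the client ID in the example URL is a placeholder.

```python
from urllib.parse import urlparse, parse_qs

# Redirect value used by the deprecated out-of-band (OOB) OAuth flow.
OOB_REDIRECT_PREFIX = "urn:ietf:wg:oauth:2.0:oob"


def uses_oob_redirect(auth_url: str) -> bool:
    """Return True if the authorization URL asks for the OOB redirect."""
    # parse_qs percent-decodes the values, so the comparison is on plain text.
    query = parse_qs(urlparse(auth_url).query)
    redirect_uris = query.get("redirect_uri", [])
    return any(uri.startswith(OOB_REDIRECT_PREFIX) for uri in redirect_uris)


# Placeholder URL in the same shape as the one ocamlfuse prints.
example_url = (
    "https://accounts.google.com/o/oauth2/auth"
    "?client_id=PLACEHOLDER.apps.googleusercontent.com"
    "&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob"
    "&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive"
    "&response_type=code"
)
print(uses_oob_redirect(example_url))
```

If this returns True for the URL your ocamlfuse prints, the request is going through the flow that the migration guide above describes as being phased out.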
Keep in mind that a custom GCE VM will be accessible to all users who have access to VMs within that project. Because of this you need to be careful about putting credentials on the VM- they will be accessible to everyone with access to that VM.
Because the Colab service cannot guarantee the VM is only accessible to a single user we are not allowed to provide credentials to it.
I mentioned it already in the thread and will do it again. If the only concern is that Google Drive creds/tokens will be accessible to everyone who has access to that VM then we can use a service account.
google.colab.auth() knows to auth and access the drive using the service account.
It would be nice if there were a bypass/opt-out. Not everyone cares about data privacy that much. Our lab, for example, has all of our data in a shared space (within our lab, of course). Essentially, anyone who has access to the VM and the Google account should also have access to the data/Drive.
I mean google-drive-ocamlfuse works. but I'd expect it to work out of box.
At least have a prompt when trying to mount etc
Any update on this issue?
Scrolled through this thread hoping for a solution and was met with disappointment..
Disappointment in 2023...
@cperry-goog @blois Any updates on the issue? What is the status of b/207007587?
After a year and two months of waiting, any update on this issue?
Any updates? Paid for custom GCE VM and immediately regretted.
Any updates?
I came here because I'm facing the same issue... unbelievable that there's no updates on this yet.
@cperry-goog are there any updates on that issue?
Same issue. Hoping for an update!
Hi all! I wanted to share the solution that has been working for me since it seems that this has been an ongoing issue for a lot of people.
I've been using google-drive-ocamlfuse to mount my gDrive on a custom GCE VM. The process is a bit involved and not the most elegant, but it works.
First you'll need to create a new project and OAuth credentials via the API Console. The key here is that we'll need to set it up for Headless Usage since Google Colab doesn't have a web browser.
Follow the Headless Usage steps in ocamlfuse's documentation to set this up. This should give you API access to your Drive, with a client ID and secret key.
Once you have your client ID and secret key setup, you can install ocamlfuse with the following command
!sudo add-apt-repository ppa:alessandro-strada/ppa
!sudo apt-get update
!sudo apt-get install google-drive-ocamlfuse
and then you should be able to now mount your drive with this
!google-drive-ocamlfuse -headless -label me -id ##yourClientID##.apps.googleusercontent.com -secret ###yoursecret#####
which should then show you something similar to this
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=##yourClientID##.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force
which will take you to a credential page and you can copy and paste your key
Please enter the verification code:
And that's basically it! You should be able to mount your drive with the code below
!mkdir -p /content/drive/MyDrive
!google-drive-ocamlfuse -label me /content/drive/MyDrive
The only thing is that this has made things cumbersome for when I just have a single notebook that I like to run on either a hosted runtime or GCE VM, so I've made the code below in order to determine whether or not it's on a GCE VM, install ocamlfuse if needed, and mount the drive the old fashioned way, or with ocamlfuse. I pretty much have this code block on all of my notebooks now. Hope this helps!!! Just make sure to replace your client ID and secret keys
# Mount Google Drive
import re
import shutil

version = !cat /proc/version
if re.search("gce", version[0]):
    print("Session is connected to a custom GCE VM, running ocamlfuse")
    # Check if the ocamlfuse binary is on PATH
    # (it is installed with apt, so `pip freeze` would never list it)
    if shutil.which("google-drive-ocamlfuse"):
        print("ocamlfuse is already installed, mounting...")
    else:
        # If not installed, install it (-y avoids interactive prompts)
        print("ocamlfuse is not installed, installing...")
        !sudo add-apt-repository -y ppa:alessandro-strada/ppa
        !sudo apt-get update
        !sudo apt-get install -y google-drive-ocamlfuse
    # Is anything already mounted? Let's jiggle the handle
    !umount /content/drive/MyDrive
    !rm -rf ~/.gdfuse/default
    !rm -rf /content/drive/MyDrive
    !mkdir -p /content/drive/MyDrive
    # Authorize in headless mode, then mount with ocamlfuse
    !google-drive-ocamlfuse -headless -id REPLACE_CLIENT_ID_HERE.apps.googleusercontent.com -secret REPLACE_SECRET_KEY_HERE
    !google-drive-ocamlfuse /content/drive/MyDrive
else:
    print("Session is connected to a hosted runtime, running Google Auth")
    from google.colab import drive
    drive.mount('/content/drive')
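The runtime check in the snippet above relies on IPython's `!` syntax, so it only works inside a notebook cell. Here is a plain-Python sketch of the same idea, assuming (as the snippet does) that the kernel version string on a custom GCE VM contains "gce". The function names are hypothetical and the sample strings are made up for illustration.

```python
import re
from pathlib import Path


def looks_like_gce(version_text: str) -> bool:
    """Heuristic from the snippet above: the kernel string mentions 'gce'."""
    return re.search("gce", version_text) is not None


def running_on_gce(proc_version: str = "/proc/version") -> bool:
    """Read the kernel version file and apply the heuristic.

    Returns False when the file cannot be read (for example on non-Linux hosts).
    """
    try:
        return looks_like_gce(Path(proc_version).read_text())
    except OSError:
        return False


# Made-up sample strings, not copied from a real VM.
print(looks_like_gce("Linux version 5.10.0-gce (builder@host) #1 SMP"))
print(looks_like_gce("Linux version 5.15.0-generic (builder@host) #1 SMP"))
```

Splitting the string check from the file read keeps the heuristic easy to try out on other machines; whether "gce" appears in the kernel string on every custom VM image is not something this thread confirms.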
Hi chriscast88,
Thanks for posting this. After following your guide, I ran this code: !mkdir -p /content/drive/MyDrive !google-drive-ocamlfuse /content/drive/MyDrive
But got this error: /usr/bin/xdg-open: 869: www-browser: not found /usr/bin/xdg-open: 869: links2: not found /usr/bin/xdg-open: 869: elinks: not found /usr/bin/xdg-open: 869: links: not found /usr/bin/xdg-open: 869: lynx: not found /usr/bin/xdg-open: 869: w3m: not found xdg-open: no method available for opening 'https://accounts.google.com/o/oauth2/auth?client_id=XXXXXXXXXXXX.apps.googleusercontent.com&redirect_uri=httpsXXXXXXFgd-ocaml-auth.appspot.com%2Foauth2callback&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force&state=XXXXXXXXXXXXXXXXXXXXX' /bin/sh: 1: firefox: not found /bin/sh: 1: google-chrome: not found /bin/sh: 1: chromium-browser: not found /bin/sh: 1: open: not found Cannot retrieve auth tokens. Failure("Error opening URL:https://accounts.google.com/o/oauth2/auth?client_id=XXXXXXXXXXXX.apps.googleusercontent.com&redirect_uri=httpsXXXXXXXXFgd-ocaml-auth.appspot.com%2Foauth2callback&scope=httpsXXXXXXXXXXXwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force&state=9XXXXXXXXXXXXXXXXXXX"
If anyone has any advice, I'd appreciate it.
Thanks!
!google-drive-ocamlfuse -headless -id REPLACE_CLIENT_ID_HERE.apps.googleusercontent.com -secret REPLACE_SECRET_KEY_HERE
!google-drive-ocamlfuse /content/drive/MyDrive
Try this?
For my use case, using a Google Storage Bucket as the backing datastore was an equivalent option to Google Drive. It's very straightforward to connect to a bucket with the following code (utilizing gcsfuse)
### MOUNT GOOGLE STORAGE BUCKET
from google.colab import auth
auth.authenticate_user()
!echo "deb https://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse
!mkdir -p mounted-bucket
!gcsfuse --implicit-dirs audio-model-data mounted-bucket
BASE_PATH = "/content/mounted-bucket"
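Since `auth.authenticate_user()` is the call reported as unsupported on a custom GCE VM, one possible variation is to skip it and let the `google-cloud-storage` client pick up the VM's service account through Application Default Credentials. This is an untested sketch, assuming the package is installed and the VM's service account has read access to the bucket. `list_blob_names` and `list_bucket_with_vm_credentials` are hypothetical helper names, and the bucket name is the one from the snippet above.

```python
def list_blob_names(client, bucket_name, prefix=None):
    """Return the names of objects in a bucket, optionally under a prefix."""
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]


def list_bucket_with_vm_credentials(bucket_name="audio-model-data"):
    """Print object names using whatever credentials the environment provides."""
    # Imported here so the helper above can be used without the package installed.
    from google.cloud import storage

    # With no explicit credentials, the client falls back to Application Default
    # Credentials; on a GCE VM these normally resolve to the VM's service account.
    client = storage.Client()
    for name in list_blob_names(client, bucket_name):
        print(name)
```

Calling `list_bucket_with_vm_credentials()` from a notebook cell would list the bucket contents without going through `google.colab.auth`. This reads the bucket through the API rather than mounting it, so paths like `BASE_PATH` would not apply.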
Still no fix? Thanks @chriscast88 for the solution; it worked flawlessly, even for shared drives, with a little bit of tweaking.
Posting with permission from @cperry-goog - we're collaborating with the Colab team to provide DagsHub Storage as an alternative to GDrive that is more scalable and built for use with large datasets. It's an S3-compatible bucket that has much simpler access controls, and can be mounted easily.
It might help avoid the issues above - here's a link to an example notebook to try it out
We're looking for community feedback, so I'd love to get your input if it helps with the issue at hand.
(If you're curious, DagsHub is a platform for ML teams which is why we think Colab should have a storage solution suitable for ML workloads)
For my use case, using a Google Storage Bucket as the backing datastore was an equivalent option to Google Drive. It's very straightforward to connect to a bucket with the following code (utilizing gcsfuse)
### MOUNT GOOGLE STORAGE BUCKET
from google.colab import auth
auth.authenticate_user()
!echo "deb https://packages.cloud.google.com/apt gcsfuse-bionic main" > /etc/apt/sources.list.d/gcsfuse.list
!curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
!apt -qq update
!apt -qq install gcsfuse
!mkdir -p mounted-bucket
!gcsfuse --implicit-dirs audio-model-data mounted-bucket
BASE_PATH = "/content/mounted-bucket"
Umm doesn't this run into the same issue being that "google.colab is unsupported in this environment."
How can I change my google colab compute engine?
auth.authenticate_user()
Still a problem after 2 years...This took time and $
@cperry-goog, Any updates on this?
After deploying a custom GCE VM runtime as per the instructions at https://research.google.com/colaboratory/marketplace.html and connecting, when trying to use the following code
I get the following error
My expectation was that the GCE VM deployed from the marketplace would have the same software environment as the standard runtime but also give me the ability to specify the compute/memory/GPU resources that are available to my GCP project. As such, I was not expecting to need to make code changes to the notebook for it to work on the marketplace GCE VM.