pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.
http://pangeo.io

Google's Colaboratory #98

Closed mrocklin closed 6 years ago

mrocklin commented 6 years ago

Colaboratory is a collaborative notebook system on Kubernetes hosted by Google. I wonder if there would be interest from that team in enabling scalable systems on the same Kubernetes cluster, similar to what we're doing with JupyterHub+Kubernetes and daskernetes. This came up in a casual conversation with @lila and @rabernat. cc @jakevdp

rabernat commented 6 years ago

👍

It certainly seems that these projects have similar goals. Would be great to make contact.

shoyer commented 6 years ago

I know the Colaboratory team pretty well.

I'm afraid I don't quite understand what you're getting at here. Colaboratory might very well be using Kubernetes under the hood (honestly I don't know) but I can't find any public information about that. e.g., it's not mentioned in the FAQ: https://research.google.com/colaboratory/faq.html

Can you elaborate a little bit more on what you had in mind?

mrocklin commented 6 years ago

How hard would it be for someone using Colaboratory to launch and use another distributed system within their notebook?

shoyer commented 6 years ago

> How hard would it be for someone using Colaboratory to launch and use another distributed system within their notebook?

I don't think there's any support for this currently, but if it's simply a matter of running some command line or Python script then it should already work. Colaboratory is simply another UI on top of an IPython/Jupyter notebook.

mrocklin commented 6 years ago

I expect that we would need to define roles for the Jupyter pods to enable/control their access to deploy services on the rest of the Kubernetes cluster.
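
As a rough sketch of what such roles could look like, assuming standard Kubernetes RBAC: a namespaced Role that lets a notebook pod's service account manage worker pods, plus a RoleBinding that attaches it. The names `dask-worker-manager` and `jupyter-notebook` and the `pangeo` namespace are hypothetical, and the verb list is a guess at what a pod-launching tool would need, not something taken from daskernetes.

```python
import json

# Hypothetical names, chosen only for illustration.
NAMESPACE = "pangeo"
SERVICE_ACCOUNT = "jupyter-notebook"
ROLE_NAME = "dask-worker-manager"

# Role: permissions on pods within a single namespace.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
    "rules": [
        {
            "apiGroups": [""],  # "" is the core API group (pods live here)
            "resources": ["pods", "pods/log"],
            "verbs": ["get", "list", "watch", "create", "delete"],
        }
    ],
}

# RoleBinding: grants the Role to the notebook pods' service account.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": ROLE_NAME, "namespace": NAMESPACE},
    "subjects": [
        {"kind": "ServiceAccount", "name": SERVICE_ACCOUNT, "namespace": NAMESPACE}
    ],
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "Role",
        "name": ROLE_NAME,
    },
}

# Bundle both objects so they can be applied together.
manifest = {"apiVersion": "v1", "kind": "List", "items": [role, role_binding]}
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`; a real deployment would likely need to tune the resources and verbs.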

craigcitro commented 6 years ago

Hi all -- I work on Colab, happy to answer questions. (h/t @shoyer for pointing me this way)

As it happens, we aren't currently using Kubernetes under the hood anywhere. (@mrocklin did you see a note somewhere suggesting that we use k8s?)

As Stephan said, in Colab a user can do ~anything they would do from a random machine. In particular, if you run

from google.colab import auth
auth.authenticate_user()

then from that point forward, gcloud and other google-application-default-credential-using commands should Just Work. @mrocklin is there a reason that the Jupyter backend would need to be part of the k8s cluster?
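
As a minimal sketch, the same snippet can be guarded so that it also runs outside Colab. The helper name `authenticate_in_colab` is hypothetical; the only Colab calls used are the `google.colab.auth` import and `auth.authenticate_user()` quoted above.

```python
def authenticate_in_colab():
    """Run Colab's interactive auth flow when it is available.

    Returns True after auth.authenticate_user() has been called,
    or False when google.colab cannot be imported (not in Colab).
    """
    try:
        from google.colab import auth
    except ImportError:
        return False
    auth.authenticate_user()
    return True


if authenticate_in_colab():
    print("Authenticated; application-default credentials should now be usable.")
else:
    print("Not running in Colab; skipping authentication.")
```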

mrocklin commented 6 years ago

Ah, I must have been mistaken about the k8s point. Thanks for clarifying.

> is there a reason that the Jupyter backend would need to be part of the k8s cluster?

No strong need, it's just nicer to have a set of machines already running so that we don't have the multi-minute delay of spinning them up.

craigcitro commented 6 years ago

> No strong need, it's just nicer to have a set of machines already running so that we don't have the multi-minute delay of spinning them up.

Ah, that makes sense.

Do let me/us know if you try it, or if you hit stumbling blocks!

jreadey commented 6 years ago

FYI: here's a notebook I created on Colab that displays data from the 50TB NREL dataset: https://drive.google.com/file/d/1h9pTzuA3thCwYBsxhnARf7olYtmBCZfd/view?usp=sharing. (I wish Google Drive could render Jupyter notebooks like GitHub does.)

I guess it would be more impressive from Google's perspective if the HSDS service this notebook connects to was running in GCS not AWS!

craigcitro commented 6 years ago

@jreadey I obviously can't speak for everyone, but I'm always first and foremost stoked to see people getting interesting work done, whatever the toolchain. :) Happy to find ways Colab can help with any part of that.

Two more notes:

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 6 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.