googledatalab / datalab

Interactive tools and developer experiences for Big Data on Google Cloud Platform.
Apache License 2.0

Gateway mode deprecated? #1294

Open dovy opened 7 years ago

dovy commented 7 years ago

@ojarjur, has gateway mode for Datalab been deprecated?

https://github.com/googledatalab/datalab/commit/414e43fa81aec4669828307af8a935fd69fa2f2d

ojarjur commented 7 years ago

@dovy Yes, it has.

Sorry that we didn't do a better job of making this clear. I'll update the release info wiki page to state it explicitly.

We dropped the gateway mode when we put out our GA release because we found that it was not feasible to make it as reliable as we wanted.

In particular, the separation of the sessions (tracked by the notebook server) from the kernels (tracked by the kernel gateway) meant that the notebook server could lose track of running kernels if it got restarted, and there was no reasonable way to prevent that.

Our recommendation now, if you want to work with notebooks that are on your workstation, is to use the gcloud compute copy-files command to copy those files to and/or from the GCE VM where Datalab is running.
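For example, roughly something like this (the instance name, zone, and notebook paths below are hypothetical placeholders, not taken from this thread):

```sh
# Copy a notebook from the workstation to the Datalab VM
# (instance name, zone, and content path are assumed placeholders).
gcloud compute copy-files ./my-notebook.ipynb \
    datalab-vm:/mnt/disks/datalab-pd/content/ --zone us-central1-a

# Copy it back to the workstation after editing it in Datalab.
gcloud compute copy-files \
    datalab-vm:/mnt/disks/datalab-pd/content/my-notebook.ipynb ./ \
    --zone us-central1-a
```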

dovy commented 7 years ago

That's extremely disappointing. What was the last commit you made before you started removing support for it? I fear we'll need to fork the project then.

ojarjur commented 7 years ago

@dovy The last commit was https://github.com/googledatalab/datalab/commit/22e2d0123a6d3b97df00cf2e4fa78e5bda07b83e.

Which features of the gateway mode do you require? We can track those features and see if we can support them via some other means.

Was it solely the ability to store your notebooks locally, or was there something else too?

dovy commented 7 years ago

@ojarjur Honestly, with the removal of this it feels like the relevance of Datalab for us has been deprecated. We wanted to be able to have developers work locally and commit code when they were ready. With the current approach to Datalab, you force all code to run remotely; a user CAN commit to a git repo, but how does that allow them to test and save check-ins before publishing to the world?

Our preferred flow was similar to that of other systems.

  1. Develop locally
  2. Test locally.
  3. Commit to public repo.

I fear we lose that workflow with the loss of the gateway.

dovy commented 7 years ago

Also, does that mean you won't connect Datalab to a Dataproc cluster either? Meaning the only "datalab" environment is a single Compute Engine instance?

ojarjur commented 7 years ago

Also, does that mean you won't connect Datalab to a Dataproc cluster either?

You can still use the Dataproc initialization action. That wasn't part of the GA release because we decided it wasn't ready yet, but it is not deprecated.
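As a rough illustration of using that initialization action (the cluster name and the GCS path of the script are assumptions based on the dataproc initialization-actions repo, not taken from this thread):

```sh
# Hypothetical sketch: create a Dataproc cluster that runs the Datalab
# initialization action on startup. Cluster name and script path are assumptions.
gcloud dataproc clusters create my-datalab-cluster \
    --initialization-actions gs://dataproc-initialization-actions/datalab/datalab.sh \
    --scopes cloud-platform
```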

ojarjur commented 7 years ago

We wanted to be able to have developers work locally and commit code when they were ready. With the current approach to Datalab, you force all code to run remotely; a user CAN commit to a git repo, but how does that allow them to test and save check-ins before publishing to the world?

I'm not sure what you mean by "publishing to the world". Datalab instances are single user environments, so the notebooks on a Datalab instance are not shared with others until you explicitly share them by either committing them to a repository or copying them to some shared location.

Project editors can still force their way into the instance, but you can block non-project-editor users from doing that by following the instructions here.

Also, FWIW, you can still run all of Datalab locally.

Running all of Datalab locally is not part of our GA release because we can't commit to helping users get Docker running on their local machines. However, that doesn't mean that running locally is deprecated, just that we don't provide support for it.
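A minimal sketch of what running locally with Docker looks like; the image tag, port mapping, and mounted content directory here are assumptions drawn from the usual Docker workflow rather than from this thread:

```sh
# Hypothetical sketch: run the Datalab container on a workstation and
# mount a local directory as the notebook content root.
docker pull gcr.io/cloud-datalab/datalab:local
docker run -it -p "127.0.0.1:8081:8080" \
    -v "${HOME}/datalab-notebooks:/content" \
    gcr.io/cloud-datalab/datalab:local
```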

Do either of those approaches fit your needs, or is there still a gap?