jupyterhub / binderhub

Run your code in the cloud, with technology so advanced, it feels like magic!
https://binderhub.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Persist user sessions over multiple binder links for the same repo #425

Open choldgraf opened 6 years ago

choldgraf commented 6 years ago

(I may have posted this before so feel free to link+close if it's already an issue, but I couldn't find it)

Right now, if somebody clicks two binder links in succession, then two different binder pods will be created for them. As more resources like https://www.inferentialthinking.com/ and the sphinx-gallery plugin start popping up, this is going to result in a lot of inefficient use of pods.

We should make BinderHub behave somewhat like JupyterHub in the short term. If someone clicks a bunch of links that point to the same repository, BinderHub should find a pre-existing pod linked to them (by their IP, or maybe with a cookie?) and just direct them there, rather than creating a new pod.

I feel like this may be blocked until #323 is merged? Curious what people think.

User behavior

  1. A user clicks a binder link
  2. User clicks another link relatively quickly (within 10 minutes):
     A. a link that points to the same repository, different path
     B. a link that points to a different repository

What happens now

A. A new Binder pod is created, the local token is updated, and the user is directed there.
B. A new Binder pod is created, the local token is updated, and the user is directed there.

What we would like to happen

A. BinderHub detects that this is the same repository as a pre-existing user pod, redirects the user to the new path on that pod, and does not create a new pod.
B. A new Binder pod is created, the local token is updated, and the user is directed there.
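A minimal sketch of the desired decision, in Python (the language BinderHub is written in). Everything here is hypothetical: `ExistingSession`, `plan_launch`, and the assumption that BinderHub remembers the repo and server URL of a browser's last launch (e.g. in a cookie) are illustrations, not BinderHub APIs. The 10-minute window is borrowed from the scenario above rather than from any BinderHub setting, and the sketch does not check that the old pod is still running.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical reuse window, borrowed from the "within 10 minutes" scenario.
REUSE_WINDOW_SECONDS = 10 * 60


@dataclass
class ExistingSession:
    """What BinderHub might remember about a browser's last launch (hypothetical)."""
    repo: str           # e.g. "gh/choldgraf/binder-stats"
    server_url: str     # e.g. "https://hub.mybinder.org/user/choldgraf-binder-stats-mejorh0l/"
    launched_at: float  # Unix timestamp of that launch


def plan_launch(repo: str, path: str, session: Optional[ExistingSession],
                now: Optional[float] = None) -> tuple[str, str]:
    """Return ("redirect", url) to reuse a pod, or ("launch", repo) to start a new one."""
    now = time.time() if now is None else now
    if (
        session is not None
        and session.repo == repo
        and now - session.launched_at <= REUSE_WINDOW_SECONDS
    ):
        # Case A: same repository, so send the user to the new path on the existing pod.
        return "redirect", session.server_url.rstrip("/") + "/" + path.lstrip("/")
    # Case B (different repository), no previous session, or window expired: new pod.
    return "launch", repo
```

Keying on the repository alone keeps case B unchanged: a link to a different repository always falls through to a normal launch.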

betatim commented 6 years ago

We already give the user a cookie that contains the token for the single-user notebook server they launched. We could also store the username (the last part of https://hub.mybinder.org/user/choldgraf-binder-stats-mejorh0l/). When a user returns with that cookie, then before launching a new pod we check with the proxy whether there is a route for this "username"; if there is, we follow it instead of launching a new pod.

Actually, I think without an API token you can't ask the proxy or JupyterHub about existing/alive users, can you? But in JS we could make a request to the proposed URL and see whether we get a 403 or not.

Would that work? It doesn't introduce new state anywhere (I think).
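The cookie-plus-probe idea could look roughly like this. It is a sketch under assumptions: the cookie fields `server_url` and `token` and the helpers `http_probe` and `choose_target` are hypothetical. Instead of looking for a 403 from the browser, it probes the single-user server's `/api/status` endpoint (assumed to require the token) from Python with the stored token. The probe is passed in as a function so the decision can be exercised without a network.

```python
import urllib.error
import urllib.request
from typing import Callable, Mapping, Optional


def http_probe(url: str, token: str, timeout: float = 5.0) -> int:
    """GET `url` with the server token; return the HTTP status, or 0 if unreachable."""
    req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # urlopen follows redirects; ending up somewhere else (e.g. a hub page)
            # means the original server did not answer, so report it as not alive.
            return resp.status if resp.geturl() == url else 0
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code
    except OSError:  # URLError, refused connections and timeouts are all OSErrors
        return 0


def choose_target(cookie: Mapping[str, str],
                  probe: Callable[[str, str], int] = http_probe) -> Optional[str]:
    """Return the still-running server URL from the cookie, or None to launch a new pod."""
    server_url = cookie.get("server_url")  # hypothetical cookie field
    token = cookie.get("token")            # hypothetical cookie field
    if not server_url or not token:
        return None  # no previous launch recorded for this browser
    status = probe(server_url.rstrip("/") + "/api/status", token)
    return server_url if status == 200 else None
```

As in the comment above, no new server-side state is introduced: everything needed lives in the cookie, and a dead pod simply fails the probe and triggers a normal launch.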

willingc commented 6 years ago

Let me see if I understand the problem:

User behavior

What happens now

What we would like to happen

Feel free to edit this and add the current and desired behavior for all cases.

choldgraf commented 6 years ago

@willingc I updated the top-level comment with your helpful template! I need to get in the habit of doing that instinctively.

psychemedia commented 6 years ago

Unpicking scenarios a bit more, I can think of several:

1) Someone clicks a link on a GitHub repo and then clicks it again. This might mean they didn't finish the session, or it might mean they want a clean session. In the first case, they may have tried one notebook in the repo, then realised they want to try another, and might sensibly return to the first session. In the second case, someone doing demos for different people might want a clean start each time. I could imagine a flag in the button link URL specifying which behaviour is desired, perhaps defaulting to reusing an already running pod associated with that browser cookie/repo if possible.
2) Sphinx docs, with people trying multiple examples: someone clicks a link from one example, then clicks another link from a different example on the same page or on another page. The same session may or may not suffice, depending on what the example assumes about prior state. That is, it's not clear whether multiple examples on the same page are stateless: two examples may be independent, or one may depend on a previous example having been run. (There is a complementary issue in some Sphinx docs, e.g. where a preamble %matplotlib cell needs to be run to display graphical output, but a novice user might only run the example cell. Should %matplotlib inline be an autorunnable dependent cell?)
3) Thebelab-style interaction, e.g. embedded runnable examples in Sphinx-generated docs. Here there may be multiple examples on one page, or examples spread across several pages. Again, with multiple examples on one page, the desired behaviour might be stateful or stateless depending on prior cell execution. In long-running examples spread over several HTML pages, an example on one page may also depend on an example on a previous page having been run. Compare the SageMath cell server, which can be used to embed a runnable cell in a VLE/LMS (example: the Moodle plugin for the SageMath cell server) and which is a stateless, single-cell execution service.
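The link-level flag from scenario 1 could be read from the launch URL's query string. In this sketch the parameter name `session`, its values `new`/`reuse`, and the helper `wants_fresh_session` are all hypothetical, not an existing BinderHub option; reusing is the default when the flag is absent, as suggested above.

```python
from urllib.parse import parse_qs, urlparse


def wants_fresh_session(launch_url: str) -> bool:
    """True if the link explicitly asks for a clean pod; otherwise reuse is assumed."""
    params = parse_qs(urlparse(launch_url).query)
    # Hypothetical flag: ?session=new forces a new pod, anything else means reuse.
    return params.get("session", ["reuse"])[0] == "new"
```

A demo presenter would then share links ending in `?session=new`, while ordinary badges keep the resource-saving default.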

betatim commented 6 years ago

I think 3. is taken care of by the various JS extensions that exist:

Sharing or not of a binder instance is something that can be configured in each of these. I think as binderhub operator we should not try and force someone using them into a particular behaviour.

For 1. and 2. I think we can try to reuse an existing binder instance for the same user. My guess would be that most users who want a "clean slate" only need to start a new kernel. For those who really want a completely clean slate, we should offer an obvious button to shut down the existing binder instance. It will add a few extra clicks for the demo use case you describe, but it would save a lot of resources in the (I think) more frequent case where a user visits several binder links in short succession.
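The "obvious button" could send a shutdown request to the user's own server. This sketch assumes the pod runs Jupyter Server, which exposes `POST /api/shutdown`, and that the token from the cookie authorizes it; `build_shutdown_request` is a hypothetical helper, and stopping the server through JupyterHub's API instead would be an equally valid design. The request is only built here, not sent.

```python
import urllib.request


def build_shutdown_request(server_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the single-user server's shutdown endpoint."""
    return urllib.request.Request(
        server_url.rstrip("/") + "/api/shutdown",
        method="POST",
        # Assumed: the cookie's server token is accepted via the "token" auth scheme.
        headers={"Authorization": f"token {token}"},
    )


# Sending it would be: urllib.request.urlopen(build_shutdown_request(url, token))
```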

manics commented 3 years ago

> (I may have posted this before so feel free to link+close if it's already an issue, but I couldn't find it)

It was this one :smile: https://github.com/jupyterhub/binderhub/issues/327 I'll keep this one open instead since it's got more discussion.