jupyterlab / frontends-team-compass

A repository for team interaction, syncing, and handling meeting notes across the JupyterLab ecosystem.
https://jupyterlab-team-compass.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Real Time Collaboration Plan #30

Closed saulshanabrook closed 4 years ago

saulshanabrook commented 4 years ago

Over the past few years, many folks have been working on bringing real time collaboration to JupyterLab. It would support new features like:

The current work is in the datastore package in Lumino and in a PR to JupyterLab (https://github.com/jupyterlab/jupyterlab/pull/6871).

Moving forward, we could move this work out into two new, separate repositories:

Here is a drawing I put together to try to show how these different pieces could work together:

Zach and I also started to sketch out the start of the jupyter-datastore APIs, including the REST api and the tables
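
That sketch lives elsewhere and is not reproduced here, but purely as an illustration of the general shape such an API could take (every endpoint name and field below is invented for this example, not the actual jupyter-datastore design), a patch-log service might expose "append a patch" and "fetch patches since N" operations:

```python
# Hypothetical sketch of a patch-log API of the kind jupyter-datastore
# might expose. Endpoint names and fields are invented for illustration;
# they are NOT the actual jupyter-datastore design.

from dataclasses import dataclass, field
from itertools import count
from typing import Any


@dataclass
class PatchLog:
    """In-memory log of datastore patches, ordered by a server-assigned id."""

    _patches: list[dict[str, Any]] = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def append(self, store_id: str, patch: dict[str, Any]) -> int:
        """POST /stores/{store_id}/patches -> returns the new patch id."""
        patch_id = next(self._ids)
        self._patches.append({"id": patch_id, "store": store_id, "patch": patch})
        return patch_id

    def since(self, store_id: str, after: int = -1) -> list[dict[str, Any]]:
        """GET /stores/{store_id}/patches?after=N -> patches newer than N."""
        return [
            p for p in self._patches
            if p["store"] == store_id and p["id"] > after
        ]


log = PatchLog()
log.append("notebook-1", {"cell": 0, "op": "insert", "text": "print('hi')"})
log.append("notebook-1", {"cell": 0, "op": "update", "text": "print('hello')"})
print(len(log.since("notebook-1")))  # prints 2
```

Clients that crash or reconnect can replay from their last seen id, which is one reason a totally ordered patch log is a convenient server-side primitive.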

Adding these new repos has some advantages:

However, it comes at the expense of a greater maintenance burden, since we would have to set up build and testing infrastructure for each repo. It might also be confusing if folks are not sure what the scope of each repo is. It's also harder to make cross-repo changes, because they require coordinating pull requests.

I propose creating these two new repos in the JupyterLab organization and creating issues and milestones to track what needs to be done in each. Before that can be done, we have to come up with names for each. The current candidates are below, but we could change them:

cc @vidartf @ellisonbg @afshin @Zsailer

Does anyone have objections or name ideas?

vidartf commented 4 years ago

Regarding moving the work to a new repository: I agree with the intent and proposed structure, but I want to mention that it might have been easier if lumino were registered as an org (under the Jupyter umbrella, similar to the jupyterlab, jupyter-widgets, and jupyterhub orgs). I'm not sure how feasible that is. So, other than the fact that we would now be inlining org names into package names instead of having them be actual orgs, I agree with the names.

having it as a separate repo [... allows] us to more freely include third party dependencies

I'm not sure why we want to include third party dependencies. One of the clear strengths of lumino is its non-exposure to leftpad. I'm also not sure why changing the repo should change the philosophy w.r.t. this.

Speaking of the client/server setup, I would argue strongly for keeping any and all Python code out of the lumino repos. I would also argue strongly for not requiring Node to run the jupyterlab server. We can discuss this to great lengths in a separate thread though.

Final note: for attracting more contributors, I think the main barrier is access to good documentation (beyond just API docs: e.g. examples of use, tutorials, an architecture overview, and documentation of how our variant of the CRDT algorithm works). While structuring things separately might give some advantages, I would strongly prioritize spending time on writing docs and examples. Such efforts also tend to highlight any pain points in the API, so it would be good to start sooner rather than later.

saulshanabrook commented 4 years ago

I'm not sure why we want to include third party dependencies. One of the clear strengths of lumino is its non-exposure to leftpad. I'm also not sure why changing the repo should change the philosophy w.r.t. this.

For example, if we add integration of the datastore with React components or RxJS observables, then those become dependencies.
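
To make the kind of integration meant here concrete: the idea is exposing datastore changes as a subscribable stream. Here is a minimal observer-pattern sketch in Python standing in for an RxJS-style observable; every name in it is hypothetical, not a real datastore API:

```python
# Minimal observer-pattern sketch of "datastore changes as a stream".
# This is a stand-in for an RxJS-style integration; all names are
# hypothetical, not a real Lumino datastore API.

from typing import Callable


class ObservableStore:
    def __init__(self) -> None:
        self._state: dict[str, str] = {}
        self._subscribers: list[Callable[[dict[str, str]], None]] = []

    def subscribe(self, callback: Callable[[dict[str, str]], None]) -> None:
        """Register a callback fired on every change, like observable.subscribe()."""
        self._subscribers.append(callback)

    def set(self, key: str, value: str) -> None:
        """Apply a change, then notify every subscriber with the new state."""
        self._state[key] = value
        for cb in self._subscribers:
            cb(dict(self._state))


store = ObservableStore()
seen: list[dict[str, str]] = []
store.subscribe(seen.append)          # e.g. a React component re-rendering
store.set("cell-0", "print('hi')")    # every subscriber sees the new state
```

The point of the example is just that once change notification becomes part of the public surface, whatever stream library you standardize on (RxJS or otherwise) becomes a dependency of the package.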

Speaking of the client/server setup, I would argue strongly for keeping any and all Python code out of the lumino repos. I would also argue strongly for not requiring Node to run the jupyterlab server. We can discuss this to great lengths in a separate thread though.

So maybe we call it not lumino-datastore but my-fun-RTc-clientside-data-thing-name, and it depends on @lumino/datastore, which still lives in lumino.

I think it would be nice for new users, coming to whatever the repo is, to be able to use the tools to build their own RTC-enabled web app. And to do that, they need some sort of server that handles relaying patches.
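
The relay logic such a server needs is very small. Here is an in-memory sketch of just the fan-out step, with the transport (websockets, in a real server) abstracted away; all class and method names are hypothetical:

```python
# In-memory sketch of the relay logic an RTC server needs: each client
# sends patches, and the server fans every patch out to all *other*
# connected clients. A real server would do this over websockets;
# all names here are hypothetical.

from collections import defaultdict


class PatchRelay:
    def __init__(self) -> None:
        self._inboxes: dict[str, list[dict]] = defaultdict(list)

    def connect(self, client_id: str) -> None:
        """Register a client so it starts receiving relayed patches."""
        self._inboxes[client_id]  # touching the key creates an empty inbox

    def send(self, sender: str, patch: dict) -> None:
        """Relay a patch from one client to every other connected client."""
        for client_id, inbox in self._inboxes.items():
            if client_id != sender:
                inbox.append(patch)

    def receive(self, client_id: str) -> list[dict]:
        """Drain a client's inbox of patches relayed from its peers."""
        patches, self._inboxes[client_id] = self._inboxes[client_id], []
        return patches


relay = PatchRelay()
relay.connect("alice")
relay.connect("bob")
relay.send("alice", {"op": "insert", "text": "x = 1"})
print(relay.receive("bob"))    # prints [{'op': 'insert', 'text': 'x = 1'}]
print(relay.receive("alice"))  # prints [] -- senders don't get their own patches back
```

A CRDT-based datastore is what makes this simple broadcast sufficient: the server never has to merge or order patches itself, since clients converge regardless of delivery order.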

saulshanabrook commented 4 years ago

Notes from meeting with @blink1073 and @vidartf

bollwyvl commented 4 years ago

Exciting!

I may have to (somewhat jokingly) take exception to a websocket server being a hard requirement in the first place. Over Thanksgiving, this hacked itself together:

https://github.com/deathbeds/jupyterlab-dat

Obviously very WIP, but yeah, it pretty much does the thing: a reasonably usable notebook pub/multisub and ephemeral chat built on dat that could likely integrate into jyve and be served from GitHub Pages... or dat itself.

Alice publishes the live state of her notebook to Bob by sending her public key, Bob subscribes, they find each other in the swarm, and a naive stream of nbexplode files is passed around. If Bob then reverses the process (potentially through the in-lab chat), they can copy cells back and forth between the two notebooks.

Eve can discover a derivative of the public key (the discovery key), and can therefore prove that A/B were talking about... something... at some velocity and volume... but can't determine the content.
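
To illustrate why Eve learns so little: in the Dat/hypercore stack, the discovery key is (to my understanding) a keyed BLAKE2b-256 hash of the constant string "hypercore", keyed with the feed's public key. Treat the exact construction below as an approximation of the real protocol rather than a definitive spec:

```python
# Sketch of Dat-style discovery keys: peers announce the discovery key
# to the swarm, but it is a one-way keyed hash of the public key, so an
# observer cannot recover the public key (or read the content) from it.
# This follows hypercore's convention as I understand it -- BLAKE2b-256
# of b"hypercore" keyed with the feed's public key -- but treat it as an
# approximation, not the authoritative protocol definition.

import hashlib
import secrets


def discovery_key(public_key: bytes) -> bytes:
    """Derive the swarm discovery key from a feed's public key."""
    return hashlib.blake2b(b"hypercore", key=public_key, digest_size=32).digest()


alice_public_key = secrets.token_bytes(32)  # stand-in for a real ed25519 key
dk = discovery_key(alice_public_key)

# The derivation is deterministic, so peers who already share the public
# key rendezvous at the same discovery key...
assert dk == discovery_key(alice_public_key)
# ...but the discovery key reveals nothing about the public key itself,
# so observers learn only that *some* feed is being exchanged.
assert dk != alice_public_key
```

So Eve sees traffic volume and timing for a given discovery key, but holding the public key is what gates both finding out which feed it is and decrypting its contents.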

Nothing comm-based works yet, but other MIME renderers do. It can't do multi-client editing, but I think that's just one gnarly webpack away with hypermerge (by the automerge folks, who blessedly work in TS).

OK, OK, so it does need a static file server, and usually a websocket server: it needs a peer discovery mechanism (I ship one with jupyter-server-proxy), but once connected, everything happens over WebRTC.

The Dat protocol is also good at really big files, though likely not in the browser. Sadly, however, the non-node/web clients are somewhat neglected, so you'd be stuck shelling out and working with the file system in most kernels. However, the node-based tooling can be webpacked (a la jlpm) down to under 2mb.

P2p stuff aside, which would be inappropriate in a number of situations, if a novel server must be implemented, the reference server requirement being on node/v8 is fine, so long as

The high road, though, would be something that compiled to wasm, but that's a whole other kettle of fish.

Even our current yarn/webpack setup (if we were stricter about bundle discipline) doesn't have to be that bad; it's the end-user npm connectivity that remains my biggest issue. I think if we could get to Yarn PnP, it could be reasonable, as that model would be pip/conda-resolvable: instead of a giant node_modules tree of indeterminate depth, we'd just be filling a flat directory of tarballs. Pika is also interesting, but probably not ready for prime time. But I haven't explored these options.

Looking forward to the developments!

vidartf commented 4 years ago

@bollwyvl I can't really tell if you are recommending something to be used for RTC, or just doing a tangential discussion.

saulshanabrook commented 4 years ago

We had another chat about this when @afshin and @jasongrout came back.

Jason said that we could start out by having it just on the client side, with the server-side state management solution there as well, so we don't need Node on the server. It won't actually give us RTC, but it can serve as a base; once everything is implemented client-side, we could then switch to a server-based version.

saulshanabrook commented 4 years ago

@bollwyvl The dat stuff is cool. I have seen this project implement a CRDT on top of it: https://github.com/automerge/hypermerge

Another idea is to make the RTC backend pluggable, so you could use different transport protocols or algorithms if you want.
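
A pluggable backend could amount to a small transport interface that the RTC layer codes against, with websocket, dat/WebRTC, or purely local implementations behind it. A sketch of that idea (all names hypothetical; the local backend also matches the "client-side only, no server" starting point mentioned above):

```python
# Sketch of a pluggable RTC backend: the client codes against a small
# transport interface, and websocket, dat/WebRTC, or purely local
# backends plug in behind it. All names are hypothetical.

from abc import ABC, abstractmethod


class RTCTransport(ABC):
    """Interface the RTC layer would code against."""

    @abstractmethod
    def broadcast(self, patch: dict) -> None:
        """Send a local patch to all peers."""

    @abstractmethod
    def poll(self) -> list[dict]:
        """Fetch patches produced by peers since the last poll."""


class LocalTransport(RTCTransport):
    """Trivial single-process backend: useful for tests, or as the
    no-server starting point before a real networked backend exists."""

    def __init__(self) -> None:
        self._pending: list[dict] = []

    def broadcast(self, patch: dict) -> None:
        self._pending.append(patch)

    def poll(self) -> list[dict]:
        pending, self._pending = self._pending, []
        return pending


transport: RTCTransport = LocalTransport()
transport.broadcast({"op": "insert", "text": "x = 1"})
print(transport.poll())  # prints [{'op': 'insert', 'text': 'x = 1'}]
```

Swapping in a dat- or websocket-backed implementation would then be invisible to the rest of the application, as long as it honors the same interface.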

choldgraf commented 4 years ago

@bollwyvl @saulshanabrook FWIW, I was chatting with one of the dat folks a while back about RTC. They thought it sounded quite interesting. Would it be helpful to make a connection? It's been a few months, but maybe they'd still be interested.

vidartf commented 4 years ago

Could someone explain what dat is, and what problems it will solve? I'm not keeping up on the trends (:

saulshanabrook commented 4 years ago

@choldgraf I think so, but let's wait a little until we have the RTC repo set up and a better idea of how to integrate a dat backend with all our existing work.

Not an expert, but dat basically lets you sync data between hosts, over different protocols. It's a bit like torrents, or IPFS? https://dat.foundation/

saulshanabrook commented 4 years ago

Notes from a chat on the Jupyter call with @ellisonbg @vidartf @jasongrout and others:

saulshanabrook commented 4 years ago

I have created a new repo in this org for our RTC work: https://github.com/jupyterlab/rtc. Please create issues on that repo for all new RTC discussions. At some point, we might want to move parts of it into other repos once the work has stabilized.