Closed by @saulshanabrook 4 years ago
Regarding moving the work to a new repository, I agree with the intent and proposed structure, but want to mention that it might have been easier if lumino were registered as an org (under the Jupyter umbrella, similar to the jupyterlab, jupyter-widgets, and jupyterhub orgs). I'm not sure how feasible that is. So, other than the fact that we would now be inline-namespacing org names instead of having them be actual orgs, I agree with the names.
> having it as a separate repo [... allows] us to more freely include third party dependencies
I'm not sure why we want to include third party dependencies. One of the clear strengths of lumino is its non-exposure to leftpad. I'm also not sure why changing the repo should change the philosophy w.r.t. this.
Speaking of the client/server setup, I would argue strongly for keeping any and all Python code out of the lumino repos. I would also argue strongly for not requiring Node to run the jupyterlab server. We can discuss this at great length in a separate thread, though.
Final note: For attracting more contributors, I think the main barrier is access to good documentation (beyond just API docs, e.g. examples on use, tutorials, architecture overview, documenting how our variant of the CRDT algorithm works). While structuring things separately might give some advantages, I would desperately prioritize time on writing docs and examples. Such efforts also tend to highlight any pain points in the API, so it would be good to start this sooner rather than later.
> I'm not sure why we want to include third party dependencies. One of the clear strengths of lumino is its non-exposure to leftpad. I'm also not sure why changing the repo should change the philosophy w.r.t. this.
For example, if we add integration of the datastore with React components or RxJS observables, then these become dependencies.
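To make that concrete, here is a minimal, purely hypothetical sketch of what such an RxJS bridge might look like; it assumes the `Datastore` class and its `changed` signal from @lumino/datastore, and the moment you ship something like this, rxjs is a real third party dependency:

```typescript
import { Observable } from 'rxjs';
import { Datastore } from '@lumino/datastore';

// Hypothetical glue: expose a datastore's change signal as an RxJS stream.
function changes(store: Datastore): Observable<Datastore.IChangedArgs> {
  return new Observable<Datastore.IChangedArgs>(subscriber => {
    const slot = (_: Datastore, args: Datastore.IChangedArgs) =>
      subscriber.next(args);
    store.changed.connect(slot);
    // Disconnect from the signal when the subscription is torn down.
    return () => { store.changed.disconnect(slot); };
  });
}
```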
> Speaking of the client/server setup, I would argue strongly for keeping any and all Python code out of the lumino repos. I would also argue strongly for not requiring Node to run the jupyterlab server. We can discuss this at great length in a separate thread, though.
So maybe we call it not lumino-datastore then, but my-fun-RTC-clientside-data-thing-name, and it depends on @lumino/datastore, which still lives in lumino.
I think it would be nice for new users, coming to whatever the repo is, to be able to use the tools to build their own RTC web app. And to do that, they need some sort of server that handles relaying patches.
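As a rough illustration of how small that relay could start out, here is a sketch of a naive broadcast server using the node `ws` package (the port is made up, and a real server would also need to persist and order transactions):

```typescript
import WebSocket, { WebSocketServer } from 'ws';

// Naive patch relay: every message from one client is forwarded to all
// other connected clients, with no persistence or ordering guarantees.
const wss = new WebSocketServer({ port: 8765 });

wss.on('connection', socket => {
  socket.on('message', data => {
    for (const client of wss.clients) {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(data);
      }
    }
  });
});
```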
Notes from meeting with @blink1073 and @vidartf:
- jupyterlab/rtc-incubator: monorepo where we can put all of these parts

Exciting!
I may have to (somewhat jokingly) take exception to a websocket server being a hard requirement in the first place. Over Thanksgiving, this hacked itself together:
https://github.com/deathbeds/jupyterlab-dat
Obviously very wip, but yeah, it pretty much does the thing: a reasonably usable notebook pub/multisub and ephemeral chat built on dat that likely could integrate into jyve and be served from GitHub pages... Or dat itself.
Alice publishes the live state of her notebook to Bob by sending her public key, Bob subscribes, they find each other in the swarm, and a naive stream of nbexplode files is passed around. If Bob then reverses the process (potentially through the in-lab chat), they can copy cells back and forth between the two notebooks.
Eve can discover a derivative of the public key (the discovery key), and can therefore prove that A/B were talking about... something... at some velocity and volume... but can't determine the value.
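The discovery-key trick is simple to sketch. The snippet below uses a plain SHA-256 as a stand-in (dat actually derives the discovery key with a keyed BLAKE2b hash) just to show the one-way property:

```typescript
import { createHash } from 'crypto';

// Anyone can map a public key to its discovery key, but not the reverse,
// so peers can rendezvous in the swarm without revealing what they share.
function discoveryKey(publicKey: Buffer): Buffer {
  return createHash('sha256').update(publicKey).digest();
}
```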
Nothing comm-based works yet, but other MIME renderers work. It can't do multi-client editing, but I think that's just one gnarly webpack away with hypermerge (by the automerge folks, who blessedly work in TS).
Ok, ok, so it does need a static file server, and usually a websocket server: it needs a peer discovery mechanism (I ship one with jupyter-server-proxy), but once connected, everything happens over WebRTC.
The Dat protocol is also good at really big files, though likely not in the browser. Sadly, however, the non-node/web clients are somewhat neglected, so you'd be stuck shelling out and working with the file system in most kernels. However, the node-based tooling can be webpacked (a la jlpm) down to under 2mb.
P2P stuff aside (which would be inappropriate in a number of situations), if a novel server must be implemented, the reference server requirement being on node/v8 is fine, so long as:
- it can also be distributed as a single script
- the specification is machine-readable enough that it can be implemented and tested for conformance in another language (see the sketch below)
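For example (purely illustrative message shapes, not an actual spec), "machine-readable" could start with wire-message types like these, from which a language-neutral schema and conformance tests could be generated:

```typescript
// Illustrative wire messages only; a real protocol spec would pair these
// with a language-neutral schema (e.g. JSON Schema) so implementations in
// other languages can be tested for conformance against the same fixtures.
interface ITransactionMessage {
  type: 'transaction';
  transactionId: string;
  storeId: number;
  patch: unknown;
}

interface IHistoryRequest {
  type: 'history-request';
  since?: string; // last transaction id the client has already seen
}
```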
The high road, though, would be something that compiled to wasm, but that's a whole other kettle of fish.
Even our current yarn/webpack (if we were more authoritarian about bundle discipline) doesn't have to be that bad; it's the end-user npm registry connectivity that remains my biggest issue. I think if we could get to yarn PnP it could be reasonable, as that model would be pip/conda-resolvable: instead of a giant node_modules tree of indeterminate depth, we'd just be filling a flat directory of tarballs. Pika is also interesting, but probably not ready for prime time. But I haven't explored these options.
Looking forward to the developments!
@bollwyvl I can't really tell if you are recommending something to be used for RTC, or just having a tangential discussion.
We had another chat about this when @afshin and @jasongrout came back.
Jason said that we could start out by having it just on the client side, with the server-side state management solution there as well, so we don't need node on the server. It won't actually give us RTC, but it can serve as a base that we could then switch over to a server-based version after we implement it all client side.
@bollwyvl The dat stuff is cool. I have seen this project, which implements CRDTs on top of it: https://github.com/automerge/hypermerge
Another idea is to make the RTC backend pluggable, so you could use different transport protocols or algorithms if you want.
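A pluggable backend could be as simple as agreeing on a narrow transport contract; a hypothetical sketch (none of these names exist anywhere yet):

```typescript
// Hypothetical transport contract: a WebSocket relay, a dat/WebRTC swarm,
// or an in-memory loopback for tests would each implement this shape.
interface IPatchTransport {
  /** Broadcast a locally produced patch to all peers. */
  broadcast(patch: unknown): Promise<void>;
  /** Register a handler for patches arriving from remote peers. */
  onPatch(handler: (patch: unknown) => void): void;
  /** Shut down the underlying connection. */
  dispose(): void;
}
```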
@bollwyvl @saulshanabrook FWIW, I was chatting with one of the dat folks a while back about RTC. They thought it sounded quite interesting. Would it be helpful to make a connection? It's been a few months, but maybe they'd still be interested.
Could someone explain what dat is, and what problems it will solve? I'm not keeping up on the trends (:
@choldgraf I think so, but let's wait a little till we have the RTC repo set up and a better idea of how to integrate a dat backend with all our existing work.
Not an expert, but dat basically lets you sync data between hosts over different protocols. It's a bit like torrents? or IPFS? https://dat.foundation/
Notes from chat on Jupyter call with @ellisonbg @vidartf @jasongrout and others:
I have created a new repo on this org for our RTC work: https://github.com/jupyterlab/rtc. Please create issues on that repo for all new RTC discussions. At some point, we might want to move parts of it into other repos once the work has stabilized.
Over the past few years, many folks have been working on bringing real time collaboration to JupyterLab. It would enable a number of new features.
The current work is in the datastore package in Lumino and in a PR to JupyterLab (https://github.com/jupyterlab/jupyterlab/pull/6871).
Moving forward, we could move this work out into two new separate repositories:
- lumino-datastore: General purpose frontend library to support real time collaboration. It either depends on the Lumino datastore or we move that into this repo. It does not depend on anything Jupyter related. We would move Vidar's existing server side patch proxy work to this repo from the JupyterLab PR. We would also move Ian's work on datastore helpers to this repo. The idea is that any web app could use this package to add a real time synchronized datastore to its app. Prior art: https://github.com/redux-orm/redux-orm, https://github.com/grrowl/redux-scuttlebutt, https://github.com/automerge/automerge, https://github.com/DevResults/cevitxe. We could put this in Lumino itself, but having it as a separate repo can make it better from an external marketing perspective, as well as allowing us to more freely include third party dependencies. Marketing wise, the idea is that, like those other projects, users can come to the GitHub repo and understand the sole use of the project.
- jupyter-datastore: The Jupyter Datastore package gives you an up to date data model of the Jupyter Server data structures in your browser. We would move the other work Ian has been doing from the JupyterLab PR into this repo. It could also add a server side component that talks directly to the server. This would help us achieve the last goal of saving outputs to a notebook even if a client is not open. It is meant to be a building block for any Jupyter web UIs.

Here is a drawing I put together to try to show how these different pieces could work together:
Zach and I also started sketching out the jupyter-datastore APIs, including the REST API and the tables.
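To give a flavor of what those tables might look like (the field names here are illustrative guesses, not the actual sketched API), a cell table expressed as a @lumino/datastore schema could be:

```typescript
import { Fields } from '@lumino/datastore';

// Illustrative only: a possible 'cells' table for the Jupyter datastore.
const cellSchema = {
  id: 'cells',
  fields: {
    source: Fields.Text(),          // collaboratively edited source text
    cellType: Fields.String(),      // 'code' | 'markdown' | 'raw'
    executionCount: Fields.Number() // last execution count, if any
  }
};
```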
Adding these new repos has some advantages.
However, it will come at the expense of more maintenance burden: we'd have to set up our own build and testing infrastructure for each repo. It might also be confusing if folks are not sure what the scope of each repo is. It's also harder to make cross-repo changes, because they require coordinating pull requests.
I propose creating these two new repos in the JupyterLab organization and creating issues and milestones to track what needs to be done on each. Before that can be done, we have to come up with names for each. The current candidates are below, but we could change them:
- lumino-datastore
- jupyter-datastore
cc @vidartf @ellisonbg @afshin @Zsailer
Does anyone have objections or name ideas?