Open afshin opened 2 years ago
@afshin Should we just add this to the server, or do we need a new server extension package? Is anyone assigned to this task?
@3coins, this should be in the core server.
Currently, no one is specifically assigned. I'd like to see the user interface portion of this landing in JupyterLab and I am happy to work on any part of the stack that helps get us there.
I think that the work on this already done in the telemetry space might be farther along than the server extension from the notifications extension, so grafting those handlers into jupyter-server might be the best way of bringing this into core.
What are you thinking? Let's have a conversation about this with all the people who have interest and bandwidth to work on it.
@afshin
"What are you thinking? Let's have a conversation about this with all the people who have interest and bandwidth to work on it."
Agree, let me know if you want to have an offline discussion including anyone else who wants to work on this; personally, I would like to get some experience on the server side, but happy to work on any part of the stack. Is there an expected time frame to get these changes done?
I think Zach is rounding up interested folks (including you) for a conversation.
We are targeting jupyter-server v2 and jupyterlab v4 (so late June, early July).
@afshin / @Zsailer: please include me in any meeting that might happen related to this. I am interested in learning more about this area and contributing any way I can.
As discussed in the server meeting on 5/5/2022, here is an initial list of tasks for the event notification system. This list is by no means final, feel free to add comments or feedback.
- Event Bus - #820
  - /api/events/subscribe - WebSocket for subscribing to events
- REST API Endpoints
  - /api/events - REST API to create new events
  - /api/events/schemas - REST API to query/list registered schemas (Optional)
- Event buffer
- JupyterLab 4 Event Client (jupyterlab-events)
- Add Default events in server
- JupyterLab 4 Updates
  - jupyterlab-events as dependency inside JupyterLab
  - Event Notification UI (JupyterLab)
- Jupyter Notebook 7 Updates
  - jupyterlab-events as dependency
Here is a document we can collaboratively edit so that the front-matter of this issue can have a canonical version that we edit once it is ready: https://hackmd.io/q4Rkq2BaS1SIXvyzt8j1yA
Since the event system is a new service that we are just starting to develop, how about making it as much as possible backend-agnostic? By that I mean that most of the logic should be usable in both jupyter-server and jupyverse. But it is currently very tied to jupyter-server, Tornado and traitlets, which we don't want to depend on in jupyverse.
Thanks for bringing this up, @davidbrochart! I think we're going to see this question/conversation come up multiple times moving forward as we continue pushing Jupyter Server forward, while trying to bring jupyverse to the front.
Let me start by saying—technically, the event system is backend agnostic. We just defined a REST + websocket API for posting/subscribing to events. These are schema/protocol driven. Jupyverse can/should create an implementation of this API. I don't think there is anything tied specifically to Tornado here. Any server implementation will always have to write some server-library-specific code to make it work. Consider, if we started this in jupyverse, how would we port it to jupyter_server? We would have to re-implement the handlers in Tornado and drop the FastAPI specific logic.
That said, under the hood, we depend on jupyter_telemetry (hopefully, switching to jupyter_events soon) and you are correct: jupyter_telemetry/events depends on traitlets. That's because we needed the Event System API to be configurable. I don't see a way around using traitlets for this without switching to some other backwards compatible, backend-agnostic, config-based library. For example, it looks to me that jupyverse/FPS is implementing its own (non-backend agnostic) configuration system, fps.config. While I believe FPS offers a much cleaner way to handle config, it's not backwards compatible with Jupyter Server. This might be a place we can improve.
Unfortunately, at this time, I don't see a single solution that would work for both. And while I see jupyverse as our future (it's awesome!), I don't think we should block jupyter_server from making advancements using the older dependencies at this point in time.
Do you have ideas how to reconcile this?
You're right Zach, jupyverse also has implemented specific logic for configuration, and I guess depending on FastAPI makes it kind of specific to this framework too. I'm thinking about some low-level logic (functions, classes...) that would be called from either a Tornado handler or a FastAPI router, with all configuration already resolved at this point, and passed as generic arguments.
"backwards compatible, backend-agnostic, config-based library"
To me, this is the "holy grail".
We could probably get pretty close by BaseModel.
I meant something simpler, like this GET handler calls this get method. If we can have the logic in the get method in a separate package, that's a great step towards backend agnosticism.
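A minimal sketch of that split, under stated assumptions: the names `EventsConfig`, `create_event`, and `handle_post` are hypothetical and do not come from jupyter-server or jupyverse. The logic is a plain function that receives already-resolved configuration as generic arguments, so a Tornado handler or a FastAPI route only needs a thin wrapper around it.

```python
import json
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class EventsConfig:
    """Already-resolved settings, passed as plain values (hypothetical)."""

    allowed_schemas: FrozenSet[str] = frozenset()


def create_event(payload: dict, config: EventsConfig) -> dict:
    """Backend-agnostic logic: check the schema ID and normalize the event."""
    schema_id = payload.get("schema_id")
    if schema_id not in config.allowed_schemas:
        raise ValueError(f"unknown schema: {schema_id!r}")
    return {"schema_id": schema_id, "data": payload.get("data", {})}


def handle_post(body: bytes, config: EventsConfig) -> Tuple[int, str]:
    """Framework-neutral adapter: raw request body in, (status, JSON) out.

    A Tornado handler's post() or a FastAPI route would call this and
    copy the status and body into its own response object.
    """
    try:
        event = create_event(json.loads(body), config)
    except (ValueError, AttributeError) as err:
        # Invalid JSON, a non-object body, or an unknown schema.
        return 400, json.dumps({"error": str(err)})
    return 201, json.dumps(event)
```

Neither function imports Tornado, FastAPI, or traitlets, which is the property being discussed here; how the configuration gets resolved stays the job of each server.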
Is an event intended to notify the user visually? If so, will we distinguish between read and unread notifications, high-priority and low-priority notifications, etc.? I'm also curious about whether notifications might be transmitted via other means, such as e-mail or SMS.
@jweill-aws the "case study" above is about notifications and the idea is that it becomes an extension's job to manage its state. In the case of notifications, the extension will write events it cares about from the event bus into a SQL database and it will be the job of the client to call DELETE to remove those items from the database (i.e., make them "read").
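As a rough illustration of that state management, here is a sketch using Python's built-in sqlite3 module. The table layout and the function names are assumptions made for this example, not the notifications extension's actual schema. Each recipient gets its own row, and deleting a row is what marks a notification as "read".

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the notification store and create its table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notifications ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " recipient TEXT NOT NULL,"
        " message TEXT NOT NULL)"
    )
    return conn


def store_notification(conn, recipients, message):
    """De-normalize one event into one row per recipient."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO notifications (recipient, message) VALUES (?, ?)",
            [(recipient, message) for recipient in recipients],
        )


def list_notifications(conn, recipient):
    """Return (id, message) pairs that this recipient has not deleted yet."""
    cursor = conn.execute(
        "SELECT id, message FROM notifications WHERE recipient = ? ORDER BY id",
        (recipient,),
    )
    return cursor.fetchall()


def delete_notification(conn, recipient, notification_id):
    """What a DELETE request would do; returns True if a row was removed."""
    with conn:
        cursor = conn.execute(
            "DELETE FROM notifications WHERE id = ? AND recipient = ?",
            (notification_id, recipient),
        )
    return cursor.rowcount == 1
```

Scoping the DELETE to the recipient means one user clearing a notification leaves the other recipients' copies untouched.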
In https://github.com/jupyter/jupyter_events/pull/2, we have been discussing the handling of sensitive data in the event system. I'm confident that these are already "solved problems" in other systems, so I need some help gathering information about how to properly do it here.
In https://github.com/jupyter/jupyter_events/pull/2, I added a required field to every schema, "redactionPolicies", that describes the sensitivity of every event property. The event logger can be configured to redact properties tagged with sensitive policies from all events. This data is redacted before the event is ever emitted, which provides a simple way to ensure that sensitive data is never persisted.
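To make the idea concrete, here is a small sketch of redaction before emission. It is not the code from the pull request: the per-property key name, the policy labels, the `redact_event` function, and the choice to drop a redacted property (rather than mask it) are all assumptions for illustration.

```python
def redact_event(event: dict, schema: dict, redacted_policies: set) -> dict:
    """Return a copy of `event` without the properties that must be redacted.

    `schema["properties"]` maps each property name to a dict holding a list
    of policy labels under the (assumed) key "redactionPolicies".
    """
    properties = schema.get("properties", {})
    kept = {}
    for name, value in event.items():
        if name not in properties:
            # Properties the schema does not describe are treated as sensitive.
            continue
        policies = set(properties[name].get("redactionPolicies", []))
        if policies & redacted_policies:
            # Tagged with at least one policy the logger is configured to redact.
            continue
        kept[name] = value
    return kept


# Hypothetical schema and policy labels.
schema = {
    "properties": {
        "path": {"redactionPolicies": ["user-identifier"]},
        "action": {"redactionPolicies": ["unrestricted"]},
    }
}
event = {"path": "/home/alice/analysis.ipynb", "action": "save"}
safe_event = redact_event(event, schema, {"user-identifier"})
```

With this configuration `safe_event` keeps only the `action` property, and the original `event` dict is left unmodified.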
On the other hand, if a client (e.g. JupyterLab) builds features that depend on the event system, and these features depend on receiving all of the data, redacted events/data break these features. This makes the event system unusable for these features when launching in data-conscious (i.e., most) environments.
To make the event system useful, we need a secure way to handle sensitive data in transit, specifically when moving between Jupyter Server and its clients. Today, the event bus added in #820 shuttles raw events to the client across the websocket. Any authenticated websocket client can connect to this websocket and "see" all event data; this obviously isn't a secure approach.
This is where I need some help. What are some known patterns for handling sensitive data in transit from server to client? If we encrypt the data in the server, how do we securely decrypt it in something like JupyterLab?
The basic "plumbing" for Jupyter server's event system landed here: https://github.com/jupyter-server/jupyter_server/pull/862
We've started logging some events from the contents here: https://github.com/jupyter-server/jupyter_server/pull/954
This is a draft document, please feel free to comment and help.
We have (at least) two concurrent efforts that overlap but are not full implementations of a generic event system for Jupyter in themselves: jupyter-telemetry and jupyterlab-notifications. A synthesis of these extensions with generic endpoints (i.e., not specifically designed and named for telemetry or notifications) would yield a flexible general-purpose event bus for jupyter-server-based applications.
cc: @andrii-i @3coins
Architecture of events API
REST Endpoints
- POST /api/events - create new events
- GET /api/events/schemas - query/list registered schemas (maybe -- needs discussion)
- POST /api/events/schemas - register schemas (maybe -- needs discussion)
WebSocket endpoints (WebsocketHandler)
- /api/events/subscribe - fire hose of all events -- perhaps accept filters? (see open question below)
- /api/events/subscribe/notification -- subscribe to events of type notification
Open Question: Should the WebSocket handler support making a request for multiple filters to be applied instead of just the one proposed in the URL scheme above?
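One possible answer to the open question, sketched as an in-process event bus: a subscriber passes a set of event types, and no filter means the fire hose. The repeated `type` query parameter, the `type` key on events, and the names `EventBus` and `parse_filters` are assumptions for illustration, not part of the proposed API.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple
from urllib.parse import parse_qs

Listener = Callable[[dict], None]


def parse_filters(query: str) -> Optional[FrozenSet[str]]:
    """Turn 'type=a&type=b' into {'a', 'b'}; no 'type' key means no filter."""
    types = parse_qs(query).get("type")
    return frozenset(types) if types else None


class EventBus:
    """Delivers each emitted event to every subscriber whose filter matches."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Optional[FrozenSet[str]], Listener]] = []

    def subscribe(
        self, listener: Listener, event_types: Optional[Iterable[str]] = None
    ) -> None:
        """Register a listener; event_types=None subscribes to the fire hose."""
        types = None if event_types is None else frozenset(event_types)
        self._subscriptions.append((types, listener))

    def emit(self, event: dict) -> None:
        """Send the event to unfiltered listeners and to matching filters."""
        for types, listener in self._subscriptions:
            if types is None or event.get("type") in types:
                listener(event)
```

A WebSocket handler could call `parse_filters` on the request's query string when a connection opens and pass the result to `subscribe`, which would cover both the fire hose and the single-type URL with one code path.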
Depends on jupyter_events package
- EventLogger object (formerly EventLog in jupyter_telemetry)
Case Study: JupyterLab Notifications
Server-side functionality
- Listen for notification events that pass through the event bus
- Write each notification as a row in a SQLite database on the server with a key for the recipient identity as well as an ID
  - notification events with multiple recipients can be de-normalized here and written as multiple rows
- REST API
  - GET /api/notifications - retrieve a list of all notifications that the authenticated user can see
  - GET /api/notifications/{ID} - retrieve a specific notification
  - DELETE /api/notifications/{ID} - delete a specific notification
Client-side functionality
- Connect to the /api/events/notifications WebSocket
- Throttle its incoming messages at some reasonable rate (on the order of 0.5-1 seconds)
- Use the events API as a notifier only -- check the /api/notifications endpoint for the actual list of messages
- JupyterLab 4 extension
  - Export a Token (e.g., INotifications or IEvents) that exposes an IDataConnector for event CRUD and an ISignal for event subscription
- Jupyter Notebook 7 extension
  - Consume the Token from the JupyterLab extension
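The client-side throttling in the case study can be sketched as follows. It is written in Python only to keep the example short and testable; a real JupyterLab client would be TypeScript. `ThrottledRefresh` is a hypothetical name, and the `refresh` callback stands in for re-fetching /api/notifications, since the WebSocket message is used as a notifier only.

```python
import time


class ThrottledRefresh:
    """Run `refresh` at most once per `interval` seconds."""

    def __init__(self, refresh, interval=0.5, clock=time.monotonic):
        self._refresh = refresh
        self._interval = interval
        self._clock = clock  # injectable so the behavior can be tested
        self._last_run = None
        self._pending = False

    def on_message(self, _message=None):
        """Call for every WebSocket message; the payload is ignored on purpose."""
        now = self._clock()
        if self._last_run is None or now - self._last_run >= self._interval:
            self._run(now)
        else:
            # Too soon: remember that a refresh is owed.
            self._pending = True

    def flush(self):
        """Call from a periodic timer so a suppressed message is not lost."""
        now = self._clock()
        if self._pending and now - self._last_run >= self._interval:
            self._run(now)

    def _run(self, now):
        self._last_run = now
        self._pending = False
        self._refresh()
```

A burst of messages therefore results in one immediate refresh and at most one trailing refresh once the interval has passed, rather than one request per message.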