jupyter-server / jupyter_server

The backend—i.e. core services, APIs, and REST endpoints—to Jupyter web applications.
https://jupyter-server.readthedocs.io
BSD 3-Clause "New" or "Revised" License

An event system for Jupyter #780

Open afshin opened 2 years ago

afshin commented 2 years ago

This is a draft document, please feel free to comment and help.

We have (at least) two concurrent efforts that overlap but are not full implementations of a generic event system for Jupyter in themselves: jupyter-telemetry and jupyterlab-notifications.

A synthesis of these extensions with generic endpoints (i.e., not specifically designed and named for telemetry or notifications) would yield a flexible general-purpose event bus for jupyter-server-based applications.

cc: @andrii-i @3coins

Architecture of events API

REST Endpoints

WebSocket endpoints (WebsocketHandler)

Open Question: Should the WebSocket handler support making a request for multiple filters to be applied instead of just the one proposed in the URL scheme above?

Depends on jupyter_events package


Case Study: JupyterLab Notifications

Server-side functionality

Client-side functionality

Jupyter Notebook 7 extension

3coins commented 2 years ago

@afshin Should we just add this to the server, or do we need a new server extension package? Is anyone assigned to this task?

afshin commented 2 years ago

@3coins, this should be in the core server.

Currently, no one is specifically assigned. I'd like to see the user interface portion of this landing in JupyterLab and I am happy to work on any part of the stack that helps get us there.

I think that the work on this already done in the telemetry space might be farther along than the server extension from the notifications extension, so grafting those handlers into jupyter-server might be the best way of bringing this into core.

What are you thinking? Let's have a conversation about this with all the people who have interest and bandwidth to work on it.

3coins commented 2 years ago

@afshin

"What are you thinking? Let's have a conversation about this with all the people who have interest and bandwidth to work on it."

Agree, let me know if you want to have an offline discussion including anyone else who wants to work on this; personally, I would like to get some experience on the server side, but happy to work on any part of the stack. Is there an expected time frame to get these changes done?

afshin commented 2 years ago

I think Zach is rounding up interested folks (including you) for a conversation.

We are targeting jupyter-server v2 and jupyterlab v4 (so late June, early July).

rahul26goyal commented 2 years ago

@afshin / @Zsailer : please include me in any meeting that might happen related to this. I am interested in learning more about this area and contributing any way I can.

3coins commented 2 years ago

As discussed in the server meeting on 5/5/2022, here is an initial list of tasks for the event notification system. This list is by no means final; feel free to add comments or feedback.

  1. Event Bus - #820

    • A central event bus to relay events
    • /api/events/subscribe - Websocket for subscribing to events
    • A default handler for consuming events
  2. REST API Endpoints

    • POST /api/events - REST API to create new events
    • GET /api/events/schemas - REST API to query/list registered schemas (Optional)
  3. Event buffer

    • A queue/buffer to store undelivered event messages
  4. JupyterLab 4 Event Client (jupyterlab-events)

    • Reuse jupyterlab-telemetry repo, either rename or copy to jupyterlab-events
    • Remove server endpoints, any redundant server code
    • Update client handlers to use the REST API endpoints
    • Add websocket handler to enable subscription to events
  5. Add Default events in server

    • Add default events e.g., content handler, kernel events in jupyter server
  6. JupyterLab 4 Updates

    • Add jupyterlab-events as dependency inside JupyterLab
    • Subscribe to default events
  7. Event Notification UI (JupyterLab)

    • UI updates for event notification
  8. Jupyter Notebook 7 Updates

    • Add jupyterlab-events as dependency
    • Subscribe to default events
    • Can we reuse event notification UI from JupyterLab?
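
Tasks 1 and 3 above (a central bus that relays events, plus a buffer for undelivered messages) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `EventBus` class, its method names, and the event dictionary layout are hypothetical and are not the jupyter_server or jupyter_events API.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List

Event = Dict[str, Any]
Listener = Callable[[Event], None]


class EventBus:
    """Hypothetical in-memory bus; not the jupyter_server implementation."""

    def __init__(self, buffer_size: int = 100) -> None:
        self._listeners: List[Listener] = []
        # Events emitted while nobody is subscribed wait here (task 3).
        # When the buffer is full, the oldest event is dropped.
        self._buffer: Deque[Event] = deque(maxlen=buffer_size)

    def subscribe(self, listener: Listener) -> None:
        """Register a listener and flush any buffered events to it."""
        self._listeners.append(listener)
        while self._buffer:
            listener(self._buffer.popleft())

    def emit(self, schema_id: str, data: Dict[str, Any]) -> None:
        """Relay an event to all listeners, or buffer it if there are none."""
        event = {"schema_id": schema_id, "data": data}
        if not self._listeners:
            self._buffer.append(event)
            return
        for listener in self._listeners:
            listener(event)
```

In a real server, the listener would presumably be a websocket connection opened on /api/events/subscribe, and `emit` would be called from the POST /api/events handler; both wirings are left out here because they depend on the web framework.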

afshin commented 2 years ago

Here is a document we can collaboratively edit so that the front-matter of this issue can have a canonical version that we edit once it is ready: https://hackmd.io/q4Rkq2BaS1SIXvyzt8j1yA

davidbrochart commented 2 years ago

Since the event system is a new service that we are just starting to develop, how about making it as much as possible backend-agnostic? By that I mean that most of the logic should be usable in both jupyter-server and jupyverse. But it is currently very tied to jupyter-server, Tornado and traitlets, which we don't want to depend on in jupyverse.

Zsailer commented 2 years ago

Thanks for bringing this up, @davidbrochart! I think we're going to see this question/conversation come up multiple times moving forward as we continue pushing Jupyter Server forward, while trying to bring jupyverse to the front.

Let me start by saying—technically, the event system is backend agnostic. We just defined a REST + websocket API for posting/subscribing to events. These are schema/protocol driven. Jupyverse can/should create an implementation of this API. I don't think there is anything tied specifically to Tornado here. Any server implementation will always have to write some server-library-specific code to make it work. Consider, if we started this in jupyverse, how would we port it to jupyter_server? We would have to re-implement the handlers in Tornado and drop the FastAPI specific logic.

That said, under the hood, we depend on jupyter_telemetry (hopefully, switching to jupyter_events soon) and you are correct—jupyter_telemetry/events depends on traitlets.

That's because we needed the Event System API to be configurable. I don't see a way around using traitlets for this without switching to some other backwards compatible, backend-agnostic, config-based library. For example, it looks to me that jupyverse/FPS is implementing its own (non-backend agnostic) configuration system, fps.config. While I believe FPS offers a much cleaner way to handle config, it's not backwards compatible with Jupyter Server. This might be a place we can improve.

Unfortunately, at this time, I don't see a single solution that would work for both. And while I see jupyverse as our future (it's awesome!), I don't think we should block jupyter_server from making advancements using the older dependencies at this point in time.

Do you have ideas how to reconcile this?

davidbrochart commented 2 years ago

You're right Zach, jupyverse also has implemented specific logic for configuration, and I guess depending on FastAPI makes it kind of specific to this framework too. I'm thinking about some low-level logic (functions, classes...) that would be called from either a Tornado handler or a FastAPI router, with all configuration already resolved at this point, and passed as generic arguments.

Zsailer commented 2 years ago

"backwards compatible, backend-agnostic, config-based library"

To me, this is the "holy grail".

We could probably get pretty close by

  1. writing logic that translates traitlets config into a pydantic BaseModel.
  2. handling traits/fields that "observe" other traits/fields.

davidbrochart commented 2 years ago

I meant something simpler, like: this GET handler calls this get method. If we can have the logic of the get method in a separate package, that's a great step towards backend agnosticism.
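
A minimal sketch of that idea, assuming the shared logic lives in a plain class that receives its configuration as ordinary arguments. `EventService`, `allowed_schemas`, and `handle_get` are hypothetical names chosen for illustration; nothing here imports Tornado, FastAPI, or traitlets.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EventService:
    """Framework-free logic; configuration arrives already resolved."""

    allowed_schemas: List[str]
    events: List[Dict[str, Any]] = field(default_factory=list)

    def post(self, schema_id: str, data: Dict[str, Any]) -> Dict[str, Any]:
        """Validate and record an event, returning the stored record."""
        if schema_id not in self.allowed_schemas:
            raise ValueError(f"unregistered schema: {schema_id}")
        event = {"schema_id": schema_id, "data": data}
        self.events.append(event)
        return event

    def get(self) -> List[Dict[str, Any]]:
        """Return a copy of all recorded events."""
        return list(self.events)


def handle_get(service: EventService) -> Dict[str, Any]:
    """The thin adapter a Tornado handler or a FastAPI route would call."""
    return {"events": service.get()}
```

Each backend would then only translate its own request and response objects to and from these plain dictionaries, and would build the `EventService` from whatever configuration system it uses.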

JasonWeill commented 2 years ago

Is an event intended to notify the user visually? If so, will we distinguish between read and unread notifications, high-priority and low-priority notifications, etc.? I'm also curious about whether notifications might be transmitted via other means, such as e-mail or SMS.

afshin commented 2 years ago

@jweill-aws the "case study" above is about notifications and the idea is that it becomes an extension's job to manage its state. In the case of notifications, the extension will write events it cares about from the event bus into a SQL database and it will be the job of the client to call DELETE to remove those items from the database (i.e., make them "read").
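
The flow described above (the extension persists the events it cares about, and the client deletes them to mark them as read) might look roughly like this. It is a sketch using the standard-library `sqlite3` module; the `NotificationStore` class, the table layout, and the method names are assumptions made for illustration, not the notifications extension's actual code.

```python
import json
import sqlite3
from typing import Any, Dict, List


class NotificationStore:
    """Hypothetical store kept by a notifications extension."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS notifications ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )

    def on_event(self, event: Dict[str, Any]) -> int:
        """Called for each event from the bus that the extension cares about."""
        cur = self._db.execute(
            "INSERT INTO notifications (payload) VALUES (?)", (json.dumps(event),)
        )
        self._db.commit()
        return cur.lastrowid

    def unread(self) -> List[Dict[str, Any]]:
        """Everything still in the table counts as unread."""
        rows = self._db.execute("SELECT id, payload FROM notifications ORDER BY id")
        return [{"id": rid, "event": json.loads(payload)} for rid, payload in rows]

    def delete(self, notification_id: int) -> bool:
        """What a client's DELETE request would trigger: mark as read."""
        cur = self._db.execute(
            "DELETE FROM notifications WHERE id = ?", (notification_id,)
        )
        self._db.commit()
        return cur.rowcount == 1
```

Keeping this state inside the extension means the event bus itself never has to know which events a given user has already seen.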

Zsailer commented 2 years ago

In https://github.com/jupyter/jupyter_events/pull/2, we have been discussing the handling of sensitive data in the event system. I'm confident that these are already "solved problems" in other systems, so I need some help gathering information about how to properly do it here.

In https://github.com/jupyter/jupyter_events/pull/2, I added a required field to every schema, "redactionPolicies", that describes the sensitivity of every event property. The event logger can be configured to redact data covered by specific policies from all events. This data is redacted before the event is ever emitted, which provides a simple way to ensure that sensitive data is never persisted.
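
To make the redaction step concrete, here is a small sketch of filtering an event before it is emitted. It assumes the schema's redaction field can be read as a mapping from property name to a list of policy labels; that layout, the `redact` function, the placeholder string, and the label names used in the usage note are all hypothetical and may differ from what the pull request implements.

```python
from typing import Any, Dict, Iterable, List

REDACTED = "<redacted>"  # hypothetical placeholder value


def redact(
    data: Dict[str, Any],
    policies: Dict[str, List[str]],
    redacted_policies: Iterable[str],
) -> Dict[str, Any]:
    """Return a copy of `data` with sensitive properties replaced.

    `policies` maps each property to its policy labels, and
    `redacted_policies` is the set of labels the logger is configured to hide.
    """
    blocked = set(redacted_policies)
    clean: Dict[str, Any] = {}
    for key, value in data.items():
        labels = policies.get(key)
        # Assumption: a property with no declared policy is treated as sensitive.
        if labels is None or blocked.intersection(labels):
            clean[key] = REDACTED
        else:
            clean[key] = value
    return clean
```

For example, with `policies = {"path": ["user-identifiable-information"], "action": ["unrestricted"]}` and the logger configured to hide `"user-identifiable-information"`, the `path` property is replaced while `action` passes through unchanged.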

On the other hand, if a client (e.g. JupyterLab) builds features that depend on the event system, and these features depend on receiving all of the data, redacted events/data break these features. This makes the event system unusable to these features when launching in data-conscious (i.e. most) environments.

To make the event system useful, we need a secure way to handle sensitive data in transit, specifically when moving between Jupyter Server and its clients. Today, the event bus added in #820 shuttles raw events to the client across the websocket. Any authenticated websocket client can connect to this websocket and "see" all event data; this obviously isn't a secure approach.

This is where I need some help. What are some known patterns for handling sensitive data in transit from server to client? If we encrypt the data in the server, how do we securely decrypt it in something like JupyterLab?

Zsailer commented 2 years ago

The basic "plumbing" for Jupyter server's event system landed here: https://github.com/jupyter-server/jupyter_server/pull/862

We've started logging some events from the contents here: https://github.com/jupyter-server/jupyter_server/pull/954