edublancas commented 1 year ago

we recently introduced a plotting feature that allows creating histograms and boxplots for large datasets by pushing the data aggregation step to the SQL engine and only passing the aggregated data to the frontend.

however, the plots are static: the commands return an image with no interactivity. This issue discusses how we can add interactivity by connecting the frontend (the Jupyter UI) with the kernel (the process running the Python code): we want something like this. in the video, the user's actions trigger computations in the Python process, which then returns values that update the plot in the frontend.

I'm sharing resources I've found; however, this is still a work in progress, so we have to find the best architecture.

communication

Jupyter offers a comms feature that lets the frontend and the kernel exchange messages. for an implementation example, see the ipywidgets library.
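to make the comms idea concrete, here is a minimal sketch of the kernel side. it assumes we register a comm target under a made-up name (`jupysql_demo`) and reply to a made-up `ping` action; `handle_message` and `register_target` are hypothetical helpers, not part of jupysql. the frontend would need to open a comm with the same target name, which is not shown here.

```python
def handle_message(data):
    """Turn a request payload from the frontend into a reply payload.

    The "action" key and the "ping"/"pong" protocol are assumptions
    made up for this sketch.
    """
    action = data.get("action")
    if action == "ping":
        return {"status": "ok", "reply": "pong"}
    return {"status": "error", "reply": f"unknown action: {action!r}"}


def register_target(target_name="jupysql_demo"):
    """Register a comm target on the running kernel, if there is one.

    Returns True when the target was registered, False when this code
    is not running inside a Jupyter kernel.
    """
    try:
        from IPython import get_ipython
    except ImportError:
        return False

    shell = get_ipython()
    kernel = getattr(shell, "kernel", None)
    if kernel is None:
        return False

    def on_open(comm, open_msg):
        # Called when the frontend opens a comm with this target name.
        @comm.on_msg
        def _on_msg(msg):
            # The payload sent by the frontend lives in msg["content"]["data"].
            comm.send(handle_message(msg["content"]["data"]))

    kernel.comm_manager.register_target(target_name, on_open)
    return True
```

keeping the message handling in a plain function means we can unit test it without a running kernel; only `register_target` touches Jupyter.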

proof-of-concept

as a proof of concept, we should implement a simple end-to-end workflow. I'm thinking we can display a button that, whenever pressed, triggers the execution of a Python function whose return value is then displayed in the frontend.
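one quick way to try this workflow before writing our own comms code is to lean on ipywidgets, which already uses comms under the hood. the sketch below is only an approximation of the proof-of-concept: `compute_value` and `build_demo` are hypothetical names, and the computation is a placeholder.

```python
def compute_value(n):
    """Placeholder for the Python function that runs in the kernel."""
    return sum(i * i for i in range(1, n + 1))


def build_demo(n=10):
    """Build a button plus a label that shows the function's result.

    Requires ipywidgets and must be displayed in a Jupyter frontend.
    """
    import ipywidgets as widgets

    button = widgets.Button(description="Run")
    label = widgets.Label(value="(not run yet)")

    def on_click(_button):
        # Runs in the kernel; assigning to label.value syncs to the frontend.
        label.value = str(compute_value(n))

    button.on_click(on_click)
    return widgets.VBox([button, label])
```

in a notebook cell, evaluating `build_demo()` as the last expression should render the button; each click runs `compute_value` in the kernel and updates the label.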

plotting

once we have the proof-of-concept, we can continue with the first interactive plot; sharing some details.

frontend

We can use Vega/Altair for this. Altair is a simplified Python API built on top of Vega-Lite, which in turn compiles to Vega. I'm unsure which is the right choice. I'm assuming Vega, since it's the JS implementation that we want to display in the frontend.

we can check out vegafusion, a similar project:

https://github.com/hex-inc/vegafusion
https://medium.com/@jonmmease/announcing-vegafusion-570f62207ba7

backend

the current plotting feature builds SQL queries as strings and then executes them in the database; this is a simple solution, but it has limitations because the generated queries might not work in specific databases (see https://github.com/ploomber/jupysql/issues/86). An alternative approach is to rely on sqlalchemy (which we already use for communicating with the db), which has a Python API that allows us to generate valid SQL for the connected db. we can start with hardcoded SQL for a proof-of-concept and then move to sqlalchemy.
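to illustrate the hardcoded-SQL starting point, here is a minimal sketch that computes histogram counts inside the database and returns only the aggregated rows. it uses the standard library's sqlite3 so it runs anywhere; `histogram_counts` and the table/column names are hypothetical, and this is not the query jupysql currently generates. the table and column names are interpolated into the string, so they are assumed to be trusted.

```python
import sqlite3


def histogram_counts(conn, table, column, bins, lo, hi):
    """Return (bin_index, count) rows computed by the SQL engine.

    Values equal to `hi` are placed in the last bin. Only non-empty
    bins are returned.
    """
    width = (hi - lo) / bins
    query = f"""
        SELECT MIN(CAST(({column} - ?) / ? AS INTEGER), ?) AS bin,
               COUNT(*) AS n
        FROM {table}
        WHERE {column} BETWEEN ? AND ?
        GROUP BY bin
        ORDER BY bin
    """
    return conn.execute(query, (lo, width, bins - 1, lo, hi)).fetchall()


# Usage: an in-memory table holding the integers 0..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (x REAL)")
conn.executemany("INSERT INTO data VALUES (?)", [(v,) for v in range(11)])

counts = histogram_counts(conn, "data", "x", bins=5, lo=0, hi=10)
```

only one row per bin crosses the wire, regardless of how many rows the table has. the two-argument `MIN` and the `GROUP BY` on an alias work in SQLite but are exactly the kind of dialect-specific detail that may break elsewhere, which is the motivation for moving to sqlalchemy later.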

edublancas commented 1 year ago

found another interesting project: https://github.com/Kanaries/pygwalker

based on the HN discussion, looks like this is computing the data aggregation step in the frontend, although there's a PR to use DuckDB with WASM to scale it up.

the heavy lifting is happening here: https://github.com/Kanaries/graphic-walker

edublancas commented 1 year ago

another interesting project: https://github.com/manzt/anywidget

ploomber / jupysql

communicating jupyter frontend with kernel #136
