holoviz / panel

Panel: The powerful data exploration & web app framework for Python
https://panel.holoviz.org
BSD 3-Clause "New" or "Revised" License

Enable Perspective to support large tables #7359

Open MarcSkovMadsen opened 1 month ago

MarcSkovMadsen commented 1 month ago

I work with lots of tabular datasets in the 50-500 MB range. For exploratory data analysis, the Perspective Viewer is really powerful and unique. Unfortunately, sending the full datasets to the client is slow and often exceeds a maximum websocket message size imposed by Panel, JupyterHub or Kubernetes. You can increase these limits, but only to some extent, and doing so can be outside the control or capability of a data scientist.
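To make the size mismatch concrete, here is a back-of-the-envelope sketch. It assumes Bokeh's default websocket maximum message size of 20 MiB (20 * 1024 * 1024 bytes) applies to the Panel server; check the value for your Bokeh/Panel version and for any proxies (JupyterHub, a Kubernetes ingress) in front of it. The helpers `fits_in_one_message` and `messages_needed` are hypothetical and only do arithmetic; they ignore serialization overhead.

```python
import math

# Assumption: Bokeh's default websocket max message size (20 MiB).
# JupyterHub proxies or a Kubernetes ingress may impose other, stricter limits.
DEFAULT_WS_LIMIT_BYTES = 20 * 1024 * 1024


def fits_in_one_message(payload_bytes: int, limit: int = DEFAULT_WS_LIMIT_BYTES) -> bool:
    """Return True if a payload could be sent as a single websocket message."""
    return payload_bytes <= limit


def messages_needed(payload_bytes: int, limit: int = DEFAULT_WS_LIMIT_BYTES) -> int:
    """Minimum number of limit-sized messages needed to ship the payload."""
    return math.ceil(payload_bytes / limit)


if __name__ == "__main__":
    for size_mb in (50, 500):
        payload = size_mb * 10**6  # raw dataset size in bytes, before serialization overhead
        print(
            f"{size_mb} MB: fits={fits_in_one_message(payload)}, "
            f"messages={messages_needed(payload)}"
        )
```

Even the small end of the range needs three maximum-size messages. The limit can be raised, for example with `panel serve app.py --websocket-max-message-size 524288000` (Panel's `serve` command accepts Bokeh's server options), but that only goes so far and may not be in the data scientist's control.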

I'm increasingly seeing this problem, and I'm not the only one (Discourse #6804). It's actually a very common problem in finance and trading, where I work. Currently Excel supports larger tables than we do with Perspective. I believe we should enable users to work with larger files in Perspective than they can in Excel.

Actually, Perspective was built to support large tabular data via virtualization; see regular-table and Perspective. But our implementation only uses the perspective-viewer web component, not the advanced client-server virtualization architecture that Perspective supports.
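The core idea of client-server virtualization can be sketched without Perspective at all: the full table stays on the server, and the client only asks for the rows in its current viewport, so the bytes on the wire scale with the visible window rather than with the dataset. The `VirtualTable` class and its `fetch_viewport` method are hypothetical illustrations, not the actual API of Perspective or regular-table.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class VirtualTable:
    """Hypothetical server-side table that serves row windows on demand."""

    columns: list[str]
    rows: list[list[Any]]

    def num_rows(self) -> int:
        # The client needs the total row count to size its scrollbar.
        return len(self.rows)

    def fetch_viewport(self, start: int, end: int) -> str:
        """Serialize only rows [start, end) for the client to render."""
        start = max(0, start)
        end = min(end, len(self.rows))
        window = self.rows[start:end]
        return json.dumps({"columns": self.columns, "start": start, "rows": window})


if __name__ == "__main__":
    table = VirtualTable(
        columns=["A", "B", "C"],
        rows=[[i, i * 2, i * 3] for i in range(100_000)],
    )
    # Shipping every row (non-virtualized) vs. shipping one 50-row viewport.
    full = json.dumps({"columns": table.columns, "rows": table.rows})
    viewport = table.fetch_viewport(0, 50)
    print(f"full payload: {len(full):,} bytes, viewport payload: {len(viewport):,} bytes")
```

Scrolling then becomes a sequence of small `fetch_viewport` requests instead of one large initial transfer.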

A Panel user actually showcased how to use client-server virtualization in Discourse #6430, but it is only a proof of concept and complicated to use.

Please note that the client-server virtualization architecture seems similar to Mosaic; Mosaic is simply built on DuckDB. There is a request to add Mosaic in FR #7358.

Discussion

The Tabulator widget provides a kind of virtualization via its pagination parameter ("local" or "remote"). We could support a similar parameter on Perspective, making it really easy for users. On the other hand, there is power in exposing more of the underlying Perspective API, like the PerspectiveManager, and in hosting tables once but using them across sessions and users. I also think the Panel Perspective pane would be more useful if it implemented the API and capabilities of the Jupyter PerspectiveWidget. See PyCon Italy 2024 and the PerspectiveWidget implementation for inspiration.

Today Panel can run on both Tornado and FastAPI servers. The solution should work in both environments. Personally, I want to migrate to FastAPI deployments if possible.

It should also just work in Pyodide, because that is where much of the showcasing of the functionality will take place.
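Supporting Tornado, FastAPI and Pyodide with the same user code implies some runtime branching inside the component. One real hook for this is that Pyodide reports `sys.platform == "emscripten"`. The `default_data_mode` helper and the mode names it returns are hypothetical, chosen only to illustrate the idea that in the browser the data is already local, while on a server only viewports should cross the websocket.

```python
import sys


def running_in_pyodide() -> bool:
    """Pyodide (an Emscripten build of CPython) reports sys.platform == 'emscripten'."""
    return sys.platform == "emscripten"


def default_data_mode() -> str:
    """Hypothetical policy for a Perspective-like component.

    - In Pyodide, Python already runs in the browser, so the table is local
      and there is no server websocket to overflow.
    - On a Tornado or FastAPI server, keep the table server-side and only
      stream the rows the client is viewing.
    """
    return "in-browser" if running_in_pyodide() else "server"


if __name__ == "__main__":
    print(sys.platform, "->", default_data_mode())
```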

Cannot use the Jupyter PerspectiveWidget

Unfortunately, using the Jupyter widget is not a workaround:

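```python
import pandas as pd
import panel as pn
from perspective.widget import PerspectiveWidget

# Enable Panel's ipywidgets support so a Jupyter widget can be rendered
pn.extension("ipywidgets")

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])

p = PerspectiveWidget(df)

# Wrapping the Jupyter PerspectiveWidget in an IPyWidget pane does not work
pn.pane.IPyWidget(p).servable()
```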

[Screenshot: error output from the example above]

texodus commented 1 month ago

> Actually Perspective was built to support large tabular data via virtualization.

As you note, Perspective already supports this mode, and PerspectiveWidget already defaults to "server" mode and has a kwarg (binding_mode) for switching to server-client replicated mode. I'm not sure what exactly you are requesting.

> Today Panel can be running on both Tornado and FastAPI servers. The solution should work in both environments. Personally I want to migrate to FastAPI deployments if that is possible. Also it should just work in Pyodide because that is where lots of the showcasing of the functionality will take place.

Perspective already supports all of these environments; we even publish Pyodide wheels.

> Unfortunately it's not a workaround to use the Jupyter Widget

Please use the issue template we provide and include a repro. This screenshot does not tell me anything about the nature of the error (except that it is obviously running in some context that wraps exceptions).

EDIT: I misread the repo I was commenting on :). The point remains: I can't make heads or tails of this screenshot without a Perspective repro.

As I said above, PerspectiveWidget already defaults to "server" mode.

MarcSkovMadsen commented 1 month ago

Hi @texodus

Thx.

As I read your reply, you are thinking of the Perspective Jupyter widget. That widget does not work with Panel.

We have our own Panel Perspective pane, which only uses the perspective-viewer web component and nothing else (see the implementation). Thus all data is transferred to the client.

This request is to create a Panel Perspective widget which is as efficient as the Jupyter Perspective widget and can scale to large tables. The user code should be the same and just work across all environments where Panel can run (Tornado, FastAPI, pyodide, PY.CAFE).

MarcSkovMadsen commented 1 month ago

I've started making the architectural problems more specific via code examples in https://github.com/holoviz/panel/pull/7368 @texodus.