posit-dev / py-shiny

Shiny for Python
https://shiny.posit.co/py/
MIT License

Using altair with a large dataset #1748

Closed cddesja-fda closed 3 weeks ago

cddesja-fda commented 4 weeks ago

In order to use altair for a dataset where the number of rows exceeds 5000, one needs to enable the VegaFusion data transformer. For example, creating the following simple Shiny app:

import shiny.express
from shinywidgets import render_altair
import altair as alt
import numpy as np
import pandas as pd

# Generate bivariate normal distribution
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]
data = np.random.multivariate_normal(mean, cov, 10000)
df = pd.DataFrame(data, columns=['x', 'y'])

@render_altair
def scatterplot():
    return (
        alt.Chart(df)
        .mark_circle(size=60, color='#b6377a')
        .encode(x='x', y='y')
    )

raises this error:

The number of rows in your dataset is greater than the maximum allowed (5000).

Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
transformations in Python.
    >> import altair as alt
    >> alt.data_transformers.enable("vegafusion")

Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
on how to plot large datasets.

The error message recommends adding alt.data_transformers.enable("vegafusion"). Modifying the Shiny app accordingly:

import shiny.express
from shinywidgets import render_altair
import altair as alt
import numpy as np
import pandas as pd
alt.data_transformers.enable("vegafusion")

# Generate bivariate normal distribution
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]
data = np.random.multivariate_normal(mean, cov, 10000)
df = pd.DataFrame(data, columns=['x', 'y'])

@render_altair
def scatterplot():
    return (
        alt.Chart(df)
        .mark_circle(size=60, color='#b6377a')
        .encode(x='x', y='y')
    )

When run, this results in the following error:

TypeError(
TypeError: Invalid tag item type: <class 'altair.utils.plugin_registry.PluginEnabler'>. Consider calling str() on this value before treating it as a tag item.

Is this a bug in Shiny or altair, and is there a workaround?

cddesja-fda commented 4 weeks ago

It looks like literally doing what the message says works. That is, changing the following line from

alt.data_transformers.enable("vegafusion")

to

str(alt.data_transformers.enable("vegafusion"))

Now I am getting this result:

Image

Is there a way to silence this DataTransformer message or a different workaround?

cpsievert commented 4 weeks ago

Glad you got vegafusion working! And, yes, change:

alt.data_transformers.enable("vegafusion")

to

_ = alt.data_transformers.enable("vegafusion")

and that error/message will go away.

In general, you'll sometimes need to do this in Express: it tries to display most/all Python objects (like a notebook would), but is also fairly strict about which objects can be displayed.
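To make the three outcomes in this thread concrete, here is a toy model of notebook-style display. It is only a sketch and not how Shiny Express is actually implemented: run_like_express, Enabler, and enable are hypothetical stand-ins, and the "only str/int/float can be displayed" rule is a simplifying assumption. It shows why a bare call fails, why wrapping it in str() prints a message, and why assigning to _ displays nothing.

```python
import ast


class Enabler:
    """Hypothetical stand-in for the object that enable() returns."""

    def __repr__(self) -> str:
        return "Enabler('vegafusion')"


def enable(name: str) -> Enabler:
    """Hypothetical stand-in for alt.data_transformers.enable()."""
    return Enabler()


def run_like_express(source: str) -> list:
    """Toy model: execute top-level statements, displaying bare expressions."""
    namespace = {"enable": enable}
    shown = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Expr):
            # A bare expression: evaluate it and try to display the result.
            code = compile(ast.Expression(body=node.value), "<app>", "eval")
            value = eval(code, namespace)
            if value is None:
                continue
            if not isinstance(value, (str, int, float)):
                # Strict about what can be displayed (simplified assumption).
                raise TypeError(f"Invalid tag item type: {type(value)}")
            shown.append(value)
        else:
            # Assignments, imports, etc. run without displaying anything.
            code = compile(ast.Module(body=[node], type_ignores=[]), "<app>", "exec")
            exec(code, namespace)
    return shown


def outcome(source: str):
    """Return what was displayed, or the name of the error raised."""
    try:
        return run_like_express(source)
    except TypeError as err:
        return type(err).__name__


bare = outcome('enable("vegafusion")')          # object is rejected
as_text = outcome('str(enable("vegafusion"))')  # string is displayed
assigned = outcome('_ = enable("vegafusion")')  # nothing is displayed
print(bare, as_text, assigned, sep="\n")
```

In this model the assignment is a statement rather than a bare expression, so there is no value to display; that is the same reason the _ = ... form silences the message in the Express app.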

cddesja-fda commented 4 weeks ago

Thanks. Looks like, as you noted, this is not an issue for Core.

import altair as alt
import numpy as np
import pandas as pd
from shiny import App, Inputs, Outputs, Session, ui
from shinywidgets import output_widget, render_altair

# In Core, top-level expression values are not displayed, so no "_ =" is needed
alt.data_transformers.enable("vegafusion")

app_ui = ui.page_fluid(
    output_widget("scatterplot")
)

def server(input: Inputs, output: Outputs, session: Session):

    @render_altair
    def scatterplot():
        mean = [0, 0]
        cov = [[1, 0.5], [0.5, 1]]
        data = np.random.multivariate_normal(mean, cov, 10000)
        df = pd.DataFrame(data, columns=['x', 'y'])

        fig = alt.Chart(df).mark_circle(size=60, color='#b6377a').encode(
            x='x',
            y='y')

        return fig

app = App(app_ui, server)

Image

Thanks for the help. If this isn't a bug with Core, then feel free to close the issue. This is resolved for me.