vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Datashader & Altair? #1752

Open wmayner opened 4 years ago

wmayner commented 4 years ago

I love Altair. The API is beautiful, and in my view, far superior to alternatives like Holoviews. However, the Holoviews ecosystem has one killer feature: datashader. When working with millions of points, Altair is not yet an option, so I'm forced to use Holoviews.

In my ideal world, I could use datashader with Altair.

I figured I'd make this issue to register my dreams with the powers that be :)

jakevdp commented 4 years ago

Altair produces vega-lite chart specifications, which could in principle be rendered with Bokeh if someone wrote a vega-lite to bokeh translator. So far, there hasn't been anyone sufficiently motivated to tackle that. Maybe you're that person?

wmayner commented 4 years ago

Would it really require a full translator? In my head I was thinking of some workaround where datashader produces images which are then inserted into the chart specification. I don't know vega-lite well enough to know whether that makes sense, though. Unfortunately I don't think I would be able to spare the time to write a full translator. (I know it's not the most helpful to get feature requests from people who aren't willing/able to help implement them—sorry!)

jakevdp commented 4 years ago

I think a bokeh translator/renderer for vega-lite would be a far easier undertaking than trying to join datashader and vega at the javascript level. They're two entirely different frameworks.

jakevdp commented 4 years ago

See also https://github.com/vega/scalable-vega. The project is still young so there are no Python bindings yet, but it's a possible future solution to the large dataset issue.

wmayner commented 4 years ago

I was thinking of joining them at the Python level. I'm assuming vega-lite can render images; since datashader can produce images, then those images could be embedded in the specification. That would all be before the JS stage, as I'm imagining it.

jakevdp commented 4 years ago

Oh, OK. Yeah, Vega-Lite can render images, for example: https://vega.github.io/vega-lite/docs/image.html

So you could use Bokeh/Datashader to generate an image, and then display that image in an Altair chart, if that's what you're after. But you're not going to be using the Altair grammar to create the image, and you're not going to have any interactive features short of maybe zooming-in on the image pixels.
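
As a rough illustration of that approach, here is a minimal sketch that rasterizes points in Python and embeds the result in a Vega-Lite spec through the image mark. To stay dependency-free it uses a plain-Python binning loop and a hand-rolled grayscale PNG encoder as stand-ins for Datashader; the helper names (`bin_counts`, `to_png_data_url`, `image_spec`) and the sample data are hypothetical, not part of Altair or Datashader.

```python
import base64
import random
import struct
import zlib


def bin_counts(xs, ys, x_range, y_range, width, height):
    """Count points per pixel; row 0 is the top of the image."""
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # ignore points outside the requested extent
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[height - 1 - row][col] += 1  # flip so larger y is drawn higher
    return grid


def to_png_data_url(grid):
    """Encode counts as an 8-bit grayscale PNG data URL (denser = darker)."""
    height, width = len(grid), len(grid[0])
    peak = max(max(row) for row in grid) or 1
    # Each scanline starts with filter byte 0 (no filtering).
    raw = b"".join(
        b"\x00" + bytes(255 - (255 * c) // peak for c in row) for row in grid
    )

    def chunk(tag, data):
        crc = zlib.crc32(tag + data) & 0xFFFFFFFF
        return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)

    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    png = (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw))
        + chunk(b"IEND", b"")
    )
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


def image_spec(url, x_range, y_range, width=400, height=300):
    """Vega-Lite spec with one image mark centered on the data extent."""
    (x0, x1), (y0, y1) = x_range, y_range
    axis = lambda lo, hi: {"domain": [lo, hi], "nice": False}
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "width": width,
        "height": height,
        "data": {"values": [{"x": (x0 + x1) / 2, "y": (y0 + y1) / 2, "img": url}]},
        "mark": {"type": "image", "width": width, "height": height, "aspect": False},
        "encoding": {
            "x": {"field": "x", "type": "quantitative", "scale": axis(x0, x1)},
            "y": {"field": "y", "type": "quantitative", "scale": axis(y0, y1)},
            "url": {"field": "img", "type": "nominal"},
        },
    }


# Made-up sample data: 100,000 normally distributed points.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(100_000)]
ys = [rng.gauss(0, 1) for _ in range(100_000)]
grid = bin_counts(xs, ys, (-4, 4), (-4, 4), width=200, height=150)
spec = image_spec(to_png_data_url(grid), (-4, 4), (-4, 4))
```

The resulting dict should be loadable with `alt.Chart.from_dict(spec)`. In practice the raster would presumably come from Datashader's own canvas and shading functions rather than the loop above, and the axes are only correct for the extent the image was rendered at.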

wmayner commented 4 years ago

Right, it would have to be static. I think that's OK for some purposes. As for the grammar: it might be possible to use the spec as the input to datashader, though, so Altair's grammar could still be used. That is, the function could take a normal vega-lite spec as input and return a spec where large data has been replaced with a datashaded image. If that's possible, then maybe this lower-barrier workaround would be useful, despite the lack of interactivity, until someone can build a proper translator.
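
To make the idea concrete, a very rough sketch of such a function might look like the following. It assumes a spec with inline `data.values` and plain quantitative `x`/`y` field encodings, and it replaces the points with per-bin counts drawn as rects rather than a true datashaded image. The function name `shade_spec` and the sample spec are hypothetical; handling transforms, other marks, and other encodings is not attempted here.

```python
import random


def shade_spec(spec, bins=70):
    """Return a new spec whose inline points are replaced by 2D bin counts."""
    enc = spec["encoding"]
    x_field, y_field = enc["x"]["field"], enc["y"]["field"]
    rows = spec["data"]["values"]
    xs = [r[x_field] for r in rows]
    ys = [r[y_field] for r in rows]

    x0, y0 = min(xs), min(ys)
    dx = (max(xs) - x0) / bins or 1.0  # fall back to 1.0 for a zero range
    dy = (max(ys) - y0) / bins or 1.0

    # Count points per (column, row) bin; only non-empty bins are stored.
    counts = {}
    for x, y in zip(xs, ys):
        i = min(int((x - x0) / dx), bins - 1)
        j = min(int((y - y0) / dy), bins - 1)
        counts[i, j] = counts.get((i, j), 0) + 1

    cells = [
        {
            "x0": x0 + i * dx, "x1": x0 + (i + 1) * dx,
            "y0": y0 + j * dy, "y1": y0 + (j + 1) * dy,
            "count": n,
        }
        for (i, j), n in counts.items()
    ]

    # Keep top-level properties (width, title, ...) but swap data, mark, encoding.
    out = {k: v for k, v in spec.items() if k not in ("data", "mark", "encoding")}
    out["data"] = {"values": cells}
    out["mark"] = "rect"
    out["encoding"] = {
        "x": {"field": "x0", "type": "quantitative", "title": x_field},
        "x2": {"field": "x1"},
        "y": {"field": "y0", "type": "quantitative", "title": y_field},
        "y2": {"field": "y1"},
        "color": {"field": "count", "type": "quantitative", "scale": {"type": "log"}},
    }
    return out


# Made-up input: a scatter spec with 10,000 inline points.
rng = random.Random(1)
points = [{"a": rng.gauss(0, 1), "b": rng.gauss(0, 1)} for _ in range(10_000)]
scatter = {
    "width": 300,
    "data": {"values": points},
    "mark": "point",
    "encoding": {
        "x": {"field": "a", "type": "quantitative"},
        "y": {"field": "b", "type": "quantitative"},
    },
}
shaded = shade_spec(scatter, bins=20)
```

The output carries at most `bins * bins` rows no matter how many points went in, which is the whole appeal; the cost is that every part of the grammar the function does not understand is silently dropped.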

jakevdp commented 4 years ago

it might be possible to use the spec as the input to datashader

in other words, create a translator from vega-lite to bokeh/datashader 😀

jakevdp commented 4 years ago

Once that heavy lifting has been done in terms of translating Vega-Lite into something that Bokeh/Datashader understands, I see very little benefit in sending a static rendered PNG back so Altair can use it. Why not just visualize it natively in Datashader?

joelostblom commented 3 years ago

@jakevdp Do you think it is worthwhile to add a brief section to the docs similar to what plotly has on datashader https://plotly.com/python/datashader/? Just to be explicit about how it can be used and maybe talk a bit about performance with large data in general with vega and its current limitations.

jakevdp commented 3 years ago

Sure - what do you have in mind to put in that section? I'm not aware of any Altair/Datashader interoperability similar to what plotly demonstrates there.

joelostblom commented 3 years ago

I was thinking along the lines of what you mentioned in your earlier comments in this thread: showing that the image arrays created from datashader can be displayed with Altair (which I think is mostly what that plotly page does). This would also add some functionality like tooltips, although there is no dynamic rescaling of the image when zooming in or out (but that is not on the plotly page either; only holoviews/bokeh has that, I believe).

So for the second example on the plotly page:

I was thinking of something like this for Altair:

import altair as alt
import pandas as pd
import datashader as ds

df = pd.read_parquet(
    "https://raw.githubusercontent.com/plotly/datasets/master/2015_flights.parquet"
)
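# Aggregate the points onto a 70x70 grid; each cell holds a point count.
cvs = ds.Canvas(plot_width=70, plot_height=70)
agg = cvs.points(df, "SCHEDULED_DEPARTURE", "DEPARTURE_DELAY")
# Flatten the grid to one row per cell so Altair can consume it.
df_agg = agg.to_dataframe(name="COUNT").reset_index().dropna()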

my_axis = alt.Axis(format=".1f", labelAngle=0, labelSeparation=20, labelOverlap=True)

chart = (
    alt.Chart(df_agg, height=250, width=430)
    .mark_rect(tooltip=True)
    .encode(
        alt.X("SCHEDULED_DEPARTURE:O", axis=my_axis),
        alt.Y("DEPARTURE_DELAY:O", axis=my_axis, scale=alt.Scale(reverse=True)),
        alt.Color("COUNT", scale=alt.Scale(type="log", scheme="plasma")),
    )
    .transform_filter(alt.datum.COUNT != 0)
)

chart

I haven't looked into the first example in detail, in case there is any specific plotly-to-datashader functionality that can't be done in Altair. But I will look into it and make a PR in the next couple of weeks that you can review, and you can say whether you think it would be valuable to show how to plot datashader arrays even though there is no specific interoperability between the packages.