vega / altair

Declarative visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License
9.41k stars, 796 forks

Run data transformer on other types of inputs #843

Open: saulshanabrook opened this issue 6 years ago

saulshanabrook commented 6 years ago

Right now, AFAIK, the data transformers are only run when a pandas dataframe is passed in: https://github.com/altair-viz/altair/blob/ee69e62c55847432178c2e52657b9cdcd98d8f43/altair/vegalite/v2/api.py#L26

However, I would like to be able to run a transformer when I pass an ibis expression to a Chart. The transformer would take that ibis object and return a valid Vega-Lite data dictionary.

The goal is for users to easily compose visualizations from MapD expressions built with ibis and have them rendered with MapD's Vega renderer.

If Altair called the data transformer whenever it gets a recognized class, I could get this to work. Another solution would be to expose a registry where users could register special transformations associated with different classes. Then _prepare_data could use this registry.

saulshanabrook commented 6 years ago

I am actually gonna close this for now, since we might not need this tight of an integration. We can just feed in our SQL query as a string, which Altair will think of as a URL, and then we can grab that out in a renderer, to send to the mapd backend.
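The workaround relies on Altair turning a string data argument into URL data, so the query text ends up in the spec under data.url, where a custom renderer can pick it up. A minimal sketch, assuming that spec layout; extract_sql, mapd_renderer, and the send_query callable are hypothetical stand-ins for whatever forwards the query to the MapD backend, and the signature is simplified rather than Altair's actual renderer interface.

```python
from typing import Any, Callable, Dict, Optional


def extract_sql(spec: Dict[str, Any]) -> Optional[str]:
    """Return the SQL passed in as the top-level data URL, if any."""
    data = spec.get("data")
    if isinstance(data, dict):
        return data.get("url")
    return None


def mapd_renderer(spec: Dict[str, Any],
                  send_query: Callable[[str, Dict[str, Any]], Any]) -> Any:
    """Hypothetical renderer: forward the query plus the rest of the spec."""
    sql = extract_sql(spec)
    if sql is None:
        raise ValueError("spec has no URL-style data to send to the backend")
    return send_query(sql, spec)


spec = {
    "mark": "point",
    "data": {"url": "SELECT dep_delay, arr_delay FROM flights"},
    "encoding": {"x": {"field": "dep_delay", "type": "quantitative"}},
}
print(extract_sql(spec))  # SELECT dep_delay, arr_delay FROM flights
```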

jakevdp commented 6 years ago

OK – let me know if a use-case for this comes up and we can think about how to make the plugin more configurable.

saulshanabrook commented 6 years ago

I am reopening this because we would like to be able to pass Ibis expressions to Chart and do one of two things: either get a subset of the data into a pandas dataframe and plot that (data.limit(max_rows).execute()), which would work for most ibis backends, or keep the query as SQL and output it for the MapD backend to visualize.
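Both options could live in one transformer that either caps the row count before materializing the query or hands the SQL on untouched. A sketch, assuming an ibis table expression whose limit(n) returns a new expression and whose execute() returns a pandas DataFrame; ibis_data_transformer and its to_sql hook are hypothetical, and to_sql is left to the caller because how SQL is compiled depends on the ibis backend.

```python
from typing import Any, Callable, Dict, Optional


def ibis_data_transformer(expr: Any,
                          max_rows: int = 5000,
                          to_sql: Optional[Callable[[Any], str]] = None) -> Dict[str, Any]:
    """Turn an ibis-style expression into a Vega-Lite data dict (sketch)."""
    if to_sql is not None:
        # MapD path: keep the query as SQL for a backend-aware renderer.
        return {"url": to_sql(expr)}
    # Generic path: fetch at most max_rows rows and embed them inline.
    df = expr.limit(max_rows).execute()
    return {"values": df.to_dict(orient="records")}
```

Applying limit before execute means the row cap is enforced by the backend, so a large table is never pulled into memory just to be truncated.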

I see a couple of options, starting with the simplest:

  1. Run data transformations on unknown classes (here).
  2. Change _prepare_data to be a single-dispatch function, so that users can import it and register their own data preparers.

I am happy to help implement either of these, or another idea you have.

jakevdp commented 6 years ago

I think the best option is to run data transformers on all data inputs. We'd have to modify default data transformers in Altair to raise a warning about unknown data types before passing the value through unchanged.

This would not change the API at all, and then you could simply register & enable a new transformer that would work for whichever data source you wish.

saulshanabrook commented 6 years ago

That sounds good. I can start working on a PR.

saulshanabrook commented 6 years ago

> We'd have to modify default data transformers in Altair to raise a warning about unknown data types before passing the value through unchanged.

But then if you passed in a URL string, for example, you would get a bunch of warnings by default.

jakevdp commented 6 years ago

I'd imagine that strings (assumed to be URLs) would be one of the "recognized" types.
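Putting the proposal together: the active transformer runs on every data input, the default one recognizes Vega-Lite dicts, URL strings, and DataFrame-like objects, and it warns only for anything else. A minimal sketch; default_data_transformer, register, enable, and prepare_data here are hypothetical stand-ins that echo the register-and-enable workflow, not Altair's own data transformer registry.

```python
import warnings
from typing import Any, Callable, Dict


def default_data_transformer(data: Any) -> Any:
    """Default transformer (sketch): convert recognized types, warn on the rest."""
    if isinstance(data, dict):          # already a Vega-Lite data dict
        return data
    if isinstance(data, str):           # recognized: assumed to be a URL
        return {"url": data}
    if hasattr(data, "columns") and hasattr(data, "to_dict"):  # DataFrame-like
        return {"values": data.to_dict(orient="records")}
    warnings.warn(
        f"Unrecognized data type {type(data).__name__!r}; passing it through unchanged"
    )
    return data


_TRANSFORMERS: Dict[str, Callable[[Any], Any]] = {"default": default_data_transformer}
_active = "default"


def register(name: str, func: Callable[[Any], Any]) -> None:
    _TRANSFORMERS[name] = func


def enable(name: str) -> None:
    global _active
    if name not in _TRANSFORMERS:
        raise KeyError(name)
    _active = name


def prepare_data(data: Any) -> Any:
    # Every input goes through the active transformer, whatever its type.
    return _TRANSFORMERS[_active](data)
```

A third-party transformer can handle its own types and delegate everything else to default_data_transformer, so URL strings stay silent under either transformer.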

ellisonbg commented 6 years ago

I can imagine that it would be helpful to run the data transformation on all data types. One way we might want to design that is using multiple dispatch:

https://github.com/mrocklin/multipledispatch

Then it becomes much easier to write combinations of transformers with different implementations for different data types. That is much better than each data transformer having a bunch of if/case logic switching on the types. Thoughts?
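Multiple dispatch picks an implementation from the types of several arguments at once, for example the data type and the output target. The sketch below hand-rolls that lookup with a dict instead of using the multipledispatch package, to stay dependency-free; its resolution order (most specific data type first) is a simplification of what that package does, and SQLQuery, InlineValues, and MapDBackend are hypothetical classes.

```python
from typing import Any, Callable, Dict, Tuple

_IMPLS: Dict[Tuple[type, type], Callable[[Any, Any], Any]] = {}


def dispatch(data_type: type, target_type: type) -> Callable:
    """Register an implementation for a (data type, target type) pair."""
    def decorator(func: Callable[[Any, Any], Any]) -> Callable[[Any, Any], Any]:
        _IMPLS[(data_type, target_type)] = func
        return func
    return decorator


def transform(data: Any, target: Any) -> Any:
    # Try base classes of both arguments, preferring the most specific data type.
    for d in type(data).__mro__:
        for t in type(target).__mro__:
            impl = _IMPLS.get((d, t))
            if impl is not None:
                return impl(data, target)
    raise TypeError(
        f"no transformer for ({type(data).__name__}, {type(target).__name__})"
    )


class SQLQuery:
    def __init__(self, text: str) -> None:
        self.text = text


class InlineValues:
    """Target: embed rows directly in the spec."""


class MapDBackend:
    """Target: leave the query as SQL for the MapD renderer."""


@dispatch(list, InlineValues)
def _rows_inline(data: list, target: InlineValues) -> dict:
    return {"values": data}


@dispatch(SQLQuery, MapDBackend)
def _sql_for_mapd(data: SQLQuery, target: MapDBackend) -> dict:
    return {"url": data.text}
```

Each supported combination is one small function, and an unsupported pair fails loudly with a TypeError instead of falling through a chain of type checks.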

saulshanabrook commented 6 years ago

That makes sense. Could we use Python's built-in single dispatch? https://docs.python.org/3.6/library/functools.html#functools.singledispatch
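With functools.singledispatch, the undecorated base function becomes the fallback for unknown types, and other packages add support for their own classes through prepare_data.register; the lookup follows the class hierarchy, so subclasses are covered too. A minimal sketch: prepare_data and QueryExpr are hypothetical names, and the explicit register(cls) form used here already works on Python 3.6.

```python
import warnings
from functools import singledispatch
from typing import Any


@singledispatch
def prepare_data(data: Any) -> Any:
    # Fallback for types nobody registered: warn and pass through unchanged.
    warnings.warn(f"No data preparer registered for {type(data).__name__!r}")
    return data


@prepare_data.register(dict)
def _prepare_dict(data: dict) -> dict:
    return data  # already a Vega-Lite data dict


@prepare_data.register(str)
def _prepare_url(data: str) -> dict:
    return {"url": data}  # strings are treated as URLs


# A third-party package could register its own type without touching Altair:
class QueryExpr:
    def __init__(self, sql: str) -> None:
        self.sql = sql


@prepare_data.register(QueryExpr)
def _prepare_query(expr: QueryExpr) -> dict:
    return {"url": expr.sql}
```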

ellisonbg commented 6 years ago

That may be enough.

On Sat, Aug 18, 2018 at 6:25 AM Saul Shanabrook <notifications@github.com> wrote:

> That makes sense. Could we use Python's built-in single dispatch? https://docs.python.org/3.6/library/functools.html#functools.singledispatch


--
Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgranger@calpoly.edu and ellisonbg@gmail.com