holoviz / hvplot

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
https://hvplot.holoviz.org
BSD 3-Clause "New" or "Revised" License

hvPlot as the universal entry point to HoloViz tools #533

Open jbednar opened 3 years ago

jbednar commented 3 years ago

HoloViz.org shows how all the various HoloViz tools (Panel, hvPlot, HoloViews, GeoViews, Datashader, Param, Colorcet, etc.) fit together, forming a coherent suite of complementary tools that add up to solve a very wide range of problems. However, it can be very difficult for new users to make sense of this ecosystem, not knowing whether to start with hvPlot, HoloViews/GeoViews, Panel, or Datashader, each of which can be used on its own for some things but which also make sense together. Many users end up giving up and choosing a less-powerful but more approachable alternative outside of HoloViz, or not choosing the right tool for the job, or struggling longer than necessary to make sense of things before they can start becoming productive.

HoloViz.org was introduced as a way to solve this problem, with a tutorial introducing the various tools and telling people how and when to use each one. But it's still difficult, because each of the tools individually tells one story about what it's for and how to use it, and these stories each differ from the one at holoviz.org. Probably not that many people actually make it through all the material, and probably even fewer really retain what it is saying enough to be sure they are using the right tool for the job, and which subset of that tool's docs they need to focus on. Plus, we have dozens of separate examples scattered around each using some "best practice approach of year XXXX", adding to the information overload. Of course, we could go through and update our docs with the latest and best advice for each example, but the improvements have mostly been incremental, so the motivation to do all that work has been low. Even our best-practice advice has still been difficult to communicate and get a handle on, i.e. to use hvPlot to construct simple layouts, but then learn about Panel as soon as you need to control a widget, learn about HoloViews as soon as you need a stream or opts, and so on.

I think this situation has recently changed dramatically, and that we now have an opportunity to rally around a consistent and far simpler story for what to tell our users to do so that they are able to have the maximum power for the minimum investment (i.e., minimal learning of new APIs and concepts). Specifically, the recent introduction of the .interactive() API in hvPlot makes it possible for the first time to make very nearly all of the power that is available in HoloViz usable from hvPlot, using a simple, clean, and easy-to-explore API that users are already learning anyway (e.g. the Pandas or xarray APIs). hvPlot already provided access to much of the power of HoloViews, GeoViews, and Datashader, but Panel separately had an equally good claim to be a good starting point for using HoloViz, so we had to tell people "use Panel if you want to make apps and dashboards, and/or use hvPlot if you want to make plots". Now, it's finally plausible to tell people to start with hvPlot even if they want to make dashboards.

Capitalizing on this new situation, I propose that we use hvPlot as our single entry point to the HoloViz ecosystem, with each of our tools' docs loudly recommending that users approach via hvPlot, not that tool directly, and recommending that users only dive into the individual tools when certain clearly state-able conditions are met. I think if we can achieve a good experience there, we will make a very large fraction of the power supported by these tools available without users having to know or master:

That way users who invest even a little effort in learning hvPlot will get a big payoff, those who invest just a little bit more will get even more, and if they stop there they can still do at least a large fraction of what HoloViz tools can do, without ever having to dive deeper.

Most of what it takes to make this happen is just documentation, but there are a few key limitations in the hvPlot-first approach that I've listed as separate issues:

I imagine there are a few other rough edges, but if these can be addressed, I think we can have a much more compelling approach and starting point where we ruthlessly defend hvplot.holoviz.org as the starting point, never letting complex APIs or tricky concepts from other projects spill over into it, and ensuring that what it presents gives functionality without requiring wrestling with any of those APIs or concepts. We can then have a strict boundary between hvPlot and all other projects, happily sending people off from hvPlot into the other projects when truly necessary, but making it clear that it's a step up in complexity or commitment, and that staying with hvPlot alone may be all they ever need.

If we do go this route, in addition to the above functionality issues, I think we'll need:

Of course, there are downsides of this approach, at least for now:

How do people feel about this overall plan and vision?

julioasotodv commented 3 years ago

IMHO this has more to do with hvPlot's visibility than anything else.

I give lectures on data visualization with Python in an MSc program (mostly using the HoloViz ecosystem). I start by showing how Bokeh works and how extensible/customizable it is.

Then I switch to HoloViews, and the API design is very polarizing: for some students thinking in kdims and vdims becomes second nature, whereas others struggle.

After that I present GeoViews and datashader and pretty much everyone likes their capabilities.

By the time I start talking about hvPlot, lots of students ask me: Why didn't we start with hvPlot in the first place? It is simple, and similar to what I'd say a lot of people expect for a chart: defining its elements based on columns/dimensions they have got in their dataset. It's straightforward and simple.

I'd say that hvPlot's largest drawback is its lack of popularity. Bokeh is popular because it has been around for quite some years, but most people don't know about hvPlot. And by the looks of it, I would say that most of the Pythonistas creating interactive charts would find in hvPlot 99% of what they need. Even in data visualization, a lot of users only want something that works (that is, as high-level as possible). That is also why Plotly rolled out Plotly Express, and basically they encourage users to use it as the main API. In fact, Plotly Express is all over Plotly's docs. If it weren't for Plotly Express being announced everywhere, I'm sure lots of users would have given up on Plotly (not because it is hard, but because it is inconvenient if all you want is a simple, fast interactive chart).

If more people knew hvPlot exists, the HoloViz ecosystem would probably end up having a larger userbase. If they want more fine-grained detail, they can always fall back to hv / bokeh.

My two cents :)

WesleyTheGeolien commented 3 years ago

Having picked up the library recently, I must say I have been confused about which library I was using, notably about the difference between holoviews and hvplot (it didn't help that I import holoviews as hv, which looks similar to hvplot). So yes, having a "correct" starting place is a great idea 👍

I also agree on the best-practices point. I fancy(?) the API differs between packages, or maybe, as you say, the documentation uses the standard of year XXXX and I took that as the gold standard (I was thinking of geoviews.Dataset().to(geoviews.SOME_PLOT), whereas holoviews does holoviews.SOME_PLOT(holoviews.Dataset)).

Not sure I can chime in on some of the more technical questions

jbednar commented 3 years ago

> If more people knew hvPlot exists, the HoloViz ecosystem would probably end up having a larger userbase. If they want more fine-grained detail, they can always fall back to hv / bokeh.

I completely agree, and indeed that's one of the main reasons we created hvPlot, so that these tools would be accessible to a larger userbase. Here I'm basically saying that this plan has now been successful on a technical level, so let's do what it takes to make it successful on a community level, by inverting all of our messaging so that hvPlot is the first story, not an afterthought. Let's ignore history and tell the story the right way now!

SandervandenOord commented 3 years ago

My position is that visual analytics should be at the speed of thought. It should take little (mental) effort to create the plots I need. This is why I use hvplot.

It's very similar to pandas plotting (df.plot()), which is why it was relatively easy to switch to.

Only when you know a library well can you be highly effective and fast in it. That's why I don't use bokeh, plotly and matplotlib: they take much more time to learn well, and I need too many lines of code to get things done, which slows analysis down.

The only serious interactive alternative is plotly express.

The .interactive() feature could be a killer feature as it would make it even easier to explore data quickly.

If you need help on hvplot, I'm more than willing to help.

jbednar commented 3 years ago

Thanks! Also see https://discourse.holoviz.org/t/using-the-new-interactive-with-pandas-not-xarray/1583/5, where some of these issues are discussed further.

mycarta commented 3 years ago

@jbednar : so, if one were to start from day zero again, would this be the right place (if you like video tutorials)? PyViz Unifying Python Tools for In Browser Data Visualization | SciPy 2018 (introducing hvplot at 12'33" and onwards)

MarcSkovMadsen commented 3 years ago

I also think hvplot would be the right entry point, together with modernized styling of the underlying bokeh engine.

With that you technically have a more powerful and extensible package than Plotly and Plotly Express, and a lot of Pandas users who would understand what you/we are talking about. Then there are still the docs, examples, community, and communication, where Plotly is light years ahead.

One thing I'd like to understand: how would it be possible to not introduce Panel at least a little bit if .interactive is at the center of hvPlot? You need the widgets from somewhere.

mycarta commented 3 years ago

> recommending that users only dive into the individual tools when certain clearly state-able conditions are met

I think the overall plan sounds awesome. I also think listing those conditions somewhere in the documentation right now would already help new users.

MarcSkovMadsen commented 3 years ago

Having looked at the hvplot site, I think it needs an overhaul. The current site does not signal that hvPlot is at the center of HoloViz. The documentation is sparse and there is not even a search function.

jbednar commented 3 years ago

Right; before making it the center of the universe, I wanted to get some buy-in. The current site reflects the point in history at which hvPlot was introduced -- a bit cleaner design, a bit simpler story, but still quite sparse because most things were already documented on other sites, and hvPlot was an afterthought. To change that requires a lot of work, which we can do if there's agreement it's a good idea, which so far there seems to be.

@MarcSkovMadsen , for the widgets, I have a partial proposal at https://github.com/holoviz/panel/issues/1826 , where I propose doing something like what interactive does to infer widgets from scalars and ranges. I don't think this approach would address the ambitious apps that you yourself are building, but that's not the point of this proposal; detailed control over widgets can still be passed off to Panel (particularly if pointing to specific sections of the Panel website, such as the widgets reference gallery). The goal would be for most users, most of the time, not to need to figure out the name of such a widget or look up its detailed options, while making it clear that people who do need to do that are welcome to.

@mycarta , sure, that's a reasonable starting point for plotting with hvPlot, in that the material there hasn't changed since then. It doesn't cover building apps using your hvPlot objects, which previously would have required callbacks and decorators, and now with .interactive only requires method calls. The hope is to be able to do viz and even apps "at the speed of thought", as @SandervandenOord says (as long as you think in terms of Pandas or Xarray API calls!).

mycarta commented 3 years ago

> The hope is to be able to do viz and even apps "at the speed of thought", as @SandervandenOord says (as long as you think in terms of Pandas or Xarray API calls!).

That would be fantastic!!! For my part, I cannot foresee needing much more than that in my scientific computing and explorations, whether by day (job) or night (hobby).

SandervandenOord commented 3 years ago

Here's some of my thoughts still:

jbednar commented 3 years ago

> HoloViews has such an interesting and original way of looking at data and visualizing data. I hope that doesn't get lost when putting more focus on hvplot.

Definitely. The original way that HoloViews looks at data is both its strength and its liability. It gets power from its deep model of how data works and what it means to plot it, but for users to exploit that power, they have to expand their own mental models to accommodate HoloViews. hvPlot takes the opposite approach: try to surface as much of the power available in HoloViews while staying roughly within the mental model people already have. At first it wasn't clear to me just how much could be made available in that way, but with .interactive it's clear to me that very, very much can be! The native HoloViews API will still be available for people like @SandervandenOord and the HoloViz developers who have developed that mental model, but in most cases people will no longer have to embrace the underlying HoloViews philosophy and model; they can benefit from it anyway!

> Users have a need for quick plotting, but there are quite a few good alternatives,

I've argued that of the available high-level APIs, it makes sense for users to focus on the Pandas .plot API because it's the only one that is available from a wide variety (at least 6!) of different libraries with different strengths. Thus, given that no user can ever learn all APIs, and that most users will simply learn one and get back to what they were doing, I feel confident strongly recommending that that one API be .plot, regardless of whether you choose hvPlot.

> What I wonder about is that, if I check this page, there are relatively few downloads for both hvplot and plotly express:

Download counts are very hard to reason about, but there are many reasons why low-level tools like Matplotlib and Bokeh will have higher download counts than high-level tools:

Personally, I strongly believe that nearly all users should start with a high-level tool and leave it behind only when they outgrow it or get frustrated with its limitations. So I think there would be vastly more people using hvPlot than Bokeh's or Matplotlib's native APIs if users' choices were made rationally at this point. But first-mover, history, and network effects, along with fear of lock-in, instead lead users to start with low-level tools, making their lives much more difficult than they need to be. This proposal is basically to (a) really polish and defend hvPlot to make it a suitable solution for the vast majority of users, then (b) try to point people to hvPlot directly from as many locations as possible so that they don't get stuck in complexity or learning curves that they don't actually have to navigate to solve their problems.

MarcSkovMadsen commented 3 years ago

What do the developers of holoviews think? Will they be just as happy to continue contributing to it? Or will they find it less attractive, so that development stops?

jbednar commented 3 years ago

hvPlot is only a thin layer on top of HoloViews. Nearly all innovation and bugfixing happens at the hv level, automatically fixing hvPlot or adding features to it. So there's no danger of hv atrophying. The big issue will just be that things will need documenting at the hvPlot level for people to realize they exist. So fixes that make things "just work" will have no extra cost; ones that require explaining will have to be evaluated for whether they are surfaced into the hvPlot universe. Either way, they'll exist first in HoloViews and only then, optionally, in hvPlot.