holoviz / hvplot

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
https://hvplot.holoviz.org
BSD 3-Clause "New" or "Revised" License
1.13k stars 109 forks source link

Toggle holoplot when using within Jupyter notebook #14

Closed drorata closed 6 years ago

drorata commented 6 years ago

This issue is a follow up on this SO answer as per @philippjfr request. Please keep in mind it's the first time I'm using holoplot (which seems to be awesome!)

I tried to use holoplot from within a Jupyter notebook. Once importing the package using

import holoplot.pandas

all .plot are going through holoplot. This is not always a convenient behavior. Sometimes, it is easy to think of a use case where some cells should be using holoplot while others will be rendered using the standard matplotlib. I hope this issue is good enough as a starting point for an interesting discussion.

philippjfr commented 6 years ago

@drorata Thanks for following up on this. I definitely think it is important to ensure that you can restore the original plot interface after you've installed the holoplot one. Some options we could consider:

  1. Add an unpatch function to complement the patch function.
  2. Make it possible to use the patch function as a context manager so you enable the interface for just one plot.
  3. Consider an approach like pdvega which patches pandas with a df.vgplot method, which means the original df.plot API is still available, e.g we'd simply add a df.hvplotmethod.

Personally I think we should definitely do 1., and I'm starting to think that until pandas allows swapping out the plot API, we should also do 3, or at least provide an easy way to enable it.

jbednar commented 6 years ago

All three of those sound useful to me, though I'd definitely bury option 2 deep in any documentation so that we don't confuse people. I'd vote for holoplot to make df.hvplot() always available after any patching is done (whether explicitly or via the pandas import), but for patch to take an argument not to patch over df.plot() if people will want to switch. That way people can always use .hvplot() without worrying about how that setting is configured.

jlstevens commented 6 years ago

As another option, how about a global function to toggle the approach used?

drorata commented 6 years ago

@jlstevens A global option can help to tackle the problem in the notebook's context, but it is not clean otherwise.

I think that overriding an existing (and in general stable and good) interface is a bad practice. At first, users would like to experiment with the new API and not replace the older and known one right at the beginning. I'm not sure whether it is the case with holoplot, but this approach is particularly bad when the override doesn't replace all functionalities available in the original.

Therefore, I think the best approach is (3) --- introducing pd.hvplot --- and disabling the auto-patching. I think the patching should happen at import only if the user explicitly asked for it.

jlstevens commented 6 years ago

If we decide to go for a different method name, I would recommend .holoplot as it is only two characters longer and makes the link to the library clear.

philippjfr commented 6 years ago

Therefore, I think the best approach is (3) --- introducing pd.hvplot

That makes the API a whole lot less convenient as it requires you to pass in the dataframe. You can already achieve the same thing by using holoplot.HoloPlot(df).plot(), or maybe you did mean df.hvplot?

and disabling the auto-patching.

Note that auto-patching is only enabled when you import holoplot.pandas, importing holoplot itself does not auto-patch, so I think the current approach is fine as long as it doesn't clobber df.plot by default.

If we decide to go for a different method name, I would recommend .holoplot as it is only two characters longer and makes the link to the library clear.

Personally I strongly prefer df.hvplot, I agreed on the renaming for the name of the library, but here I think even two characters makes a considerable difference. Also, since a user explicitly has to enable holoplot, I don't think anyone is likely to get confused about what the link is. But I'll let @jbednar chime in.

jlstevens commented 6 years ago

Personally I would rather find a way to avoid approach 3 which would make the naming issue moot.

philippjfr commented 6 years ago

I'm not sure what you mean by avoiding approach 3. Are you suggesting not patching the dataframe at all or are you suggesting clobbering df.plot is fine?

jlstevens commented 6 years ago

Maybe some way of wrapping the original plot method instead of clobbering it entirely so that it can still be used if requested.

drorata commented 6 years ago

@philippjfr Sorry. I meant df.hvplot.

How can I use holoplot on a data frame without import holoplot.pandas? The latter is very intrusive and should be used, IMO, only as a last resort or making it much clearer that the original API is overtaken.

drorata commented 6 years ago

@jlstevens isn't this df.hvplot doing exactly this?

philippjfr commented 6 years ago

Maybe some way of wrapping the original plot method instead of clobbering it entirely so that it can still be used if requested.

I agree that would be a good eventual goal, but I also think there needs to be some more discussion with the pandas folks surrounding the API for extending the plot interface. In future I can imagine a global toggle in pandas to set the plotting engine and a keyword to the plotting interface that let's you toggle the engine on a plot by plot basis. That's roughly what was discussed on the relevant pandas issues, but I don't want to pre-guess the API that will come out of that discussion so for now I think it's safer to go with 3., until that discussion comes to a conclusion.

How can I use holoplot on a data frame without import holoplot.pandas? The latter is very intrusive and should be used, IMO, only as a last resort or making it much clearer that the original API is overtaken.

Currently the only two ways of using it is by patching df.plot and by explicitly constructing a HoloPlot object, e.g.:

plot = holoplot.HoloPlot(df)
plot.line('x', 'y')

As soon as we come to a conclusion on this issue I'll release a new version of holoplot, which does not patch over df.plot.

jlstevens commented 6 years ago

Personally I strongly prefer df.hvplot, I agreed on the renaming for the name of the library, but here I think even two characters makes a considerable difference.

How about df.hlplot which is the same length which I greatly prefer? This is also the two letter alias I would recommend for holoplot in general (which I don't think is taken by any other project afaik). In other words, if you ever needed it, I would use import holoplot as hl.

Edit: Or hp if you prefer... but probably not as that would be double p unless it is just hplot.

drorata commented 6 years ago

If I can help in any way, please let me know!

jbednar commented 6 years ago

I think we should make patch() implement option 3, but I don't have any strong opinion between df.holoplot(), df.hvplot(), df.hlplot(), df.hpplot, or df.hplot(); they all seem reasonable to me. There's also df.hv(), which is even shorter and has a natural interpretation as "construct and return a HoloViews object from this dataframe" (which is what this method actually does). I'll leave this decision up to Philipp, as the HoloPlot author, unless he wants me to break a tie.

I don't think we should make a global variable to control this, unless that's something worked out with pandas and other library maintainers; better to wait for some consensus on that one, as it's tricky to have things depend on global state, particularly in the context of out-of-order execution in a notebook.

In general, I'm wary of offering too many options, because the whole point of HoloPlot is to be simple and convenient, so anything arcane and confusing that we have to document is a serious problem.

Overall, I think we should:

a. Continue to offer the convenient but heavy-handed import holoplot.pandas approach, which I think is a simple way for most new users to get started, while making it clear that it's completely optional and that no feature of HoloPlot depends on using that. b. Make sure that any time patch() is called, whether explicitly or via this import approach, the non-overwriting method name is also available, because that way there is a migration path into using this safer mechanism and a way to specify a call that will always work for any patched dataframe. c. Make a more convenient way to use HoloPlot that does b but does not do a. It would presumably require an extra line of code compared to import holoplot.pandas, but it should do the same thing (both run the extension and patch in the method).

Then we have to be very careful about how we do the documentation so that it's clear that these options are available, without confusing new users. That will be tricky!

drorata commented 6 years ago

I am missing something. When you guys write patch does it mean overriding the native plot of pandas? If the patch did that, then df.holoplot won't help anymore, because there's no way of using df.plot in its original functionality.

Shouldn't there be two parallel tracks:

  1. Patch and enable the extension df.holoplot to all data frames. This patching will allow using the holoplot engine but will keep the default one intact.
  2. Patch and override df.plot. This patching overrides the default behavior of pandas plotting. Probably, when using this patching, it doesn't hurt to have df.holoplot as well.
jbednar commented 6 years ago

Just to be very explicit, holoplot.patch() is a function we provide that does monkeypatching to install a method on objects in various libraries, including pandas dataframes. Currently, it does override df.plot(), but even if we used a different name like df.hvplot() it would still be doing monkeypatching, i.e. extending an existing object with a new or changed attribute.

So there are two different issues here, i.e. whether to patch or not (at all!), and if patching, whether to overwrite .plot() or use a different name or both.

Not patching at all is already supported; see https://pyviz.github.io/holoplot/user_guide/Introduction.html#HoloPlot-native-API

If patching, there's currently only one behavior for patching, which is to override df.plot(), but I'm proposing (as you are as well) that it only optionally override, by allowing a different name to be used to avoid overriding.

I guess even in the overriding case, we could still make the original method available as something like df.plot.orig(). That's ugly, but it would at least ensure we haven't lost any functionality...

philippjfr commented 6 years ago

Based on this discussion I've come to the conclusion that at least in the case of pandas and probably xarray too, we should not override the plot method by default and instead use df.hvplot or similar. If you do want to override df.plot I'll make that an option in holoplot.patch.

Out of all the options, namely: df.holoplot(), df.hvplot(), df.hlplot(), df.hpplot, or df.hplot() my strong preference is still for df.hvplot(). I think it's the obvious abbreviation for a plot interface based on HoloViews, and the only reason I see to avoid it is if you want to obscure the association with HoloViews, which seems strange for a plotting API that returns HoloViews objects.

After talking to several people about this, it's also the only suggestion that was repeated to me. Lastly, I don't think holoplot itself needs an abbreviation as it's namespace is tiny and you rarely need more than one or two things from it, e.g. patch and show. So unless someone has a good reason why hlplot is preferable over hvplot that's what I'll go with.

jbednar commented 6 years ago

Ok by me. You didn't list df.hv() in your list of options; I'm not strongly pushing for it, but should at least explicitly reject it.

I'd argue that for a package with no existing .plot() method, the same method name should be available even if you also use .plot() there, and all our examples should use the same method name (currently .plot(), but whatever you decide should be the result of import holoplot.pandas).

philippjfr commented 6 years ago

I can definitely see the appeal in df.hv() but I also have a vague feeling that it's a bit odd.

jbednar commented 6 years ago

Right; df.hv() is short while also semantically accurate, but it may just seem mysterious, and does not evoke df.plot().

jlstevens commented 6 years ago

I still vote for .holoplot and not hvplot which I dislike a lot. With tab completion in IDEs/notebook I don't think an extra 2 letters in an issue.

jbednar commented 6 years ago

I vote for.hvplot over .holoplot, but only by 25% or so. Most code in a notebook is read (or at least looked at) many more times than it's written, so any concerns about length should be about reading, not writing.

drorata commented 6 years ago

I think df.hvplot() is in some sense unfair choice. holoplot comes on top of holoveiew and there is no guarantee there won't be other attempts along this line. In this case, choosing df.hvplot is sort of a hijack.

Furthermore, I vote against df.hv --- the string plot should be in the name. Otherwise, it is likely to introduce confusion. Either df.hpplot or df.hplot would be my favorites.

@jbednar Thanks for the clarification with regards to patching.

jlstevens commented 6 years ago

Given that I'm the only one in favor of .holoplot, let me just say that .hplot would be my second choice. It is as short as possible while distinguishing itself from .plot.

jbednar commented 6 years ago

I have no objection to .hplot(), though it's more likely to be an existing command on some library than .hvplot() is.

philippjfr commented 6 years ago

I think df.hvplot() is in some sense unfair choice. holoplot comes on top of holoviews and there is no guarantee there won't be other attempts along this line. In this case, choosing df.hvplot is sort of a hijack.

We are the developers of HoloViews, so we can't hijack it from ourselves right? :-)

Given that I'm the only one in favor of .holoplot, let me just say that .hplot would be my second choice.

hplot makes me think of horizontal plot (once upon a time it was an alias for a row layout in bokeh). You keep saying you don't like hvplot but I still don't really know why, particularly when compared to alternatives like hlplot or hplot. I still strongly prefer it over all the alternatives presented, so at this point I'm going to assert my privilege as author of the library.

jlstevens commented 6 years ago

We are the developers of HoloViews, so we can't hijack it from ourselves right? :-)

Holoplot is a separate library. It uses holoviews, it is not holoviews.

drorata commented 6 years ago

We are the developers of HoloViews, so we can't hijack it from ourselves right? :-)

I didn't realize it. But still, maybe at some point, someone would like to implement a different approach using HoloViews. As @jlstevens mentioned --- holoplot and holoviews are two different animals happen to be nursed by the same amazing people.

philippjfr commented 6 years ago

Okay, after discussing this with the xarray maintainers there's two takeaways:

I have to say I kind of like the da.hv.plot() suggestion now. The other methods would simply live at the same level, so you could do da.hv.line(), da.hv.image(), etc. and da.hv.plot would simply figure out the appropriate kind (e.g. 'hist'/'line'/'image') for you.

jbednar commented 6 years ago

And then da.hv() could be an alias for da.hv.plot(); less recognizable but more accurate. :-) I have no particular preference between those options.

philippjfr commented 6 years ago

And then da.hv() could be an alias for da.hv.plot(); less recognizable but more accurate. :-)

That's right although the documentation would use da.hv.plot() to give users a familiar entrypoint.

philippjfr commented 6 years ago

I'd then also consider adding da.gv.plot which would use geoviews types automatically rather than having to supply a crs explicitly.

jlstevens commented 6 years ago

One last suggestion for you to consider before you decide (I agree something like da.holoplot.quadmesh is too long):

da.holo.plot()
da.holo.line()
da.holo.image()

I quite like this but now I've made the suggestion it is up to you to decide.

jbednar commented 6 years ago

I'd then also consider adding da.gv.plot which would use geoviews types automatically rather than having to supply a crs explicitly.

Makes sense, and then it would assume PlateCarree?

philippjfr commented 6 years ago

After a bit of thought I've concluded that da.hv.[plot_type] is too short a prefix which means it might clash with a column/variable name. I sort of like da.holo.[plot_type] but also think it looks a bit weird, so at this point I'm close to flipping a coin between da.hvplot() and da.holo.plot().

philippjfr commented 6 years ago

Okay final decision, I'm going with df.holoplot.