Closed drorata closed 6 years ago
@drorata Thanks for following up on this. I definitely think it is important to ensure that you can restore the original plot interface after you've installed the holoplot one. Some options we could consider:
unpatch
function to complement the patch
function.pdvega
which patches pandas with a df.vgplot
method, which means the original df.plot
API is still available, e.g we'd simply add a df.hvplot
method.Personally I think we should definitely do 1., and I'm starting to think that until pandas allows swapping out the plot API, we should also do 3, or at least provide an easy way to enable it.
All three of those sound useful to me, though I'd definitely bury option 2 deep in any documentation so that we don't confuse people. I'd vote for holoplot to make df.hvplot()
always available after any patching is done (whether explicitly or via the pandas import), but for patch to take an argument not to patch over df.plot()
if people will want to switch. That way people can always use .hvplot()
without worrying about how that setting is configured.
As another option, how about a global function to toggle the approach used?
@jlstevens A global option can help to tackle the problem in the notebook's context, but it is not clean otherwise.
I think that overriding an existing (and in general stable and good) interface is a bad practice. At first, users would like to experiment with the new API and not replace the older and known one right at the beginning. I'm not sure whether it is the case with holoplot, but this approach is particularly bad when the override doesn't replace all functionalities available in the original.
Therefore, I think the best approach is (3) --- introducing pd.hvplot
--- and disabling the auto-patching. I think the patching should happen at import only if the user explicitly asked for it.
If we decide to go for a different method name, I would recommend .holoplot
as it is only two characters longer and makes the link to the library clear.
Therefore, I think the best approach is (3) --- introducing pd.hvplot
That makes the API a whole lot less convenient as it requires you to pass in the dataframe. You can already achieve the same thing by using holoplot.HoloPlot(df).plot()
, or maybe you did mean df.hvplot
?
and disabling the auto-patching.
Note that auto-patching is only enabled when you import holoplot.pandas
, importing holoplot itself does not auto-patch, so I think the current approach is fine as long as it doesn't clobber df.plot
by default.
If we decide to go for a different method name, I would recommend .holoplot as it is only two characters longer and makes the link to the library clear.
Personally I strongly prefer df.hvplot
, I agreed on the renaming for the name of the library, but here I think even two characters makes a considerable difference. Also, since a user explicitly has to enable holoplot, I don't think anyone is likely to get confused about what the link is. But I'll let @jbednar chime in.
Personally I would rather find a way to avoid approach 3 which would make the naming issue moot.
I'm not sure what you mean by avoiding approach 3. Are you suggesting not patching the dataframe at all or are you suggesting clobbering df.plot is fine?
Maybe some way of wrapping the original plot
method instead of clobbering it entirely so that it can still be used if requested.
@philippjfr Sorry. I meant df.hvplot
.
How can I use holoplot on a data frame without import holoplot.pandas
? The latter is very intrusive and should be used, IMO, only as a last resort or making it much clearer that the original API is overtaken.
@jlstevens isn't this df.hvplot
doing exactly this?
Maybe some way of wrapping the original plot method instead of clobbering it entirely so that it can still be used if requested.
I agree that would be a good eventual goal, but I also think there needs to be some more discussion with the pandas folks surrounding the API for extending the plot interface. In future I can imagine a global toggle in pandas to set the plotting engine
and a keyword to the plotting interface that let's you toggle the engine on a plot by plot basis. That's roughly what was discussed on the relevant pandas issues, but I don't want to pre-guess the API that will come out of that discussion so for now I think it's safer to go with 3., until that discussion comes to a conclusion.
How can I use holoplot on a data frame without import holoplot.pandas? The latter is very intrusive and should be used, IMO, only as a last resort or making it much clearer that the original API is overtaken.
Currently the only two ways of using it is by patching df.plot
and by explicitly constructing a HoloPlot object, e.g.:
plot = holoplot.HoloPlot(df)
plot.line('x', 'y')
As soon as we come to a conclusion on this issue I'll release a new version of holoplot, which does not patch over df.plot
.
Personally I strongly prefer df.hvplot, I agreed on the renaming for the name of the library, but here I think even two characters makes a considerable difference.
How about df.hlplot
which is the same length which I greatly prefer? This is also the two letter alias I would recommend for holoplot in general (which I don't think is taken by any other project afaik). In other words, if you ever needed it, I would use import holoplot as hl
.
Edit: Or hp
if you prefer... but probably not as that would be double p
unless it is just hplot
.
If I can help in any way, please let me know!
I think we should make patch()
implement option 3, but I don't have any strong opinion between df.holoplot()
, df.hvplot()
, df.hlplot()
, df.hpplot
, or df.hplot()
; they all seem reasonable to me. There's also df.hv()
, which is even shorter and has a natural interpretation as "construct and return a HoloViews object from this dataframe" (which is what this method actually does). I'll leave this decision up to Philipp, as the HoloPlot author, unless he wants me to break a tie.
I don't think we should make a global variable to control this, unless that's something worked out with pandas and other library maintainers; better to wait for some consensus on that one, as it's tricky to have things depend on global state, particularly in the context of out-of-order execution in a notebook.
In general, I'm wary of offering too many options, because the whole point of HoloPlot is to be simple and convenient, so anything arcane and confusing that we have to document is a serious problem.
Overall, I think we should:
a. Continue to offer the convenient but heavy-handed import holoplot.pandas
approach, which I think is a simple way for most new users to get started, while making it clear that it's completely optional and that no feature of HoloPlot depends on using that.
b. Make sure that any time patch()
is called, whether explicitly or via this import approach, the non-overwriting method name is also available, because that way there is a migration path into using this safer mechanism and a way to specify a call that will always work for any patched dataframe.
c. Make a more convenient way to use HoloPlot that does b but does not do a. It would presumably require an extra line of code compared to import holoplot.pandas
, but it should do the same thing (both run the extension and patch in the method).
Then we have to be very careful about how we do the documentation so that it's clear that these options are available, without confusing new users. That will be tricky!
I am missing something. When you guys write patch
does it mean overriding the native plot
of pandas
? If the patch did that, then df.holoplot
won't help anymore, because there's no way of using df.plot
in its original functionality.
Shouldn't there be two parallel tracks:
df.holoplot
to all data frames. This patching will allow using the holoplot
engine but will keep the default one intact.df.plot
. This patching overrides the default behavior of pandas
plotting. Probably, when using this patching, it doesn't hurt to have df.holoplot
as well.Just to be very explicit, holoplot.patch()
is a function we provide that does monkeypatching to install a method on objects in various libraries, including pandas dataframes. Currently, it does override df.plot()
, but even if we used a different name like df.hvplot()
it would still be doing monkeypatching, i.e. extending an existing object with a new or changed attribute.
So there are two different issues here, i.e. whether to patch or not (at all!), and if patching, whether to overwrite .plot()
or use a different name or both.
Not patching at all is already supported; see https://pyviz.github.io/holoplot/user_guide/Introduction.html#HoloPlot-native-API
If patching, there's currently only one behavior for patching, which is to override df.plot()
, but I'm proposing (as you are as well) that it only optionally override, by allowing a different name to be used to avoid overriding.
I guess even in the overriding case, we could still make the original method available as something like df.plot.orig()
. That's ugly, but it would at least ensure we haven't lost any functionality...
Based on this discussion I've come to the conclusion that at least in the case of pandas and probably xarray too, we should not override the plot method by default and instead use df.hvplot
or similar. If you do want to override df.plot
I'll make that an option in holoplot.patch
.
Out of all the options, namely: df.holoplot()
, df.hvplot()
, df.hlplot()
, df.hpplot
, or df.hplot()
my strong preference is still for df.hvplot()
. I think it's the obvious abbreviation for a plot interface based on HoloViews, and the only reason I see to avoid it is if you want to obscure the association with HoloViews, which seems strange for a plotting API that returns HoloViews objects.
After talking to several people about this, it's also the only suggestion that was repeated to me. Lastly, I don't think holoplot itself needs an abbreviation as it's namespace is tiny and you rarely need more than one or two things from it, e.g. patch
and show
. So unless someone has a good reason why hlplot
is preferable over hvplot
that's what I'll go with.
Ok by me. You didn't list df.hv()
in your list of options; I'm not strongly pushing for it, but should at least explicitly reject it.
I'd argue that for a package with no existing .plot()
method, the same method name should be available even if you also use .plot()
there, and all our examples should use the same method name (currently .plot(), but whatever you decide should be the result of import holoplot.pandas
).
I can definitely see the appeal in df.hv()
but I also have a vague feeling that it's a bit odd.
Right; df.hv()
is short while also semantically accurate, but it may just seem mysterious, and does not evoke df.plot()
.
I still vote for .holoplot
and not hvplot
which I dislike a lot. With tab completion in IDEs/notebook I don't think an extra 2 letters in an issue.
I vote for.hvplot
over .holoplot
, but only by 25% or so. Most code in a notebook is read (or at least looked at) many more times than it's written, so any concerns about length should be about reading, not writing.
I think df.hvplot()
is in some sense unfair choice. holoplot
comes on top of holoveiew
and there is no guarantee there won't be other attempts along this line. In this case, choosing df.hvplot
is sort of a hijack.
Furthermore, I vote against df.hv
--- the string plot
should be in the name. Otherwise, it is likely to introduce confusion. Either df.hpplot
or df.hplot
would be my favorites.
@jbednar Thanks for the clarification with regards to patching.
Given that I'm the only one in favor of .holoplot
, let me just say that .hplot
would be my second choice. It is as short as possible while distinguishing itself from .plot
.
I have no objection to .hplot()
, though it's more likely to be an existing command on some library than .hvplot()
is.
I think df.hvplot() is in some sense unfair choice. holoplot comes on top of holoviews and there is no guarantee there won't be other attempts along this line. In this case, choosing df.hvplot is sort of a hijack.
We are the developers of HoloViews, so we can't hijack it from ourselves right? :-)
Given that I'm the only one in favor of .holoplot, let me just say that .hplot would be my second choice.
hplot
makes me think of horizontal plot (once upon a time it was an alias for a row
layout in bokeh). You keep saying you don't like hvplot
but I still don't really know why, particularly when compared to alternatives like hlplot
or hplot
. I still strongly prefer it over all the alternatives presented, so at this point I'm going to assert my privilege as author of the library.
We are the developers of HoloViews, so we can't hijack it from ourselves right? :-)
Holoplot is a separate library. It uses holoviews, it is not holoviews.
We are the developers of HoloViews, so we can't hijack it from ourselves right? :-)
I didn't realize it. But still, maybe at some point, someone would like to implement a different approach using HoloViews. As @jlstevens mentioned --- holoplot
and holoviews
are two different animals happen to be nursed by the same amazing people.
Okay, after discussing this with the xarray maintainers there's two takeaways:
They unanimously preferred the custom accessor approach at least until pandas decides on some official approach for extending the plot API. The good news here that they provide xr.register_dataarray_accessor
and xr.register_dataset_accessor
decorators, which provide an official way to extend the API.
None of them expressed a strong preference for any name. The main suggestions were da.hv.plot()
or da.holoplot()
to match the name of the library.
I have to say I kind of like the da.hv.plot()
suggestion now. The other methods would simply live at the same level, so you could do da.hv.line()
, da.hv.image()
, etc. and da.hv.plot
would simply figure out the appropriate kind
(e.g. 'hist'
/'line'
/'image'
) for you.
And then da.hv()
could be an alias for da.hv.plot()
; less recognizable but more accurate. :-) I have no particular preference between those options.
And then da.hv() could be an alias for da.hv.plot(); less recognizable but more accurate. :-)
That's right although the documentation would use da.hv.plot()
to give users a familiar entrypoint.
I'd then also consider adding da.gv.plot
which would use geoviews types automatically rather than having to supply a crs
explicitly.
One last suggestion for you to consider before you decide (I agree something like da.holoplot.quadmesh
is too long):
da.holo.plot()
da.holo.line()
da.holo.image()
I quite like this but now I've made the suggestion it is up to you to decide.
I'd then also consider adding da.gv.plot which would use geoviews types automatically rather than having to supply a crs explicitly.
Makes sense, and then it would assume PlateCarree?
After a bit of thought I've concluded that da.hv.[plot_type]
is too short a prefix which means it might clash with a column/variable name. I sort of like da.holo.[plot_type]
but also think it looks a bit weird, so at this point I'm close to flipping a coin between da.hvplot()
and da.holo.plot()
.
Okay final decision, I'm going with df.holoplot
.
This issue is a follow up on this SO answer as per @philippjfr request. Please keep in mind it's the first time I'm using
holoplot
(which seems to be awesome!)I tried to use
holoplot
from within a Jupyter notebook. Once importing the package usingall
.plot
are going throughholoplot
. This is not always a convenient behavior. Sometimes, it is easy to think of a use case where some cells should be usingholoplot
while others will be rendered using the standardmatplotlib
. I hope this issue is good enough as a starting point for an interesting discussion.