Open jmoraispk opened 3 years ago
Hi @jmoraispk,
What do you mean by plotting the selected variables? Could you upload a mockup so we can better understand your idea?
@spyder-ide/core-developers what do you think?
Hey @jmoraispk, thanks for your feature request. Actually, one-click plots have been available in Spyder's Variable Explorer for at least the past several years. You can already plot 1D NumPy arrays, Python lists and tuples (including nested and non-numeric ones) and more as histograms or line plots, 2D NumPy arrays as line/scatterplots or images, and PIL images as images (not sure about plots), by simply right-clicking on the variable you want to plot and selecting the appropriate option. Currently, you can plot Series and DataFrames simply by converting them to an array first, e.g. series_arr = series_name.values, or df = df["column_name"].values.
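A minimal sketch of that conversion workaround, assuming pandas and NumPy are installed in the console's environment. The sample data and variable names are made up for illustration; the resulting arrays are what would then show the right-click plot options in the Variable Explorer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from this thread.
series = pd.Series([1.0, 2.5, 4.0], name="signal")
df = pd.DataFrame({"column_name": [10, 20, 30], "other": [1, 2, 3]})

# .values exposes the underlying NumPy array of a Series.
series_arr = series.values

# .to_numpy() is the newer, equivalent spelling for a single column.
column_arr = df["column_name"].to_numpy()

print(type(series_arr).__name__, series_arr.shape)
print(type(column_arr).__name__, column_arr.shape)
```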
Spyder currently lacks support for 1-click plots for pandas Series and DataFrames, but it should be pretty simple to add those. Support for line plots and histograms of the former would be particularly trivial, since %varexp --plot series_name and %varexp --hist series_name already appear to work fine, so we just need to enable these options for Series in the right-click menu. The latter should be fairly straightforward too, since we can just call df.plot(), and while we're at it, we could even add a bunch of plot types (line, bar, histogram, boxplot, stacked line, etc.) with relatively little work by just varying the kind parameter to df.plot(), perhaps in a dropdown menu.
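To illustrate the idea of varying the kind parameter, here is a rough sketch, assuming pandas and Matplotlib are available. The helper plot_dataframe and the PLOT_KINDS tuple are hypothetical names for this example only; they are not part of Spyder or spyder-kernels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop in an interactive console
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical list of entries a dropdown menu could offer.
# "area" is pandas' stacked line plot.
PLOT_KINDS = ("line", "bar", "hist", "box", "area")


def plot_dataframe(df, kind="line"):
    """Plot every column of df with the requested pandas plot kind."""
    if kind not in PLOT_KINDS:
        raise ValueError(f"Unsupported plot kind: {kind!r}")
    return df.plot(kind=kind)


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1]})

# One figure per kind, as if each dropdown entry had been clicked once.
axes = {kind: plot_dataframe(df, kind) for kind in PLOT_KINDS}
```

Each call creates a new figure, so in a real integration the figures would need to be routed to the Plots pane and closed afterwards.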
Selecting specific variables from a DataFrame to plot against one another would be a tad more complex, since it would require some kind of UI to display and select the variables. However, as a start, we could offer a scatterplot option on DataFrames with two columns to plot them against one another, so you could just subset your DataFrame to the two variables you wanted and plot that, and a Scatter Matrix option for DataFrames with more than two columns to show scatterplots plus histograms of all the columns at once. This wouldn't really be much more work than adding plotting support for DataFrames in the first place, and would be a major step forward for Spyder's plotting support. This could even be added to 2D arrays as well, and there are a number of other plots that would be easy to add in the same fashion.
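A rough sketch of the two-column scatterplot and scatter matrix options, assuming pandas, NumPy and Matplotlib are installed. The function quick_scatter and the sample column names are hypothetical and only meant to show how the dispatch on column count could look.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix


def quick_scatter(df):
    """Scatter two numeric columns, or draw a scatter matrix for more."""
    numeric = df.select_dtypes(include="number")
    n_columns = numeric.shape[1]
    if n_columns < 2:
        raise ValueError("Need at least two numeric columns to plot.")
    if n_columns == 2:
        x, y = numeric.columns
        return numeric.plot(kind="scatter", x=x, y=y)
    # Scatterplots off the diagonal, histograms on the diagonal.
    return scatter_matrix(numeric, diagonal="hist")


rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(20, 3)), columns=["height", "width", "depth"])

pair_ax = quick_scatter(data[["height", "width"]])  # a single Axes
matrix_axes = quick_scatter(data)                   # a 3x3 array of Axes
```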
Finally, there isn't currently support for selecting multiple separate variables (vs. those in one DataFrame, array, etc.) and plotting them against each other. It's not impossible to do, but it would take significantly more work than these other improvements, and it would run into a lot of issues that don't arise when using an appropriate data structure like an array or DataFrame: what to do about variables with different lengths, dimensions, data types, etc.? What about more than two variables? At least for now, you could do so just by casting separate variables with the same length (otherwise, plotting isn't very meaningful anyway) into one DataFrame, then plotting that with the above options, which is a good idea anyway.
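As a sketch of that interim workaround, separate same-length variables can be gathered into one DataFrame before plotting. This assumes pandas is installed; combine_variables and the sample variables are hypothetical names used only for this example.

```python
import pandas as pd


def combine_variables(**variables):
    """Build one DataFrame from same-length sequences, one column per variable."""
    lengths = {name: len(values) for name, values in variables.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Variables differ in length: {lengths}")
    return pd.DataFrame({name: list(values) for name, values in variables.items()})


# Hypothetical separate variables, as they might sit in the Variable Explorer.
t = [0, 1, 2, 3]
voltage = (0.0, 0.5, 1.0, 1.5)

combined = combine_variables(t=t, voltage=voltage)
print(combined)
```

The length check makes the mismatch explicit up front rather than leaving it to pandas to reject or pad the data.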
If you'd like to help implement these improvements, we'd be happy to guide you on how to do so. If not, this is something I could knock out myself pretty easily, since it's an area I'm familiar with and it would offer major functionality gains for relatively little implementation time. @ccordoba12, would you suggest the enhancements discussed here go in Spyder 4 or in Spyder 5? They will require Spyder-Kernels changes, albeit ones that can likely be made backwards-compatible.
This looks like it has all the functionality we need for pandas. Could it be incorporated?
It looks pretty slick, thanks @bcolsen! We could include it for Spyder 5, but:
That would be a really nice integration, and it doesn't look like it would take too much work, depending on how we do it (a context menu item that just calls pandasgui on the DataFrame, as more of an integration, would be very simple, whereas a full-on built-in replacement for Spyder's own DataFrame editor that's integrated with the Variable Explorer, plots pane, and console would be much more complex).
One complicating factor though—do we run it in Spyder's environment, or in the kernel env? The former would avoid adding another dep to Spyder kernels and avoid it breaking in case of a dep conflict in the environment, but if we wanted any of the interactive/edit/reshape, etc. features to work, we would either need to run it kernel-side or add the necessary plumbing to pass the dataframe back and forth to the kernel, which is not trivial but would be required if we wanted it to be a true replacement for our DataFrameEditor.
However, that's getting kind of out of scope of what's being requested on this issue (a few additional one-click plotting features), and is essentially orthogonal in purpose and implementation to what I propose above, which would be a simple extension of Spyder's existing plotting features and would work with our existing Variable Explorer, plots pane and console setup with no new dependencies or major changes.
If we added this as a new feature, we'd need to be committed to maintain it from Spyder 5 onwards. What I mean is: if the maintainers of that project decided to drop support for it, we'd need to do it for them.
Good point. To note, though, if worst comes to worst we wouldn't have to maintain it indefinitely, especially if we package it as more of an integration than a built-in part of Spyder... there's nothing obligating us to maintain every Spyder feature we ever add forever if it is no longer possible or sensible to do so, and if we implement it as a replacement for our DataFrame viewer/editor, then we don't have to maintain that ourselves anymore.
We'd need to check if they support multi-indexes or other features that our current viewer has. If those are missing, we'd need to add them to that project.
Yeah, but like the above, this would depend on how we implemented it: as an additional integration/context menu option in the Variable Explorer (Open in Pandasgui), or as a full-on replacement for the DataFrame viewer.
I was thinking right click to open in pandasgui. It could just be an optional dependency like pandas.
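A minimal sketch of how such an optional dependency could be handled, assuming the package exposes a top-level show function as its README describes. The helper names pandasgui_available and open_in_pandasgui are hypothetical and not part of Spyder.

```python
import importlib.util


def pandasgui_available():
    """Return True if pandasgui can be imported in this environment."""
    return importlib.util.find_spec("pandasgui") is not None


def open_in_pandasgui(df):
    """Open df in pandasgui if installed; return whether it was opened."""
    if not pandasgui_available():
        # A real context menu would hide or disable the entry instead.
        return False
    from pandasgui import show  # imported lazily so the dependency stays optional
    show(df)
    return True


print("pandasgui installed:", pandasgui_available())
```

Checking with find_spec avoids importing the package (and its Qt dependencies) just to decide whether the menu entry should be shown.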
Ok, that's a good suggestion.
Dear Spyder Team,
Would it be possible to have one-click plots? An option in the Variable Explorer pane that plots the selected variables. Perhaps even something that supports adding all the selected variables to the same plot! One that is perhaps a tad more complicated would be plotting variables against each other, putting one on the x-axis and the others on the y-axis.
Well, the idea is out there and I believe it would be innovative and useful!