pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: plotting function for "2D histograms"/"dynamic spectra" (similar to heatmap) #33560

Open johan12345 opened 4 years ago

johan12345 commented 4 years ago

Is your feature request related to a problem?

I often need to plot a heatmap of a DataFrame which uses an IntervalIndex as its columns (and, usually, time as its index). Such a plot could also be called a "dynamic spectrum" or "2D histogram" and is used to quickly get an idea of how a spectrum develops over time.

This is slightly different from what is usually considered a heatmap (see #19008 for an example), as the bins are not necessarily equidistant and there is not necessarily a separate label for each bin. The y axis (which is used for the IntervalIndex) could even have logarithmic scaling.

Describe the solution you'd like

This could use the same API df.plot(type='heatmap') as suggested in #19008 and switch between appropriate axis scaling/labeling modes depending on whether a CategoricalIndex, IntervalIndex or other types of indices are used.
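To make the idea of switching modes by index type concrete, here is a minimal sketch. The function name `axis_mode` and the mode strings are hypothetical and not part of pandas; the sketch only illustrates how the index type could select the axis treatment.

```python
import pandas as pd


def axis_mode(index: pd.Index) -> str:
    """Choose an axis treatment from the index type (hypothetical helper, not pandas API)."""
    if isinstance(index, pd.IntervalIndex):
        # Bins with explicit edges: cells span from left to right endpoint.
        return "interval"
    if isinstance(index, pd.CategoricalIndex):
        # Discrete labels: equidistant cells, one tick label per category.
        return "categorical"
    if isinstance(index, pd.DatetimeIndex):
        # Time axis: continuous positions with date tick formatting.
        return "datetime"
    if pd.api.types.is_numeric_dtype(index.dtype):
        # Plain numbers: continuous axis, linear or logarithmic scaling possible.
        return "continuous"
    # Anything else (e.g. strings) is treated like category labels.
    return "categorical"
```

A plotting backend could call this once for `df.index` and once for `df.columns` and pick the scaling and labeling for each axis from the result.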

Describe alternatives you've considered

My current implementation (see below) uses matplotlib's pcolormesh, but needs to do some fiddling with the bin edges to work correctly.

Matplotlib's hist2d does not work for this use case, because the data is already stored in histogrammed form - the histogram and its bins don't need to be calculated, just plotted.

Seaborn's heatmap function seems to be limited to plotting categorical data, so both IntervalIndex and DatetimeIndex are displayed as categorical data with one label per bin, equidistant spacing, and values on the y axis sorted from top to bottom instead of bottom to top.

Additional context

My current implementation looks similar to this:

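import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
# N+1 bin edges: all left edges plus the last right edge
binedges = np.append(df.columns.left, df.columns.right[-1])
X, Y = np.meshgrid(df.index, binedges)
pcm = ax.pcolormesh(X, Y, df.values.T)

# then add labels, colorbar etc.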

This only works if the IntervalIndex has no gaps and is non-overlapping, which would have to be checked first.
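The check mentioned above could look roughly like the following sketch. The helper name `contiguous_bin_edges` is hypothetical. It assumes the columns are sorted in increasing order and treats any mismatch between one bin's right edge and the next bin's left edge as a gap or overlap.

```python
import numpy as np
import pandas as pd


def contiguous_bin_edges(columns: pd.IntervalIndex) -> np.ndarray:
    """Return the N+1 edges of an N-bin IntervalIndex (hypothetical helper).

    Raises ValueError if the bins cannot be described by a single edge array,
    i.e. if they are unsorted, overlapping, or separated by gaps.
    """
    if not isinstance(columns, pd.IntervalIndex):
        raise ValueError("columns must be an IntervalIndex")
    if len(columns) == 0:
        raise ValueError("columns must contain at least one interval")
    left = np.asarray(columns.left)
    right = np.asarray(columns.right)
    # Each bin must start exactly where the previous one ends.
    if not np.array_equal(left[1:], right[:-1]):
        raise ValueError("intervals have gaps, overlaps, or are not sorted")
    # All left edges plus the final right edge.
    return np.append(left, right[-1])
```

With this, `binedges = contiguous_bin_edges(df.columns)` would replace the `np.append` line and fail early on unsuitable columns. Note that recent matplotlib versions may also expect the x coordinates passed to `pcolormesh` to be edges (one more than the number of rows in `df`) when flat shading is used, so the time axis might need the same treatment.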

Rik-de-Kort commented 4 years ago

I'm not sure this is a satisfactory solution, but I wanted to share it anyway since it does solve your problem in a neat way. However, it requires you to use a different library for visualization, so I understand if that's not workable for you.
I'm using Altair, but I imagine the ggplot package can do something similar. Here's a pic. Note that it can handle missing data without distorting the axes!
[image: canvas]

And here is the code:

import numpy as np
import pandas as pd
import altair as alt

# Generate some data
df = pd.DataFrame(
    np.random.rand(10, 10),
    index=pd.date_range("2020-04-19", periods=10, freq="D"),
    columns=pd.interval_range(start=0, end=1, periods=10),
)
# Drop one row and one column to simulate missing data.
df = df.drop(index=df.index[4], columns=df.columns[6])

# Reshape frame for visualization.
# We cast intervals to simple tuples of their endpoints,
# "melt" the dataframe and unpack the tuples so we
# end up with a frame of the form [timestamp, value, left_endpoint, right_endpoint]
tidy = df.copy()  # Copy so that renaming the columns below does not modify df
tidy.columns = [(interval.left, interval.right) for interval in tidy.columns]
tidy = tidy.reset_index() # Reset index necessary because pd.melt drops index, see #17440
tidy = tidy.melt(id_vars="index", var_name="energy") 
tidy[["energy0", "energy1"]] = pd.DataFrame(tidy.energy.tolist())

# Visualize.
alt.Chart(tidy).mark_rect().encode(x="monthdate(index)", y="energy0", y2="energy1", color="value")

DeeDiveT commented 4 years ago

@Rik-de-Kort This solution looks awesome! However, when I try to run your code, there is nothing shown on my screen. Do you know how to solve that? Thanks

Rik-de-Kort commented 4 years ago

> @Rik-de-Kort This solution looks awesome! However, when I try to run your code, there is nothing shown on my screen. Do you know how to solve that? Thanks

Ah yes, add .serve() to the end of the chart; that will start a renderer. I left it out in case the user was in a notebook.