Improved pair plots for scenario discovery

quaquel / EMAworkbench

workbench for performing exploratory modeling and analysis

BSD 3-Clause "New" or "Revised" License


Improved pair plots for scenario discovery #288

Closed steipatr closed 1 year ago

steipatr commented 1 year ago

These changes add several new plotting options to the pair plots for scenario discovery: contour plots and bivariate histograms, as well as the option to leave the upper triangle empty to reduce visual clutter. There are no new required arguments, so existing code should still run, but it will produce different figures. The changes are made possible by moving the pair plot code from Seaborn's pairplot function to the more flexible PairGrid class.

If accepted, this closes #98.

Current EMA code.

Proposed new default behavior, with kernel density estimates on diagonal, contour plot and scatter plot. Less whitespace between plots and axes.

Some alternative options: cumulative density functions on diagonal, bivariate histogram in lower triangle, and empty upper triangle. Same whitespace between plots and axes as current implementation.

coveralls commented 1 year ago

coverage: 80.764% (-0.03%) from 80.798% when pulling 58b84c4bfb5325782b7def8935f72741a6e5da58 on steipatr:better-pairplots into 90908518a77c3c03d86e1d78c150f2e6b39eb427 on quaquel:master.

EwoutH commented 1 year ago

Thanks for this effort! I really dig the bivariate histograms. The contour plots look great for float and int inputs, but for categorical and bool inputs I find them confusing.

My suggestion for the default would be scatter plot on the upper triangle (like now), bivariate histograms on the lower one (new), and kernel density estimates on the diagonal (like now).

But maybe we can just add three arguments to the function, like

def show_pairs_scatter(upper="scatter", lower="bivariate", diag="kde"):

And then we also allow "contour" for upper and lower and "cumulative" for diag.

Also really curious what @quaquel thinks!

Edit: Just saw you already added those arguments, great idea!

steipatr commented 1 year ago

I agree that bivariate histograms are clearer across different variable types. Because of the binning, the resulting plots are offset a bit against the subspace boxes. If it becomes the default plot, that should probably be addressed.
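One way such an offset can arise, sketched below under an assumption the thread does not confirm: if a categorical variable is plotted at integer positions 0, 1, 2 and the histogram bins simply span the data range, the bin centers do not line up with those positions. The helper name bin_centers is hypothetical and only serves the illustration.

```python
def bin_centers(lo, hi, n_bins):
    """Return the centers of n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + (i + 0.5) * width for i in range(n_bins)]


# Assumed positions of three categories on the axis.
positions = [0, 1, 2]

# Bins spanning exactly the data range: the outer centers sit inside the
# range, shifted away from the category positions at 0 and 2.
naive = bin_centers(0, 2, 3)

# Bins widened by half a unit on each side: every center coincides with
# a category position.
aligned = bin_centers(-0.5, 2.5, 3)

for pos, n, a in zip(positions, naive, aligned):
    print(f"category at {pos}: naive center {n:.3f}, aligned center {a:.3f}")
```

Seaborn's histplot has a discrete parameter that centers bars on their data points in this way; whether this pull request uses it is not stated here.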

quaquel commented 1 year ago

I agree with the defaults suggested by @EwoutH and that the offset raised by @steipatr should be addressed. Once those things are added, this is ready to be merged. Very good stuff!

steipatr commented 1 year ago

I have updated the default behavior as suggested by @EwoutH, and fiddled a bit with padding to make the boxes look better on top of categorical variables. Please note that bivariate histograms are considered experimental according to Seaborn docs.

EDIT: To clarify, from my perspective, this branch is ready for merging.

Current look:

quaquel commented 1 year ago

Looks very good. I'll check the code asap and merge if everything is fine now.

quaquel commented 1 year ago

Thanks, @EwoutH, for these suggestions. I agree with all of them and have already added the easy ones as commits. Only the enum/string thing still needs to be done.

quaquel commented 1 year ago

I added the enum to the code. I guess this is ready to be merged, but @EwoutH if you have a chance to give it a last quick check, that would be great.

EwoutH commented 1 year ago

Personally, I find an enum a bit overkill for such a construct, but it can't hurt either.

Did some manual testing happen?

If nothing major has changed since my last review, it's good to go.

quaquel commented 1 year ago

I agree on the enum comment. I decided to go down the enum route for parallelism with the density plots. I would even argue that the enum for the density plots might also be overkill (only 4 options). I have tested the code with the prim example. No further changes took place.
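To make the enum-versus-string trade-off discussed above concrete, here is a small standard-library sketch. The names PairKind, DiagKind and resolve_pair_options are hypothetical and are not taken from the workbench; the option values are the ones proposed earlier in this thread.

```python
from enum import Enum


class PairKind(Enum):
    """Hypothetical options for the off-diagonal panels."""

    SCATTER = "scatter"
    BIVARIATE = "bivariate"
    CONTOUR = "contour"


class DiagKind(Enum):
    """Hypothetical options for the diagonal panels."""

    KDE = "kde"
    CUMULATIVE = "cumulative"


def resolve_pair_options(upper=PairKind.SCATTER, lower=PairKind.BIVARIATE,
                         diag=DiagKind.KDE):
    """Accept enum members or plain strings and fail early on unknown names.

    upper may be None, standing for an empty upper triangle.
    """
    try:
        upper_kind = None if upper is None else PairKind(upper)
        return upper_kind, PairKind(lower), DiagKind(diag)
    except ValueError as err:
        raise ValueError(f"unsupported plot option: {err}") from None


print(resolve_pair_options())
print(resolve_pair_options(upper=None, lower="contour", diag="cumulative"))
```

With an enum, a typo such as lower="histogram" is rejected up front, and editors can autocomplete the members. For a handful of options, a plain string check would give much the same protection, which is the point made above.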

quaquel commented 1 year ago

Thanks @steipatr

EwoutH commented 1 year ago

Thanks a lot Patrick!

steipatr commented 1 year ago

Thank you both for your efforts! Especially @quaquel for finishing the last bits. I think it's a great enhancement. Now to generalize it to multiple subspaces...