Closed steipatr closed 1 year ago
Thanks for this effort! I really dig the bivariate histograms. The contour plots look great for float and int inputs, but for categorical and bool inputs I find them confusing.
My suggestion for the default would be scatter plot on the upper triangle (like now), bivariate histograms on the lower one (new), and kernel density estimates on the diagonal (like now).
But maybe we can just add three arguments to the function, like
def show_pairs_scatter(upper=“scatter”, lower=“bivariate”, diag=“kde”):
And then we also allow “contour”
for upper
and lower
and “cumulative”
for diag
.
Also really curious what @quaquel thinks!
Edit: Just saw you already added those arguments, great idea!
I agree that bivariate histograms are clearer across different variable types. Because of the binning, the resulting plots are offset a bit against the subspace boxes. If it becomes the default plot, that should probably be addressed.
I agree with the defaults suggested by @EwoutH and the offset suggested by @steipatr. Once those things are added, this is ready to be merged. Very good stuff!
I have updated the default behavior as suggested by @EwoutH, and fiddled a bit with padding to make the boxes look better on top of categorical variables. Please note that bivariate histograms are considered experimental according to Seaborn docs.
EDIT: To clarify, from my perspective, this branch is ready for merging.
Current look:
Looks very good. I'll check the code asap and merge if everything is fine now.
Thanks, @EwoutH for these suggestions. I agree with all of them and have added the easy ones already as commits. Only the enum/string thing still needs to be done.
I added the enum to the code. I guess this is ready to be merged, but @EwoutH if you have a chance to give it a last quick check, that would be great.
Personally I find ENUM a bit overkill for such a construct, but it can't hurt either.
Did some manual testing happen?
If nothing major has changed since my last review, it's good to go.
I agree on the enum comment. I decided to go down the enum route for parallelism with density plots. I would even argue that the density plots might also be overkill (4 options only). I have tested the code with the prim example. No further changes took place.
Thanks @steipatr
Thanks a lot Patrick!
Thank you both for your efforts! Especially @quaquel for finishing the last bits. I think it's a great enhancement. Now to generalize it to multiple subspaces...
These changes add some interesting plotting options to the pair plots for scenario discovery. New options include contour plots and bivariate histograms, as well as the option to leave the upper triangle empty to reduce visual clutter. There are no new required arguments, so existing code should still run, but will produce different figures. These changes are possible by moving the pair plot code from Seaborn's PairPlot to the more flexible PairGrid.
If accepted, this closes #98 .
Current EMA code.
Proposed new default behavior, with kernel density estimates on diagonal, contour plot and scatter plot. Less whitespace between plots and axes.
Some alternative options: cumulative density functions on diagonal, bivariate histogram in lower triangle, and empty upper triangle. Same whitespace between plots and axes as current implementation.