sefffal / PairPlots.jl

Beautiful and flexible visualizations of high-dimensional data
https://sefffal.github.io/PairPlots.jl/dev
MIT License

Top left plot: Do something to prevent confusion about axis label #47

Open abhro opened 4 months ago

abhro commented 4 months ago

In the grid, the top-left corner cell is missing its label. Minimal working example attached (PDF from Pluto.jl notebook).

using CairoMakie
using PairPlots
pairplot(rand(100, 3))

pairplots-bug-report-mwe.pdf

sefffal commented 4 months ago

Hello @abhro, thank you for the report. You are actually the second person to raise this as an issue.

Could you tell me what you would expect to see here? Maybe there is some misunderstanding about the margin (1D histogram) plots that we should clarify in the documentation.

abhro commented 4 months ago

Hi! Thank you for responding so quickly. I was expecting the labels to the left of the grid to match the ones along the bottom, i.e. the "feature" names. For the MWE I submitted, that would be the number "1". In the DataFrames example below, the label "age" is missing from the left of the top cell.

MWE with DataFrames

using CairoMakie, PairPlots
using DataFrames

df = DataFrame(:age => rand(100), :weight => rand(100), :height => rand(100))

pairplot(df)

PDF from Pluto.jl notebook: pairplots-bug-report-dataframes-mwe.pdf

sefffal commented 4 months ago

Hi @abhro, thanks very much for clarifying. My concern is that the units on the vertical axis are rightly “counts” or “probability”. It is the horizontal axis that specifies the value of that feature.

To my knowledge, it isn’t typical to label the vertical axis of a histogram with the feature name.

Perhaps we can think of another way to communicate this? One option would be to label the top-left plot’s vertical axis “density” or similar, and to provide an option to include the feature names in the titles of each plot along the diagonal.

abhro commented 4 months ago

Hmm. I see your point about the tick marks. Maybe just the data label would help? The other part is about the full grid, where the tick marks/labels do become relevant. Not sure how to solve that either lol 😅

I'm also using the seaborn page as a guide https://seaborn.pydata.org/generated/seaborn.pairplot.html

sefffal commented 2 weeks ago

One potential compromise would be to add the text "counts of" to the label in the top-left position, or something similar.