mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

histogram with fixed binwidth - unexpected results for last column #3644

Closed KathSe1984 closed 9 months ago

KathSe1984 commented 9 months ago

When creating a simple histogram with a binwidth of 1, I was surprised that the last two numbers were merged into a single column.

import seaborn as sns
ax = sns.histplot([0, 1, 1, 1, 3, 4, 6, 7], binwidth=1)  # histplot returns a matplotlib Axes

[image: resulting histogram]

Similarly, in the following example the last bar is placed directly adjacent to the previous one, although I would expect a gap here:

ax = sns.histplot([0, 1, 1, 1, 3, 4, 6], binwidth=1)

[image: resulting histogram]

Is this a bug or is there an explanation for this?

mwaskom commented 9 months ago

What version of seaborn? https://github.com/mwaskom/seaborn/blob/master/.github/CONTRIBUTING.md#reporting-bugs

jhncls commented 9 months ago

@KathSe1984 Maybe you want to consider sns.histplot(..., discrete=True)? In general, histograms are designed for float data, and the largest value typically goes into the last bin (which is closed on its right edge), even though with half-open bins it would theoretically fall just outside. For integer input, discrete=True makes sure each value has its own bin.

Here is what sns.histplot([0, 1, 1, 1, 3, 4, 6], discrete=True) looks like: [image]
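The merged last bar can be reproduced with NumPy alone, which may make the behavior easier to see. This is only a sketch: it assumes that binwidth=1 produces edges running from the minimum to the maximum in steps of 1, and that discrete=True produces unit-wide bins centered on each integer. Both edge choices are assumptions for illustration and are not taken from the seaborn source.

```python
import numpy as np

data = [0, 1, 1, 1, 3, 4, 6, 7]

# Assumed edges for binwidth=1: from min to max in steps of 1 (0, 1, ..., 7).
edges = np.arange(min(data), max(data) + 1)
counts, _ = np.histogram(data, bins=edges)
# np.histogram treats every bin as half-open [left, right) except the last,
# which is closed, so both 6 and 7 are counted in the final bin [6, 7].
print(counts)

# Assumed edges for discrete=True: unit-wide bins centered on each integer.
discrete_edges = np.arange(min(data) - 0.5, max(data) + 1.5)
discrete_counts, _ = np.histogram(data, bins=discrete_edges)
# No integer sits on an edge, so each value gets its own bin.
print(discrete_counts)
```

With the first set of edges there are only seven bins for the eight integers 0 through 7, so two of them have to share a bin; the centered edges give eight bins.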

@mwaskom The plot shown with binwidth=1 (without discrete=True) merges the last two bins, both with seaborn 0.13.2 and with the current dev version.

Personally, I don't know what the least-surprising implementation would be. Floats can be really pesky. And typical inputs can be integers, or floats rounded to a fixed precision.

As an example, consider an input of [0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1] with binwidth=.2. One could stretch the binwidth by an epsilon, but then the first bin would have 3 values. Or one could shorten the binwidth by an epsilon, and either create a weird extra bin, or make the last bin extra wide. Each approach looks strange in its own way.
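The trade-offs in that example can be checked with a small counting sketch. The helpers bin_counts and make_edges are hypothetical names introduced here. They mimic the usual convention of half-open bins with a closed last bin and are not seaborn code. The values are expressed in tenths (0 to 10, binwidth 2) so that float roundoff does not interfere, and the epsilon of 0.01 is an arbitrary choice.

```python
from bisect import bisect_right

def bin_counts(values, edges):
    """Count values per bin: half-open [left, right) bins, closed last bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # value lies outside the binned range
        # bisect_right locates the bin whose left edge is <= v; min() folds a
        # value equal to the final edge into the last bin.
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts

def make_edges(width, n_bins):
    """Hypothetical helper: n_bins equal-width bins starting at 0."""
    return [i * width for i in range(n_bins + 1)]

tenths = list(range(11))  # 0, .1, ..., 1 expressed in tenths

exact = bin_counts(tenths, make_edges(2, 5))         # binwidth exactly .2
stretched = bin_counts(tenths, make_edges(2.01, 5))  # binwidth plus epsilon
extra_bin = bin_counts(tenths, make_edges(1.99, 6))  # minus epsilon, extra bin
wide_last = bin_counts(tenths, make_edges(1.99, 4) + [10])  # minus epsilon, wide last bin

print(exact)      # [2, 2, 2, 2, 3]
print(stretched)  # [3, 2, 2, 2, 2]
print(extra_bin)  # [2, 2, 2, 2, 2, 1]
print(wide_last)  # [2, 2, 2, 2, 3]
```

Eleven values cannot be split evenly over five bins, so every variant leaves one bin that looks odd: the last, the first, or an extra sixth one.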

With the current dev version (as well as 0.13.2), sns.histplot([0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1], binwidth=.2) looks like this: [image]

mwaskom commented 9 months ago

Agreed that discrete=True is probably what you want here.

mwaskom commented 9 months ago

I'm going to close as there's no evidence that seaborn is doing something wrong here. Thanks!