When using so.Hist(common_bins=False), if the bins of the different groups overlap, the width calculated for each mark is smaller than it should be.
Here's a minimal working example, where I have a dataset A and its x-shifted version B = A + shift. Each row plots a different shift; once the two histograms start to overlap, the bars become narrower than the bins.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn.objects as so


def plot(ax, shift: float):
    # B is the same sample as A, shifted along x by `shift`.
    data = np.random.default_rng(0).normal(size=50)
    df = pd.DataFrame({"A": data, "B": data + shift}).melt()
    return (
        so.Plot(df, x="value", color="variable")
        .add(so.Bars(), so.Hist(common_bins=False))
        .on(ax)
        .plot()
    )


# One row per shift; smaller shifts make the two histograms overlap.
shifts = [4.5, 4, 3.9, 3.8]
fig, axes = plt.subplots(len(shifts), sharex=True, gridspec_kw={"hspace": 0})
for ax, shift in zip(axes, shifts):
    plot(ax, shift)
    ax.set(ylabel=f"{shift = }")
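The same effect can be checked numerically without rendering anything. This is only a sketch under assumptions: it supposes that the per-group bins match what `np.histogram_bin_edges(..., bins="auto")` returns and that the bar width comes from the smallest gap between the pooled bin centers of both groups. `bin_centers` and `pooled_spacing` are hypothetical helpers written for this illustration, not seaborn functions.

```python
import numpy as np


def bin_centers(values):
    """Return (bin centers, bin width) for NumPy's bins="auto" choice."""
    edges = np.histogram_bin_edges(values, bins="auto")
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, float(edges[1] - edges[0])


def pooled_spacing(*groups):
    """Smallest gap between the sorted unique positions of all groups pooled."""
    pooled = np.unique(np.concatenate(groups))  # np.unique also sorts
    return float(np.min(np.diff(pooled)))


data = np.random.default_rng(0).normal(size=50)
results = {}
for shift in [4.5, 4, 3.9, 3.8]:
    centers_a, width = bin_centers(data)
    centers_b, _ = bin_centers(data + shift)
    spacing = pooled_spacing(centers_a, centers_b)
    results[shift] = (width, spacing)
    print(f"{shift = }: bin width {width:.3f}, pooled spacing {spacing:.3f}")
```

Under these assumptions, rows where the pooled spacing falls below the bin width should be the rows where the bars look too narrow.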
If the bin edges are [0, 1, 2] for one group and [0.5, 1.5, 2.5] for the other, the bin width is calculated from the merged edges [0, 0.5, 1, 1.5, ...], which gives a width of 0.5 instead of 1.
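The arithmetic above can be reproduced with plain NumPy. This is a minimal sketch, assuming the width is taken as the smallest gap between the sorted unique positions it receives; `min_spacing` is a hypothetical helper written for illustration, not seaborn's actual function.

```python
import numpy as np


def min_spacing(positions):
    """Smallest gap between sorted unique positions (hypothetical helper)."""
    unique = np.unique(positions)  # np.unique returns sorted values
    return float(np.min(np.diff(unique)))


edges_a = np.array([0.0, 1.0, 2.0])
edges_b = np.array([0.5, 1.5, 2.5])

# Each group on its own has a spacing of 1.
per_group = [min_spacing(edges_a), min_spacing(edges_b)]

# Pooling both groups interleaves the edges and halves the spacing.
pooled = min_spacing(np.concatenate([edges_a, edges_b]))

print(per_group)  # [1.0, 1.0]
print(pooled)     # 0.5
```

Computing the spacing per group, rather than on the pooled values, is what would give the expected width of 1 in this example.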
Maybe this is not a bug but something by design when there is overlap between marks?
In case it is a bug, I could contribute a fix, but would probably need some direction as to where to fix it.
I could trace it to this width calculation: https://github.com/mwaskom/seaborn/blob/b4e5f8d261d6d5524a00b7dd35e00a40e4855872/seaborn/_core/plot.py#L1453, which ends up running the following line for all groups at once: https://github.com/mwaskom/seaborn/blob/b4e5f8d261d6d5524a00b7dd35e00a40e4855872/seaborn/_core/scales.py#L467
Thanks!