mwaskom / seaborn

Statistical data visualization in Python
https://seaborn.pydata.org
BSD 3-Clause "New" or "Revised" License

Uncertainties vs. weights & Averaging over columns / coordinates. #3766

Closed doronbehar closed 1 month ago

doronbehar commented 1 month ago

Hello. You have no idea how much I enjoy using your package. It fits my use case exactly, and I can't believe I only came across it at this stage of my project!

I'm working with a large xarray.Dataset with N>4 coordinates, which I convert with .to_dataframe() in order to plot it with seaborn.lineplot. It became confusing when I wanted seaborn to show my calculations' uncertainty. At first, I wasn't sure even how to save that uncertainty, until I realized that you don't call it "uncertainty", but rather the weights of the data variables for the estimation, and that they should simply be saved in a separate data variable.

If I need to perform estimation, it works pretty well, I suppose. However, I found that terminology choice a bit peculiar, because weights are meaningful only relative to each other, whereas uncertainties also have a meaning when the data is not averaged. The formulas below are the ones I'm familiar with for this. Note how $\mu = x_0$ and $\sigma_\mu = \sigma_0$ are obtained if the summation is over a single element:

$$ \mu = \frac{\sum_i (x_i/\sigma_i^2)}{\sum_i \sigma_i^{-2}}$$

$$ \sigma_\mu = 1/\sqrt{\sum_i \sigma_i^{-2}} $$
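The two formulas above can be sketched in NumPy as follows. This is only an illustration of the inverse-variance weighted mean as written in this comment; the function name `inverse_variance_mean` is hypothetical and is not part of seaborn or xarray.

```python
import numpy as np


def inverse_variance_mean(x, sigma):
    """Return (mu, sigma_mu) for values x with 1-sigma uncertainties sigma.

    Hypothetical helper: implements mu = sum(x_i / sigma_i^2) / sum(sigma_i^-2)
    and sigma_mu = 1 / sqrt(sum(sigma_i^-2)).
    """
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    w = 1.0 / sigma**2                    # inverse-variance weights
    mu = np.sum(w * x) / np.sum(w)        # weighted mean
    sigma_mu = 1.0 / np.sqrt(np.sum(w))   # uncertainty of the weighted mean
    return mu, sigma_mu


# A single element returns the value and its own uncertainty unchanged.
mu_single, sigma_single = inverse_variance_mean([10.0], [0.5])

# Two values with equal uncertainties reduce to the plain mean,
# and the combined uncertainty shrinks by a factor of sqrt(2).
mu_pair, sigma_pair = inverse_variance_mean([1.0, 3.0], [1.0, 1.0])
```

Scaling all `sigma` values by a common factor leaves `mu` unchanged but rescales `sigma_mu`, which is the sense in which weights are only relative while uncertainties carry an absolute meaning.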

I also noticed that if I give seaborn.lineplot a dataset.to_dataframe() with only 1 coordinate, the weights aren't taken into account at all. I understand that I can supply a custom function to the errorbar argument. But I think it would have been much more consistent if, instead of the weights argument, there were an uncertainties argument, and the uncertainties were drawn as error bars even when no estimation is required (because there is a single y per x).

mwaskom commented 1 month ago

> At first, I wasn't sure even how to save that uncertainty, until I realized that you don't call it "uncertainty", but rather the weights of the data variables for the estimation, and that they should simply be saved in a separate data variable.

Hi, I think you're thinking about this slightly wrong: the weights parameter exists so that you can compute a weighted mean, not to provide a measure of uncertainty.

doronbehar commented 1 month ago

> > At first, I wasn't sure even how to save that uncertainty, until I realized that you don't call it "uncertainty", but rather the weights of the data variables for the estimation, and that they should simply be saved in a separate data variable.
>
> Hi, I think you're thinking about this slightly wrong: the weights parameter exists so that you can compute a weighted mean, not to provide a measure of uncertainty.

I understood that correctly in the first place, but the way I phrased the sentence indeed implied otherwise. What I meant to say was that the closest thing to uncertainties in seaborn is the weights parameter.

I wonder what you think about adding an uncertainties parameter that would act as I suggested. Do you think it'd be beneficial? (Please reopen :pray:)

mwaskom commented 1 month ago

Sorry, there's been plenty of discussion of related topics before. I'm not open to adding this.

doronbehar commented 1 month ago

> Sorry, there's been plenty of discussion of related topics before. I'm not open to adding this.

Could you link me to those discussions? I want to know what the arguments for and against were. These search results don't show discussions about the simplest case of a seaborn.lineplot...