plotly / plotly.py

The interactive graphing library for Python :sparkles: This project now includes Plotly Express!
https://plotly.com/python/
MIT License

Updating `xaxis_range` changes y-axis range for `px.scatter` when marginals are displayed. #4643

Open GabrielKP opened 1 week ago

GabrielKP commented 1 week ago

Problem

When using px.scatter and displaying both marginals, updating the xaxis_range will change the yaxis_range such that data will be cut off.

Changing the yaxis_range works as expected. The bug only occurs when using both marginals.

I am using the latest version, 5.22.0.

Code

import plotly.express as px

# figure normal
fig = px.scatter(
    x=[3, 37, 37, 69],
    y=[1.8, 3.9, 2.7, 4.8],
    marginal_x="histogram",
    marginal_y="histogram",
)

fig.show()

# x axis is modified correctly
# but y axis is modified too
fig.update_layout(xaxis_range=[0, 185])

fig.show()

The first fig.show() produces: [image: working]

The second fig.show() produces: [image: bugged]

empet commented 1 week ago

Your example is somewhat artificial, as there is no reason to increase the xaxis_range from 69 to 185, given the data you provided. When you update x and y data in the scatter plot or want to change the original xaxis_range, then you must also update the yaxis_range accordingly, because resetting xaxis_range does not rerun the px steps that set all layout data for the three traces. Thus, yaxis2_range (the axis for the bins associated with the right-side histogram) is automatically set to yaxis_range.

import plotly.express as px

y = [1.8, 3.9, 2.7, 4.8]
fig = px.scatter(
    x=[3, 37, 37, 69],
    y=y,
    marginal_x="histogram",
    marginal_y="histogram",
)

fig.update_layout(xaxis_range=[0, 185], yaxis_range=[min(y)-0.5, max(y)+0.5])

or with normally distributed data:

import plotly.express as px
import numpy as np
np.random.seed(2024)
mean = [0, 0]
cov = [[1.2, 0.8], [0.8, 0.98]] 
vals = np.random.multivariate_normal(mean, cov, 250)
fig = px.scatter(
    x=vals[:,0],
    y=vals[:,1],
    marginal_x="histogram",
    marginal_y="histogram",
)
fig.update_layout(width=650, height=450)
fig.show()

Updating xaxis and yaxis range:

fig.update_layout(xaxis_range=[-3,3],  yaxis_range=[-2,2])
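
The workaround above hard-codes the y range. As a minimal sketch, the range could instead be derived from the data, so that both axes are always set together. The helper names `padded_range` and `pinned_ranges` are hypothetical (they are not part of plotly), and the 10% margin is an arbitrary choice:

```python
def padded_range(values, pad_fraction=0.05):
    """Return [low, high] spanning `values` with a margin on each side."""
    low, high = min(values), max(values)
    span = high - low
    # Fall back to a fixed margin when all values are identical.
    pad = span * pad_fraction if span > 0 else 1.0
    return [low - pad, high + pad]


def pinned_ranges(x_range, y_values, pad_fraction=0.05):
    """Build keyword arguments that set both axis ranges explicitly."""
    return {
        "xaxis_range": list(x_range),
        "yaxis_range": padded_range(y_values, pad_fraction),
    }


y = [1.8, 3.9, 2.7, 4.8]
layout_kwargs = pinned_ranges([0, 185], y, pad_fraction=0.1)
print(layout_kwargs)
```

The resulting dictionary can be passed on as `fig.update_layout(**layout_kwargs)`, which uses the same `xaxis_range` and `yaxis_range` keywords as the snippets in this thread.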

GabrielKP commented 1 week ago

"Your example is somewhat artificial, as there is no reason to increase the xaxis_range."

I have to increase the range to display the plot that I need to display. Why is that not a valid reason? I thought I'd make life easier by trimming it down to four data points, but you can use any value you want above 69; it will cause the same bug.

"When you update x and y data in the scatter plot or want to change the original xaxis_range, then you must also update the yaxis_range accordingly,"

That's exactly the point: I do not want to set yaxis_range manually just because I set the xaxis_range. Why should it be expected behavior that the yaxis_range changes if I adjust the xaxis_range? Indeed, in all other cases (with only one marginal, or no marginal plot) this does not happen. So at best, this is inconsistent behavior.

empet commented 1 week ago

The scatter plot here is a sample from a bivariate distribution. In theory, when you add new points (x, y), both xaxis_range and yaxis_range are expected to change. That's why you must update both of them.

GabrielKP commented 1 week ago

That is fair, but I am not adding new points.

I am changing the range on one axis, expecting that this change will not affect the other axis.

There are three ways in which this behavior is inconsistent with the rest of the function's behavior:

  1. If I update the xaxis_range such that some data falls outside of the range, the yaxis_range updates without cutting off data. If I instead update the xaxis_range such that all data falls within the range, the yaxis_range updates and cuts data off.
  2. If I update the yaxis_range, the xaxis_range is unchanged. If I update the xaxis_range, the yaxis_range changes and cuts data off.
  3. To reiterate the point from above, in all cases without the marginals I do not need to supply both axes when I change one.

If you are really adamant that both axes need to be specified when you specify one, but ONLY for yaxis_range and ONLY when both marginals are set and ONLY when data falls outside of the range, then maybe note that in the documentation?
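
One way to check which axes carry an explicit range, or are linked to another axis, is to look at the figure's layout directly. The sketch below is a diagnostic only: `summarize_axes` is a hypothetical helper, the example layout is hand-written for illustration rather than taken from an actual px figure, and whether px links the marginal axes through the `matches` attribute is an assumption worth verifying, not something this thread confirms.

```python
def summarize_axes(layout):
    """Map each axis key in a layout dict to its `range` and `matches` entries."""
    summary = {}
    for key, props in layout.items():
        if key.startswith(("xaxis", "yaxis")) and isinstance(props, dict):
            summary[key] = {
                "range": props.get("range"),      # None: no explicit range set
                "matches": props.get("matches"),  # None: no explicit link set
            }
    return summary


# Hypothetical layout, hand-written for illustration only.
example_layout = {
    "xaxis": {"range": [0, 185]},
    "yaxis": {},
    "yaxis2": {"matches": "y"},
    "title": {"text": "demo"},
}
for axis, info in sorted(summarize_axes(example_layout).items()):
    print(axis, info)
```

For a real figure, the input would be `fig.to_dict()["layout"]`. Comparing the summary before and after `fig.update_layout(xaxis_range=[0, 185])` shows which entries the update actually touched.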

empet commented 1 week ago

Could you mention why you would like to get an odd chart, with one of the marginal histograms exhibiting a blank space at the end of your xaxis range? Like any other figure generated as a plotly graph object or via plotly express, this one (px.scatter with marginal histograms) was designed to ensure its aesthetic appearance and a presentation that follows the theoretical standard underlying that object. Just "because I want to set a larger range" is not an argument.

GabrielKP commented 1 week ago

I think you should rather tell me why you are adamantly defending the blatant inconsistency of exactly this case compared to all others. Why does your argument apply to this case and not the others?

My use case is beside the point given these inconsistencies, which should at least be documented if the behavior is truly intended.

(I wanted to make plots for multiple conditions that are comparable on the x axis, so I set the x-axis range to a fixed value.)
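
For the multi-condition use case described above, a common x range can be computed once and reused for every figure. This is only a sketch: `shared_range` is a hypothetical helper, and the values under `condition_b` are invented for illustration.

```python
def shared_range(conditions, pad_fraction=0.05):
    """Return one [low, high] range covering the values of every condition."""
    all_values = [v for values in conditions.values() for v in values]
    low, high = min(all_values), max(all_values)
    pad = (high - low) * pad_fraction
    return [low - pad, high + pad]


# "condition_a" reuses the x values from the report; "condition_b" is made up.
conditions = {
    "condition_a": [3, 37, 37, 69],
    "condition_b": [10, 120, 185],
}
x_range = shared_range(conditions, pad_fraction=0.0)
print(x_range)
```

Each figure would then receive `fig.update_layout(xaxis_range=x_range)`. Given the behavior reported in this issue, figures with both marginals may also need an explicit `yaxis_range`.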