This error occurs when using `relplot` and then adding a `refline`. From my investigation, it appears that `relplot` is duplicating the data. In this example, I did not use `ignore_index` when concatenating my 3 dataframes, so the input data has duplicate index labels.
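The duplicate labels described above can be seen with pandas alone. In this minimal sketch, three small hypothetical frames stand in for `df1`, `df2` and `df3`, and `Index.is_unique` is checked before and after the two workarounds. Note that `reset_index(drop=True)` discards the old labels, while the plain `df.reset_index()` used in this report keeps them as a new `index` column; both leave a unique index.

```python
import numpy as np
import pandas as pd

# Three small frames standing in for df1, df2, df3 (sizes are arbitrary)
frames = [pd.DataFrame({"value": np.arange(4)}) for _ in range(3)]

# Default concat keeps each frame's own RangeIndex, so labels 0..3 repeat
stacked = pd.concat([f.assign(origin=i + 1) for i, f in enumerate(frames)])
print(stacked.index.is_unique)             # False
print(stacked.index.value_counts().max())  # 3: each label appears once per frame

# Either of these produces a fresh, unique 0..n-1 index
fixed_concat = pd.concat(
    [f.assign(origin=i + 1) for i, f in enumerate(frames)], ignore_index=True
)
fixed_reset = stacked.reset_index(drop=True)
print(fixed_concat.index.is_unique, fixed_reset.index.is_unique)  # True True
```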
The problem goes away when I use `ignore_index`, or pass the data in as `df.reset_index()`; however, the error message was not helpful in discovering this! After tracking down the `relplot` source code, it appears the problem is related to the `grid_data` merge at the end of the function. I was able to fix this by skipping the merge if all of the columns are already present. I have submitted pull request #3692.
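The 45000 vs. 15000 mismatch can be reproduced with pandas alone, under the assumption that the merge at the end of `relplot` joins `data` and `grid_data` on their indices (`left_index=True, right_index=True`). With each of 5000 labels appearing 3 times on both sides, every left row pairs with every right row that shares its label: 5000 × 3 × 3 = 45000. `attach_missing_columns` is a hypothetical helper illustrating the "skip the merge" idea; it is not seaborn's code or the patch in #3692.

```python
import numpy as np
import pandas as pd

n_labels, n_frames = 5000, 3

# Same shape as the report: 15000 rows, index labels 0..4999 each appearing 3 times
data = pd.concat(
    [pd.DataFrame({"y": np.random.random(n_labels)}) for _ in range(n_frames)]
)
# Hypothetical stand-in for the grid's data: same rows and index, one extra column
grid_data = data.assign(extra=1.0)

# Index-on-index merge: each left row matches every right row with the same label
merged = pd.merge(data, grid_data[["extra"]], left_index=True, right_index=True)
print(len(data), len(merged))  # 15000 45000


def attach_missing_columns(data, grid_data):
    """Hypothetical guard: only merge when grid_data contributes columns
    that data does not already have."""
    new_cols = grid_data.columns.difference(data.columns)
    if len(new_cols) == 0:
        return data  # nothing to add, so no merge and no row inflation
    return pd.merge(data, grid_data[new_cols], left_index=True, right_index=True)


same_cols = attach_missing_columns(data, grid_data[["y"]])
print(len(same_cols))  # 15000
```

The guard only avoids the inflation when there is nothing to merge; if `grid_data` did contribute new columns, an index-based merge on duplicate labels would still multiply the rows, so a unique index on the input remains the more robust fix.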
The error was: `ValueError: operands could not be broadcast together with shapes (45000,) (15000,)`
(The input data was 15000 rows long, with 3 distinct values of the `hue` variable.)
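The error message comes from NumPy broadcasting rather than from anything pandas- or seaborn-specific, which is why it says nothing about the index. A minimal sketch, assuming `refline` draws through `FacetGrid.map` and that this combines boolean masks sized to the inflated `fg.data` (45000 rows) with a mask computed from the original 15000-row input when the grid was created; `facet_mask` and `not_na_mask` are hypothetical stand-ins for those arrays.

```python
import numpy as np

# Assumption: one mask follows the inflated data (45000 rows), the other was
# built from the original input (15000 rows) when the grid was created
facet_mask = np.ones(45000, dtype=bool)
not_na_mask = np.ones(15000, dtype=bool)

message = None
try:
    combined = facet_mask & not_na_mask
except ValueError as err:
    message = str(err)
    print(message)  # operands could not be broadcast together with shapes (45000,) (15000,)
```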
Reproducible example:
```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

n_items = 5000
n_floats = 5
n_categorical = 3


def make_frame():
    """Build one frame of random floats plus integer 'categorical' columns."""
    frame = pd.DataFrame(
        np.random.random((n_items, n_floats)),
        columns=[f'float{i}' for i in range(n_floats)],
    )
    return frame.assign(
        **{f'categorical{i}': np.random.randint(0, 15, n_items) for i in range(n_categorical)}
    )


df1 = make_frame()
df2 = make_frame()
df3 = make_frame()

# No ignore_index=True here, so the index labels 0..4999 appear three times each
df = pd.concat([df1.assign(origin=1), df2.assign(origin=2), df3.assign(origin=3)])
print(df)

fg = sns.relplot(data=df, x='float1', y='float2', hue='origin', row='categorical1')
print('main', fg.data.shape)  # compare the row count with len(df) == 15000
fg.refline(y=0.5)  # raises ValueError: operands could not be broadcast together
plt.show()
```