Open mofojed opened 1 year ago
To clarify, the important distinction is not between 32/64 bit precision, but between float/Float, where the latter is the pandas-specific dtype rather than a dtype that also exists in numpy.
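For anyone following along, here is a minimal sketch of that float/Float distinction, using only pandas and numpy. The variable names are illustrative and not taken from the report.

```python
import numpy as np
import pandas as pd

# A plain Series of Python floats is backed by a numpy float64 array.
s_numpy = pd.Series([0.498653, 0.663367, 0.326006])

# convert_dtypes() switches it to the pandas nullable extension dtype.
s_pandas = s_numpy.convert_dtypes()

print(s_numpy.dtype)   # float64 (lowercase: numpy dtype)
print(s_pandas.dtype)  # Float64 (capitalized: pandas extension dtype)

# Two ways to tell them apart programmatically.
is_numpy_backed = isinstance(s_numpy.dtype, np.dtype)
is_extension = pd.api.types.is_extension_array_dtype(s_pandas.dtype)
print(is_numpy_backed, is_extension)
```

Both hold 64-bit values, so the bit width is the same; only the container type differs.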
Some additional detail that we just found... This script does produce the above error:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df_sb_multi = pd.DataFrame([
{"X": 0, "Y": 0.0, "Z": 1.0, "R": 0.498653, "S": 2.582756 },
{"X": 1, "Y": 0.841471, "Z": 0.540302, "R": 0.663367, "S": 3.193578 },
{"X": 2, "Y": 0.909297, "Z": -0.416147, "R": 0.326006, "S": 0.241508 },
{"X": 3, "Y": 0.14112, "Z": -0.989992, "R": 0.298382, "S": 40.054015 },
{"X": 4, "Y": -0.756802, "Z": -0.653644, "R": 0.410429, "S": 33.189659 },
{"X": 5, "Y": -0.958924, "Z": 0.283662, "R": 0.756501, "S": 41.980234 },
{"X": 6, "Y": -0.279415, "Z": 0.96017, "R": 0.412779, "S": 0.837251 }
])
df_sb_multi = df_sb_multi.convert_dtypes()
fig_sb_multi, sb_multi_ax = plt.subplots()
sb_multi_ax.clear()
sns.scatterplot(df_sb_multi, x="X", y="R", size="S", ax=sb_multi_ax)
but this one does not:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df_sb_multi = pd.DataFrame([
{"X": 0, "Y": 0.0, "Z": 1.0, "R": 0.498653, "S": 2.582756 },
{"X": 1, "Y": 0.841471, "Z": 0.540302, "R": 0.663367, "S": 3.193578 },
{"X": 2, "Y": 0.909297, "Z": -0.416147, "R": 0.326006, "S": 0.241508 },
{"X": 3, "Y": 0.14112, "Z": -0.989992, "R": 0.298382, "S": 40.054015 },
{"X": 4, "Y": -0.756802, "Z": -0.653644, "R": 0.410429, "S": 33.189659 },
{"X": 5, "Y": -0.958924, "Z": 0.283662, "R": 0.756501, "S": 41.980234 }
])
df_sb_multi = df_sb_multi.convert_dtypes()
fig_sb_multi, sb_multi_ax = plt.subplots()
sb_multi_ax.clear()
sns.scatterplot(df_sb_multi, x="X", y="R", size="S", ax=sb_multi_ax)
There is a hard threshold between plotting 6 and 7 points where this error starts.
That makes sense: the error is being raised from matplotlib ticker code that is producing the “brief” legend values.
Ultimately the pandas dtypes are an ongoing annoyance for seaborn. They’re still “experimental” in pandas and often cause issues in code written expecting numpy dtypes, which is most of matplotlib. Seaborn can try to cast data types and handle specific cases where they arise but there’s not a great general solution.
@mwaskom would the recommendation be to use numpy data types in the meantime? Are you aware of any notes in matplotlib with a similar recommendation?
Yes, your issue is with the Float64 types produced by convert_dtypes. Using "regular" float64 should work fine.
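One possible way to apply that before plotting is to cast only the nullable float columns back to numpy float64 and leave everything else alone. This is a rough sketch, not something seaborn provides: `floats_to_numpy` is a hypothetical helper name, and the small frame is made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_float_dtype


def floats_to_numpy(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with nullable Float columns cast to numpy float64.

    Hypothetical helper for illustration; not part of pandas or seaborn.
    """
    out = df.copy()
    for col in out.columns:
        dtype = out[col].dtype
        # Only touch pandas extension dtypes that hold floats (Float32/Float64).
        if is_extension_array_dtype(dtype) and is_float_dtype(dtype):
            out[col] = out[col].astype("float64")  # pd.NA becomes NaN
    return out


# Illustrative data, including one missing value.
df = pd.DataFrame({"X": [0, 1, 2], "S": [2.582756, 3.193578, None]}).convert_dtypes()
plain = floats_to_numpy(df)

print(df.dtypes)     # before the cast
print(plain.dtypes)  # after the cast
```

The resulting frame can then be passed to `sns.scatterplot` in place of the original. Integer columns keep their nullable dtype here; whether those also need casting is not something this thread establishes.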
I tried using the astype function to convert the dataframe dtypes to float64, and it works even with 7 or more rows, i.e. at and above the threshold where the error appeared. Version info: pandas 2.1.2, seaborn 0.13.0, matplotlib 3.7.1
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df_sb_multi = pd.DataFrame([
{"X": 0, "Y": 0.0, "Z": 1.0, "R": 0.498653, "S": 2.582756 },
{"X": 1, "Y": 0.841471, "Z": 0.540302, "R": 0.663367, "S": 3.193578 },
{"X": 2, "Y": 0.909297, "Z": -0.416147, "R": 0.326006, "S": 0.241508 },
{"X": 3, "Y": 0.14112, "Z": -0.989992, "R": 0.298382, "S": 40.054015 },
{"X": 4, "Y": -0.756802, "Z": -0.653644, "R": 0.410429, "S": 33.189659 },
{"X": 5, "Y": -0.958924, "Z": 0.283662, "R": 0.756501, "S": 41.980234 },
{"X": 6, "Y": -0.279415, "Z": 0.96017, "R": 0.412779, "S": 0.837251 }
])
df_sb_multi = df_sb_multi.astype("float64")
fig_sb_multi, sb_multi_ax = plt.subplots()
sb_multi_ax.clear()
sns.scatterplot(df_sb_multi, x="X", y="R", size="S", ax=sb_multi_ax)
However, convert_dtypes() usage results in the error.
Right, convert_dtypes produces pandas types by default. You could use convert_floating=False too...
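A small sketch of what that option does, assuming a pandas version where `convert_dtypes` accepts `convert_floating` (the frame below is illustrative, trimmed from the data in this thread):

```python
import pandas as pd

df = pd.DataFrame({"X": [0, 1, 2], "R": [0.498653, 0.663367, 0.326006]})

# Default: every column moves to a pandas nullable dtype.
default = df.convert_dtypes()

# With convert_floating=False the float column keeps its numpy dtype,
# while the integer column is still converted.
kept = df.convert_dtypes(convert_floating=False)

print(default.dtypes)
print(kept.dtypes)
```

Note that integer conversion is still enabled here, so per the pandas documentation a float column whose values are all whole numbers may still be converted to a nullable integer dtype.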
Ultimately the pandas dtypes are an ongoing annoyance for seaborn
I'm sorry that you have been experiencing this downstream, but I'm also not surprised. I have opened a PDEP in pandas to align on a Logical Type System that I think could help, and I would love any feedback on it to improve this experience for the ecosystem:
https://github.com/pandas-dev/pandas/pull/58455
They’re still “experimental” in pandas and often cause issues in code written expecting numpy dtypes, which is most of matplotlib
This is another area I am hoping we can address via the PDEP process:
https://github.com/pandas-dev/pandas/pull/59125#discussion_r1657797729
These "experimental" types have existed since 2019 with few updates on the pandas side, so the experimental label is disingenuous at this point.
Create a table that has dtypes Float64 and use one of the columns for the size parameter in scatterplot:
Running that code produces the following error:
If you omit the size parameter or explicitly convert the types to float32, it works, e.g.:
Does seaborn not support Float64 type?