has2k1 / plotnine

A Grammar of Graphics for Python
https://plotnine.org
MIT License

Bug after changing order of factors on facet_grid #780

Open bhvieira opened 2 months ago

bhvieira commented 2 months ago

There's a bug that's haunting me right now involving the order of factors in facet_grid. I'm posting it here in case other people have run into it, but I haven't been able to reproduce it outside my own data. The MWE would be something like:

```python
import numpy as np
import pandas as pd
from plotnine import aes, facet_grid, geom_line, geom_point, ggplot

# Random data with two continuous values and three categories
random_data = pd.DataFrame(
    {
        "Continuous_1": np.random.normal(size=100),
        "Continuous_2": np.random.normal(size=100),
        "Category_1": np.random.choice(["A", "B", "C", "D", "E", "F", "G"], size=100),
        "Category_2": np.random.choice(["H", "I", "J"], size=100),
        "Category_3": np.random.choice(["Series_1", "Series_2"], size=100),
    }
)
for col in ["Category_1", "Category_2", "Category_3"]:
    random_data[col] = pd.Categorical(random_data[col])

# The only difference between the two plots is which factor maps to rows and which to columns
ggplot(random_data, aes(x="Continuous_1", y="Continuous_2", fill="Category_3")) + geom_point() + geom_line() + facet_grid("Category_1 ~ Category_2")
ggplot(random_data, aes(x="Continuous_1", y="Continuous_2", fill="Category_3")) + geom_point() + geom_line() + facet_grid("Category_2 ~ Category_1")
```

Notice that the only thing changing between the two calls is which factor maps to rows and which maps to columns. With the current plotnine, both render correctly and no error occurs. The problem is that with my own real data, doing exactly the same thing makes the second call mix up the data across panels: points that belong in one panel show up in another. I'm still investigating, so I'll update this issue if I can figure it out.
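Since the real data can't be shared, one way to narrow this down is to compute, independently of plotnine, how many observations should land in each panel, and compare that against the rendered grid. Below is a minimal sketch using only pandas; `expected_panel_counts` is a hypothetical helper written for this comment, not part of plotnine, and the toy data mirrors the MWE's two faceting columns.

```python
import numpy as np
import pandas as pd


def expected_panel_counts(df, rows, cols):
    """Count how many rows of df belong to each facet panel (rows x cols)."""
    # observed=False keeps every category combination, so empty panels show up as 0
    return (
        df.groupby([rows, cols], observed=False)
        .size()
        .unstack(cols, fill_value=0)
    )


# Same structure as the MWE: two categorical columns with fixed level sets
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "Category_1": pd.Categorical(rng.choice(list("ABCDEFG"), size=100), categories=list("ABCDEFG")),
        "Category_2": pd.Categorical(rng.choice(list("HIJ"), size=100), categories=list("HIJ")),
    }
)

by_rows = expected_panel_counts(df, "Category_1", "Category_2")  # facet_grid("Category_1 ~ Category_2")
by_cols = expected_panel_counts(df, "Category_2", "Category_1")  # facet_grid("Category_2 ~ Category_1")

# Swapping rows and columns should only transpose the grid; per-panel counts must match
print(by_rows)
print(by_rows.equals(by_cols.T))
```

Running the same helper on the real data gives a reference table for each orientation. If a panel in the second plot shows a different number of points than the matching cell, that would suggest the mix-up happens in the faceting step rather than in the data itself.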

has2k1 commented 2 months ago

Just wondering, what version of pandas are you using? Can you try the lowest supported version, pandas==2.1.0?
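To answer the version question precisely, the installed versions can be read from package metadata with the standard library's `importlib.metadata`, which works even where a package's `__version__` attribute is missing. A small sketch, assuming that reporting pandas, numpy, matplotlib and plotnine together is useful for triage (that package list is my choice, not something the maintainer asked for); the lowest supported pandas can then be tried in a fresh virtual environment with `pip install "pandas==2.1.0"`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print one line per package, e.g. "pandas: 2.1.0" (or "None" if not installed)
for pkg in ("pandas", "numpy", "matplotlib", "plotnine"):
    print(f"{pkg}: {installed_version(pkg)}")
```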