has2k1 / plotnine

A Grammar of Graphics for Python
https://plotnine.org
MIT License

scale_x_discrete fails when using {number:str} dictionary in labels parameter #708

Status: Closed (juanramonua closed this issue 10 months ago)

juanramonua commented 11 months ago

In the following example plotnine 0.12.2 shows the error:

AttributeError: 'RangeDiscrete' object has no attribute 'range'


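import pandas as pd
from plotnine import aes, geom_bar, ggplot, scale_x_discrete

df = pd.DataFrame({
    'order': [1, 2, 3, 4],
    'value': [10, 20, 30, 40],
    'label': ['one', 'two', 'three', 'four']
})

# 'order' is numeric, so pairing it with a discrete scale triggers the error above
(
  ggplot(df, aes(x='order', y='value'))
  + geom_bar(position='dodge', stat='identity')
  + scale_x_discrete(labels={1: 'one', 2: 'two', 3: 'three', 4: 'four'})
)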

While testing, I noticed that if df['order'] is of type str (instead of numeric), with matching str keys in the labels dictionary ({'1':'one','2':'two','3':'three','4':'four'}), it works correctly. Why does it fail with numeric keys?

Best regards and thanks.

TyberiusPrime commented 11 months ago

It's expecting the scale to be continuous when you pass in numbers.

Personally, I'd replace the column with a categorical, which also lets you control the order in the plot (necessary here because otherwise the labels would be sorted alphabetically).

df = df.assign(
    order=pd.Categorical(
        df.order.replace({1: 'one', 2: 'two', 3: 'three', 4: 'four'}),
        ['one', 'two', 'three', 'four']
    )
)

has2k1 commented 11 months ago

x is a continuous variable and needs a continuous scale.

You have two options.

1.

(
  ...
  + scale_x_continuous( labels = {1:'one',2:'two',3:'three',4:'four'} )
)

2.

(
  ggplot(df,aes(x='factor(order)',y='value'))
  ...
)

I'll look into a better error message.

juanramonua commented 10 months ago

Sorry for not answering sooner. My actual use case is more complex; I just wanted to give an example that was as simple as possible.

Actually, I use facet_grid() and I need the same labels (error results and algorithm name) to appear in a different order depending on the panel. I achieve this by ordering all the results globally with numbers from 0 to N and mapping them to their algorithm names, which can repeat but correspond to different experiments. The example below works perfectly in version 0.10.1 but fails in later versions.

If the ordering is done with type str, i.e. order=['1','2','3','4'] with a corresponding dictionary to replace them with their respective names, it works correctly.

I don't understand why it fails when using an integer as the order, which would be the most logical choice and which worked in previous versions.

On the other hand, if I use scale_x_continuous(), unwanted numeric labels such as 1.5, 2.5, etc. appear.



from plotnine import *
import pandas as pd

df = pd.DataFrame({
    'order':[1,2,3,4],
    'value':[10,20,30,40],
    'label':['Algorithm 1','Algorithm 2','Algorithm 2','Algorithm 2'],
    'group':[1,1,2,2]
})

(
  ggplot(df,aes(x='order',y='value'))
  + geom_bar(position = 'dodge', stat='identity')
  + facet_wrap('group')
  + scale_x_discrete( labels = {1:'Algorithm 1',2:'Algorithm 2',3:'Algorithm 2',4:'Algorithm 1'} )
)