geom_col drops fill groups when using position_dodge

I've been scratching my head over this one for quite a while, and I've finally managed to put together a minimal reproducible example.

The context is the following:

I am using plotnine 0.12.1. I have a discrete variable on the x-axis and a continuous variable on the y-axis. I have a second discrete variable that I use for the fill aesthetic, and a third discrete variable that I use as the group aesthetic. In some cases a single value of x has multiple values of this third variable, but I am interested in the total height of the individual columns, so I use position='dodge'.

The problem is that the generated plot seems to completely throw away several values of the fill variable, producing a completely erroneous plot. Here is the example:

import numpy as np
import pandas as pd
import plotnine as p9

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
# I tested several groups/samples combinations but it always seems to be wrong
n_groups = 5
n_samples = 5
df = pd.DataFrame(
    dict(
        # every sample contains the same n_groups taxa
        taxa=np.tile(list(alphabet[:n_groups]), n_samples),
        sample_id=np.repeat(['S01', 'S02', 'S03', 'S04', 'S05'], n_groups),
        counts=np.random.randint(0, 100, n_groups * n_samples),
        # S03 and S04 share the same date
        date=np.repeat(['02/17', '02/24', '03/03', '03/03', '03/10'], n_groups),
    )
)
df

taxa	sample_id	counts	date
A	S01	41	02/17
B	S01	88	02/17
C	S01	43	02/17
D	S01	30	02/17
E	S01	54	02/17
A	S02	26	02/24
B	S02	19	02/24
C	S02	99	02/24
D	S02	74	02/24
E	S02	83	02/24
A	S03	0	03/03
B	S03	56	03/03
C	S03	34	03/03
D	S03	19	03/03
E	S03	67	03/03
A	S04	92	03/03
B	S04	64	03/03
C	S04	83	03/03
D	S04	86	03/03
E	S04	13	03/03
A	S05	73	03/10
B	S05	27	03/10
C	S05	87	03/10
D	S05	73	03/10
E	S05	14	03/10

If we were to now plot this data with the default position, the representation seems correct:

(p9.ggplot(df)
      + p9.aes('date', 'counts', group='sample_id', fill='taxa')
      + p9.geom_col()
 )

However, since we want to have the total height of each individual sample, we use position='dodge' and the values are completely off:

(p9.ggplot(df)
      + p9.aes('date', 'counts', group='sample_id', fill='taxa')
      + p9.geom_col(position='dodge')
 )

The issue isn't simply caused by the differing widths of the columns: if we keep a one-to-one ratio between the x-axis variable and the group variable but still use the 'dodge' position, the representation is also wrong:

(p9.ggplot(df.query('sample_id!="S04"'))
      + p9.aes('date', 'counts', group='sample_id', fill='taxa')
      + p9.geom_col(position='dodge')
      )

Any ideas on what might be causing this? I will take a look later at the actual code in position_dodge.py but as of now I am quite clueless...

You are doing a rather complicated dodge that overwhelms position_dodge. It does not throw away any values; rather, it just doesn't do enough dodging, so the taller columns may overlap the shorter ones depending on the order. You can add
```
geom_text(aes(label="counts"), position=position_dodge(width=0.9))
```
to see that everything is there.
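To illustrate why bars can seem to vanish, here is a rough sketch in plain Python (no plotting). It assumes that all bars of one sample start at 0, land in the same x slot, and are drawn in row order with later bars on top; the function name `visible_taxa` is made up for this illustration and is not part of plotnine.

```python
# Counts for sample S01 from the table above, in row order (taxa A..E).
s01_counts = {"A": 41, "B": 88, "C": 43, "D": 30, "E": 54}


def visible_taxa(counts):
    """Return the taxa whose bar is at least partly visible.

    Assumption for illustration only: every bar starts at 0, shares one
    x slot, and is drawn in dictionary order with later bars on top.
    """
    taxa = list(counts)
    visible = []
    for i, taxon in enumerate(taxa):
        later_heights = [counts[t] for t in taxa[i + 1:]]
        # A bar shows only if it is taller than every bar drawn after it.
        if counts[taxon] > max(later_heights, default=0):
            visible.append(taxon)
    return visible


print(visible_taxa(s01_counts))  # ['B', 'E']
```

Under those assumptions, A, C and D are still drawn for S01 but sit entirely behind B and E, which matches the impression that fill groups were dropped.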
When you have more than one group at the same x location, the width of the bars is split between the groups. This adds to the impression that position_dodge is confused!
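The width split can be worked out by hand for the example data. The sketch below assumes a total dodge width of 0.9 (the value used in the geom_text call above) shared evenly among the groups present at each x; the exact internals of position_dodge may differ.

```python
from collections import Counter

# One (date, sample_id) pair per sample, taken from the example data.
samples = [
    ("02/17", "S01"),
    ("02/24", "S02"),
    ("03/03", "S03"),
    ("03/03", "S04"),
    ("03/10", "S05"),
]

TOTAL_WIDTH = 0.9  # assumed total width available at each x position

# Number of groups (samples) that share each date.
groups_per_date = Counter(date for date, _ in samples)

# Assumed even split of the total width among the groups at that x.
bar_width = {date: TOTAL_WIDTH / n for date, n in groups_per_date.items()}

for date, width in bar_width.items():
    print(date, groups_per_date[date], width)
```

Only 03/03 holds two samples, so its bars come out half as wide as those on the other dates.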

The solution is to use the more capable position_dodge2 and tell it (preserve="single") to preserve the width of a single element in a group.

geom_col(position=position_dodge2(preserve="single"))

has2k1 / plotnine

geom_col drops fill groups when using position_dodge #743