has2k1 / plotnine

A Grammar of Graphics for Python
https://plotnine.org
MIT License

Labels offset from mizani.breaks.breaks_date output. #880

Open chrisjcameron opened 4 days ago

chrisjcameron commented 4 days ago

In this plot, both the x data values and the label breakpoints are generated by a mizani.breaks.breaks_date object applied to the same range. They should be the same values, but the labels are offset from the data points by 4 hours. For example, the data x values are at 0, 6, 12, 18 while the labels are at 4, 10, 16, 22. All the values are specified as UTC, so I think this is unexpected output.

This is with mizani-0.11.4, plotnine-0.13.6. I understand a new release is imminent and would be happy to test again post-release.

Thanks for taking a look. I really appreciate the work you have put into plotnine.

These are the values returned by breaker below:

0   2022-12-19 00:00:00+00:00
1   2022-12-19 06:00:00+00:00
2   2022-12-19 12:00:00+00:00
3   2022-12-19 18:00:00+00:00
4   2022-12-20 00:00:00+00:00

This code will reproduce the attached plot. Note that the data points and the label locations are not lined up.

# Imports inferred from the aliases used below (pd, gg)
import datetime

import mizani.breaks
import pandas as pd
import plotnine as gg

limits = [
    datetime.datetime(2022, 10, 19, 0, 0, 0, tzinfo=datetime.timezone.utc),
    datetime.datetime(2022, 10, 20, 0, 0, 0, tzinfo=datetime.timezone.utc),
]
breaker = mizani.breaks.breaks_date('6 hours')
x = breaker(limits)
y = range(len(x))

p_df = pd.DataFrame({'x': x, 'y': y})

plot = (
    gg.ggplot(p_df, gg.aes(x='x', y='y'))
    + gg.geom_point()
    + gg.scale_x_datetime(
        date_breaks='6 hours',
        date_labels='%m-%d %H:%M',
        limits=limits,
    )
    + gg.theme(axis_text_x=gg.element_text(angle=30, hjust=1))
)
plot.show()

image

has2k1 commented 3 days ago

By default plotnine generates breaks for an expanded coordinate system. That means the real limits are wider than those passed in the scale. That is how you get space on either end of the data limits!
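A rough sketch of the arithmetic behind this, using only the standard library. It assumes a default expansion of 5% of the range on each side and a break generator that floors the lower limit to the hour before stepping by the width. Both are assumptions made for illustration, not a description of the actual plotnine or mizani code, and `expand_limits` and `width_breaks` are hypothetical helpers.

```python
import datetime


def expand_limits(limits, mult=0.05):
    """Widen (low, high) by a fraction of the range on each side (assumed default: 5%)."""
    low, high = limits
    pad = (high - low) * mult
    return low - pad, high + pad


def width_breaks(limits, hours):
    """Breaks every `hours` hours, starting from the lower limit floored to the hour.

    Illustrative stand-in for a width-based break generator, not mizani's algorithm.
    """
    low, high = limits
    step = datetime.timedelta(hours=hours)
    current = low.replace(minute=0, second=0, microsecond=0)
    breaks = []
    while current <= high:
        breaks.append(current)
        current += step
    return breaks


utc = datetime.timezone.utc
limits = (
    datetime.datetime(2022, 10, 19, tzinfo=utc),
    datetime.datetime(2022, 10, 20, tzinfo=utc),
)

# Breaks on the limits as passed to the scale: hours 0, 6, 12, 18, 0
data_hours = [b.hour for b in width_breaks(limits, 6)]

# Breaks on the expanded limits (22:48 the day before to 01:12 the day after):
# hours 22, 4, 10, 16, 22
label_hours = [b.hour for b in width_breaks(expand_limits(limits), 6)]
```

Under these assumptions the expanded lower limit is 22:48 on the previous day, which floors to 22:00, so the breaks land at 4, 10, 16 and 22 inside the original range. That matches the offset reported above.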

You can turn off the expansion in one of two ways.

  1. Using the scale.
    + scale_x_datetime(
        date_breaks='6 hours', 
        date_labels='%m-%d %H:%M', 
        limits=limits,
        expand=(0, 0)  # Effectively turns off the expansion
    )
  2. Through the coordinates.
    + coord_cartesian(expand=False)

chrisjcameron commented 3 days ago

Interesting. Is there a way to specify where the labels will appear? The default expansion looks good but what if I want to align my breaks with the start of each day without adding a +/- date_breaks to the limits of the graph?

has2k1 commented 2 days ago

but what if I want to align my breaks with the start of each day

The starts of days are good locations, so you can ask for as many breaks as there are days in your range.

Generally, you can try passing the exact number of breaks that you want. e.g.

    + scale_x_datetime(
        date_breaks=5, 
        date_labels='%m-%d %H:%M', 
        limits=limits,
    )

This tells the algorithm to generate at most n breaks. If n breaks can be placed at "good" locations, they will be generated. Otherwise you get fewer breaks, still at "good" locations. It won't always work, but it tries to be smarter about where to place the breaks than when you specify the width.

If that doesn't generalise well and you always know your limits and width, you can write a small function to generate the breaks for you:

import datetime

def my_breaks(start, n=10, **td_kwargs):
    # n breaks starting at `start`; td_kwargs are passed to timedelta, e.g. hours=6
    return [start + i * datetime.timedelta(**td_kwargs) for i in range(n)]

(
    ...
    + scale_x_datetime(
        breaks=my_breaks(limits[0], hours=6),
        date_labels='%m-%d %H:%M',
    )
)
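If picking n by hand is inconvenient, a variant of the same idea can stop at the upper limit instead. This is a minimal sketch. `breaks_between` is a hypothetical helper, not part of plotnine or mizani, and it assumes `start` and `end` are datetimes in the same timezone.

```python
import datetime


def breaks_between(start, end, **td_kwargs):
    """Breaks from `start` up to and including `end`, spaced by timedelta(**td_kwargs)."""
    step = datetime.timedelta(**td_kwargs)
    if step <= datetime.timedelta(0):
        # A zero or negative step would never reach `end`
        raise ValueError("the step must be a positive duration")
    breaks = []
    current = start
    while current <= end:
        breaks.append(current)
        current += step
    return breaks


utc = datetime.timezone.utc
start = datetime.datetime(2022, 10, 19, tzinfo=utc)
end = datetime.datetime(2022, 10, 20, tzinfo=utc)

six_hourly = breaks_between(start, end, hours=6)  # 00:00, 06:00, 12:00, 18:00, 00:00
daily = breaks_between(start, end, days=1)        # one break at the start of each day
```

The result can be passed as `breaks=breaks_between(limits[0], limits[1], hours=6)` in the same way as `my_breaks` above. With `days=1` and a start at midnight, the breaks line up with the start of each day.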