holoviz / holoviews

Bars does not obey range set on kdim #4555

Open norweeg opened 3 years ago

norweeg commented 3 years ago

When plotting a bar chart, either directly or using a .to method, the plot does not obey the range of its kdim, even when other plot types do. In the code example below, the bar charts will show years 2003-2016 despite having their kdim range set to 2004-2015. The curve plots of the exact same data show the correct kdim range. As far as I can tell, only bar charts have this issue.

python=3.8.5 holoviews=1.13.3 jupyter=1.0.0 bokeh=2.1.1

example showing bug:

import re  # needed for re.sub in the column renaming below
from datetime import datetime
import pandas as pd
import numpy as np
import holoviews as hv
from IPython.display import display

hv.notebook_extension("bokeh", inline=True, logo=True, width=100)

raw_data = (
    pd.read_csv("https://think.cs.vt.edu/corgis/datasets/csv/airlines/airlines.csv",
                parse_dates=["Time.Label"])
    .rename(columns={"Time.Label": "Report_Date"})
    .rename(columns=lambda c: c.replace(".", "_"))
    .rename(columns=lambda c: re.sub("^Statistics_|^Time_", "", c))
    .query("Airport_Code == 'PHL'")
)[["Report_Date", "Flights_Total"]]

report_date = hv.Dimension("Report_Date",
                           label="Report Date",
                           range=(datetime(2004, 1, 1),
                                  datetime(2015, 12, 1)))

total_flights = hv.Dimension("Flights_Total", label="Total", unit="# Flights")

phl = hv.Dataset(raw_data, kdims=[report_date], vdims=[total_flights])

# Using Bars object
bars_1 = (hv.Bars(phl.transform(Report_Date=hv.dim("Report_Date").df.dt.year).aggregate(function=np.sum))
               .redim.range(Report_Date=(2004,2015))
               )
curve_1= (hv.Curve(phl.transform(Report_Date=hv.dim("Report_Date").df.dt.year).aggregate(function=np.sum))
                .redim.range(Report_Date=(2004,2015))
                )

display(bars_1.opts(title="Incorrect Date Range", xrotation=45) + curve_1.opts(title="Correct Date Range"))

# Using `to` methods
bars_2 = (phl.to.bars()
               .transform(Report_Date=hv.dim("Report_Date").df.dt.year)
               .aggregate(function=np.sum)
               .redim.range(Report_Date=(2004,2015))
              )

curve_2= (phl.to.curve()
                .transform(Report_Date=hv.dim("Report_Date").df.dt.year)
                .aggregate(function=np.sum)
                .redim.range(Report_Date=(2004,2015))
               )

display(bars_2.opts(title="Incorrect Date Range", xrotation=45) + curve_2.opts(title="Correct Date Range"))

philippjfr commented 3 years ago

So the problem here is that Bars is not a continuous plot; it plots a discrete set of categories, which means it never even looks at Dimension.range and instead uses the discrete Dimension.values. This is something we're hoping to address, but for now I can only recommend using a type that does support a continuous scale, such as the Histogram type (badly named, I know).

norweeg commented 3 years ago

I worked around it by redimming it to discrete values. I just expected that discrete values in a range would imply only values within that range. It took me a minute to figure that out, though. Do the docs have anything about this behavior? I get that bars are not a continuous plot, but my dim was originally continuous before I aggregated into years, and I just assumed that only the bins within the range would be displayed.
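
For anyone else hitting this, one related way around it is to drop the out-of-range rows before building the Bars element, since Bars draws one bar per category present in the data. The sketch below is only an illustration and makes several assumptions: it filters the data rather than using redim.values, the yearly totals are made-up stand-ins for the aggregated airline data, and clip_to_range and make_bars are hypothetical helpers, not part of HoloViews.

```python
def clip_to_range(rows, lower, upper):
    """Keep (year, total) rows whose year lies in [lower, upper], inclusive."""
    return [(year, total) for year, total in rows if lower <= year <= upper]


# Made-up yearly totals standing in for the aggregated airline data.
yearly_totals = [(2003, 120), (2004, 150), (2005, 160), (2015, 140), (2016, 20)]

# Only the years 2004-2015 survive, so only those categories reach the plot.
in_range = clip_to_range(yearly_totals, 2004, 2015)


def make_bars(rows):
    """Build a Bars element from (year, total) rows (requires holoviews)."""
    import holoviews as hv  # imported lazily so the filtering step runs without it

    return hv.Bars(rows, kdims=["Report_Date"], vdims=["Flights_Total"])
```

In a notebook with the bokeh extension loaded, make_bars(in_range) should then show bars only for the years that were kept, because the out-of-range categories never reach the element.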