holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License

HoloMap of BoxWhisker Categorical Axis not respecting missing categorical data #3508

Open CRiddler opened 5 years ago

CRiddler commented 5 years ago

Hello, I've come across what I believe is a bug when transforming a hv.Dataset into a hv.BoxWhisker plot. In a HoloMap, when I plot nested categorical data as a BoxWhisker and one of the category levels is missing, the missing level is always placed at the right end of the plot. Strangely, this only occurs with BoxWhisker: if I aggregate my data so that I can plot it as Bars, the missing category level appears in the proper place along the x-axis. I am running holoviews version 1.11.2.

I've supplied a copy/paste example (it can be run in a notebook). Note that one dataset has all of the nested levels, while in the other I have purposefully dropped all the values pertaining to the level ["C", "3"].

import numpy as np; np.random.seed(1)
import pandas as pd
import holoviews as hv
hv.extension('bokeh')

data = {'letter': ['A', 'B', 'C'] * 80,
        'number': [1, 2, 3, 4] * 60,
        'value': np.random.normal(size=240)}
df = pd.DataFrame(data)
df = df.assign(letter=lambda df: pd.Categorical(df.letter),
               number=lambda df: pd.Categorical(df.number))
missing_c3_df = df.loc[~((df['letter'] == 'C') & (df['number'] == 3))]

# Data for BoxWhiskers
full_ds = hv.Dataset(df, ['letter', 'number'], 'value').sort()
missing_c3_ds = hv.Dataset(missing_c3_df, ['letter', 'number'], 'value').sort()

# Aggregate data for Barplots
full_ds_agg = full_ds.aggregate(['number', 'letter',], 'mean')
missing_c3_ds_agg = missing_c3_ds.aggregate(['letter', 'number'], 'mean')

(
    full_ds.to.box('number').grid() + missing_c3_ds.to.box('number').grid() +
    full_ds_agg.to.bars('number').grid() + missing_c3_ds_agg.to.bars('number').grid()
).cols(2)

[Image: wrong_order_categorical_axis]

The plots in the left column come from the full data, whereas the right column comes from the data that is missing level "C3". In the BoxWhisker plot of the right column, the missing value is all the way on the right; a closer look at the xticks reveals that they have been ordered as [1, 2, 4, 3]. In the Bars plot in the right column, by contrast, the missing "C3" bar is accounted for in the correct position. No matter what I've tried (I'm not super experienced with holoviews, so take this with a grain of salt), I cannot reorder the x-axis of the BoxWhisker to keep the missing box in the correct place.
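
For reference, the level structure of the two datasets can be checked without HoloViews or pandas. The sketch below rebuilds the letter/number pairs from the example in plain Python and lists which numbers are present for each letter. The helper `numbers_per_letter` and the variable names are illustrative only and are not part of the original report.

```python
from collections import defaultdict

# Same level layout as the example: 240 rows cycling through 3 letters and 4 numbers.
letters = ['A', 'B', 'C'] * 80
numbers = [1, 2, 3, 4] * 60
pairs = list(zip(letters, numbers))

# Drop every row belonging to the ("C", 3) combination, mirroring missing_c3_df.
missing_c3_pairs = [(l, n) for l, n in pairs if not (l == 'C' and n == 3)]


def numbers_per_letter(rows):
    """Return the sorted numbers observed for each letter (hypothetical helper)."""
    seen = defaultdict(set)
    for letter, number in rows:
        seen[letter].add(number)
    return {letter: sorted(values) for letter, values in sorted(seen.items())}


full_levels = numbers_per_letter(pairs)
missing_levels = numbers_per_letter(missing_c3_pairs)

print(full_levels)
print(missing_levels)
```

In the filtered data only letter "C" lacks the number 3, so any per-letter plot of "C" has three boxes to draw while "A" and "B" have four.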

philippjfr commented 5 years ago

The immediate bug here is that, even on a categorical axis, the values should be sorted by default when they are numeric, and currently they are not. The secondary and more fundamental improvement, which I think there is another issue for, is to allow defining the categorical order via Dimension.values.
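
The default described above (sort the factors when they are numeric, otherwise keep their given order) can be sketched in plain Python. This is only an illustration of the intended behavior, not the HoloViews implementation; `order_factors` is a hypothetical helper name.

```python
from numbers import Number


def order_factors(factors):
    """Order categorical axis factors (hypothetical helper, not a HoloViews API).

    Duplicates are dropped while keeping first-seen order. If every factor is
    numeric, the result is sorted; otherwise the first-seen order is kept.
    """
    unique = list(dict.fromkeys(factors))
    all_numeric = bool(unique) and all(
        isinstance(f, Number) and not isinstance(f, bool) for f in unique
    )
    return sorted(unique) if all_numeric else unique


# The tick order reported in the issue, and the order a numeric sort would give.
print(order_factors([1, 2, 4, 3]))
# Non-numeric factors keep the order in which they were supplied.
print(order_factors(['B', 'A', 'C', 'A']))
```

Keeping first-seen order for non-numeric factors leaves room for an explicitly declared order, which is what the second suggestion about Dimension.values would provide.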