santosjorge / cufflinks

Productivity Tools for Plotly + Pandas
MIT License

Categories support for box plots #67

Open santosjorge opened 7 years ago

santosjorge commented 7 years ago

Support:

    df.iplot(kind='box', x='myX', y='myY', categories='myGroup')

MrDataPsycho commented 6 years ago

I was looking for this too and could not find any help, but I did find a workaround. As an example, take the popular Titanic data set from Kaggle and draw a box plot of age by passenger class:

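    import cufflinks as cf
    cf.go_offline()  # use Plotly in offline mode

    # train is the Kaggle Titanic training set, already loaded as a pandas DataFrame
    box_age = train[['Pclass', 'Age']]
    # Pivot to one column per Pclass value; iplot then draws one box per column
    box_age.pivot(columns='Pclass', values='Age').iplot(kind='box')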

You could do this in one step, but splitting it into two (or three, if you store the pivoted table in a variable) keeps the code clean. The second step pivots the data so that each Pclass value becomes its own column. Every row then has at most one non-null value, and iplot takes care of the nulls. I compared the result with seaborn and both give the same plot, so the approach is reliable. If you want to try both, here is the seaborn code:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline

    plt.figure(figsize=(12, 7))
    sns.boxplot(x='Pclass', y='Age', data=train, palette='winter')

Note: I am using Jupyter Notebook, which is why the code includes %matplotlib inline.
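
To see what the pivot step does to the data, here is a minimal sketch on a small made-up frame. The sample values are an assumption for illustration only, not the real Titanic data, and the snippet uses only pandas, so the final iplot call is left as a comment:

```python
import pandas as pd

# Made-up sample standing in for the two Titanic columns used above.
train = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1, 2],
    "Age": [38.0, 22.0, None, 35.0, 54.0, 27.0],
})

# Long -> wide: one column per Pclass value, original row index kept.
wide = train.pivot(columns="Pclass", values="Age")
print(wide)

# Each row holds at most one non-null value (none where Age was missing).
non_null_per_row = wide.notna().sum(axis=1)
print(non_null_per_row)

# Per-class medians from the wide frame and from a plain groupby should
# agree, because NaN is skipped in both cases.
wide_medians = wide.median()
grouped_medians = train.groupby("Pclass")["Age"].median()
print(wide_medians)
print(grouped_medians)

# With cufflinks loaded, wide.iplot(kind="box") would draw one box per column.
```

Because every class ends up in its own column and missing values are skipped, the per-column statistics should match the per-group statistics, which fits with the box plot agreeing with the seaborn one.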