spotify / chartify

Python library that makes it easy for data scientists to create charts.
Apache License 2.0

Feature Request: Allow extra columns in data source for tooltips #61

Open valerierose opened 5 years ago

valerierose commented 5 years ago

type: feature

I have a use case where I would like to create a scatter plot that is color-coded by a categorical variable and includes some other text in the tooltip. Currently, I am unable to do this with Chartify because the data frame I pass in has its extra columns stripped, so they are not available to the tooltip. It would be very helpful if you could either stop stripping those columns or add a parameter to the scatter plot function, such as label_column, that would let me keep a text column.

Here's an example to illustrate the issue:

I have a dataframe, df, that looks like:

x  y  l  t
1  4  a  'Sunflower'
2  5  b  'Shallow'
3  6  c  'thank you, next'

I would like the t column to appear in the tooltip of a scatter plot created with the following code:

import pandas as pd
import chartify
from bokeh.models import HoverTool

df = pd.DataFrame({'x': [1, 2, 3],
                   'y': [4, 5, 6],
                   'l': ['a', 'b', 'c'],
                   't': ['Sunflower', 'Shallow', 'thank you, next']})

ch = chartify.Chart(blank_labels=True)
ch.plot.scatter(
    data_frame=df,
    x_column='x',
    y_column='y',
    color_column='l')
hover = HoverTool(tooltips=[
    ("Title", '@t'),
    ("Cluster", "@l"),
])
ch.figure.add_tools(hover)
ch.show()

Instead, the tooltip shows "Title: ???, Cluster: a". As far as I can tell, there is no way to pass a list of text to HoverTool directly; it has to be part of the ColumnDataSource.

It looks like the internals of PlotNumericXY.scatter are filtering out columns here. Is this necessary? Can it be made optional?
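
To make the request concrete, here is a rough sketch of what an opt-in could look like. It is not Chartify's actual code: select_columns and its tooltip_columns parameter are hypothetical names, and plain dicts of lists stand in for the data frame so the sketch has no dependencies.

```python
def select_columns(data, plot_columns, tooltip_columns=()):
    """Return a copy of `data` limited to the plotted columns plus opt-in extras.

    `data` maps column name -> list of values (a stand-in for a DataFrame).
    `tooltip_columns` is a hypothetical parameter, not part of Chartify's API.
    """
    wanted = list(plot_columns)
    wanted += [name for name in tooltip_columns if name not in wanted]
    missing = [name for name in wanted if name not in data]
    if missing:
        raise KeyError(f"columns not found in data: {missing}")
    return {name: list(data[name]) for name in wanted}


data = {
    'x': [1, 2, 3],
    'y': [4, 5, 6],
    'l': ['a', 'b', 'c'],
    't': ['Sunflower', 'Shallow', 'thank you, next'],
}

# Behaviour described in this issue: only the plotted columns survive.
stripped = select_columns(data, plot_columns=['x', 'y', 'l'])

# With the opt-in, 't' is kept, so a tooltip field like '@t' would have data.
kept = select_columns(data, plot_columns=['x', 'y', 'l'], tooltip_columns=['t'])
```

Defaulting tooltip_columns to empty would keep the current stripping behaviour for everyone who does not ask for extra columns.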

cphalpert commented 5 years ago

Thanks for adding the issue! Agree that it would be good to support this use case and eventually support interactive charts in Chartify.

Currently, columns that aren't used for plotting are dropped because of a bug: columns containing JSON blobs could cause the plot to fail when added to a ColumnDataSource, even if they weren't being plotted. Dropping the columns was a quick fix.
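
A narrower alternative to dropping everything might be to drop only the columns that look like nested blobs. The sketch below assumes the failure came from dict or list values, which this thread does not confirm; drop_nested_columns and SCALAR_TYPES are hypothetical names, and plain dicts of lists again stand in for the data frame.

```python
# Value types assumed to be safe in a data source (an assumption, not verified).
SCALAR_TYPES = (str, int, float, bool, type(None))


def is_scalar_column(values):
    """True if every value in the column is a plain scalar."""
    return all(isinstance(value, SCALAR_TYPES) for value in values)


def drop_nested_columns(data, plot_columns):
    """Always keep plotted columns; keep others only if all values are scalars."""
    return {
        name: list(values)
        for name, values in data.items()
        if name in plot_columns or is_scalar_column(values)
    }


data = {
    'x': [1, 2, 3],
    'y': [4, 5, 6],
    't': ['Sunflower', 'Shallow', 'thank you, next'],
    'meta': [{'id': 1}, {'id': 2}, {'id': 3}],  # JSON-like blobs
}

# 't' survives for tooltips; 'meta' is dropped because its values are dicts.
filtered = drop_nested_columns(data, plot_columns=['x', 'y'])
```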