vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Getting legend for multilayer chart #984

Open ajasja opened 6 years ago

ajasja commented 6 years ago

Is it possible to make a legend for a chart like this?

import altair as alt
import numpy as np
import pandas as pd

x = np.arange(100)
data = pd.DataFrame({'x': x,
                     'sin(x)': np.sin(x / 5),
                     'data': np.sin(x / 5) + 0.3*np.random.rand(100)})

line = alt.Chart(data).mark_line(strokeWidth=6, color='orange').encode(
    x='x',
    y='sin(x)'
)

point = alt.Chart(data).mark_point(color='black').encode(
    x='x',
    y='data'
)
(line + point)

image

ajasja commented 6 years ago

I would like to get something like this image

jakevdp commented 6 years ago

Legends are only created if the data within a layer is somehow grouped by a label. You can force this by adding columns of labels; for example:

x = np.arange(100)
data = pd.DataFrame({'x': x,
                     'sin(x)': np.sin(x / 5),
                     'data': np.sin(x / 5) + 0.3*np.random.rand(100),
                     'line_label': 100 * ['line'],
                     'points_label': 100 * ['points']})

line = alt.Chart(data).mark_line(strokeWidth=6, color='orange').encode(
    x='x',
    y='sin(x)',
    opacity='line_label'
)

point = alt.Chart(data).mark_point(color='black').encode(
    x='x',
    y='data',
    shape='points_label'
)
(line + point)

visualization 25

though this is admittedly a bit hacky. Also, as far as I know, Vega cannot display line marks within a legend as you show in your example above, though @kanitw or @domoritz may be able to correct me on that.

domoritz commented 6 years ago

I think we can use a custom SVG path as a symbol, but we haven't gotten around to making it the default for Vega-Lite lines yet.

ajasja commented 6 years ago

Thanks! This is indeed a bit hacky:) But I got even a bit closer.

x = np.arange(100)
data = pd.DataFrame({'x': x,
                     'sin(x)': np.sin(x / 5),
                     'data': np.sin(x / 5) + 0.3*np.random.rand(100),
                     'line_label': 100 * ['line'],
                     'points_label': 100 * ['points']})

line = alt.Chart(data).mark_line(strokeWidth=6, color='orange').encode(
    x='x',
    y='sin(x)',
    opacity=alt.Opacity('line_label', legend=alt.Legend(title=""))
)

point = alt.Chart(data).mark_point(color='black').encode(
    x='x',
    y='data',
    shape=alt.Shape('points_label', legend=alt.Legend(title=""))
)
(line + point)

PS: this is probably a different debate (e.g. https://github.com/altair-viz/altair/issues/947); I just found out I could do

shape=alt.Shape('points_label', title="" )

instead of

shape=alt.Shape('points_label', legend=alt.Legend(title="") )

Big kudos points! :+1: Is this documented somewhere, or is it more a matter of try and see?

jakevdp commented 6 years ago

The title keyword is documented in the API docs (e.g. alt.X) and is used in a few examples, but I think it would be useful to have a section of the documentation dedicated to titles and labels.

Any volunteers? :smile:

pletka commented 5 years ago

Looks great! How can you sort the labels in the legend?

domoritz commented 5 years ago

If you don't need to use different mark types for the layers, you can also use the fold transform documented at https://vega.github.io/vega-lite/docs/fold.html to convert your data to long/tidy form.

jetilton commented 4 years ago

Hi folks, I am new to Altair and trying to plot weather/hydrograph data. I am able to plot the data, but I can't seem to specify the colors I want together with a legend. My data and plot look like the following.

import numpy as np
import pandas as pd
import altair as alt
x = np.arange(100)
sin = np.sin(x / 5)
data = pd.DataFrame({'x': x,
                     'sin(x)': np.sin(x / 5),
                     'q_95': sin + 100*np.random.rand(100),
                     'q_75': sin + 75*np.random.rand(100),
                     'q_50': sin + 50*np.random.rand(100),
                     'q_25': sin + 25*np.random.rand(100),
                     'q_05': sin + 5*np.random.rand(100),
                    })
perc_90 = alt.Chart(data).mark_area(color='#4292c6', opacity = .5,).encode(
    x=alt.X('x',axis=alt.Axis(title='Day')),
    y=alt.Y('q_05',axis=alt.Axis(title='cfs')),
    y2 = 'q_95',
    #fill=alt.Color("p90", legend=alt.Legend(title=''))

).properties(
    width=800)

perc_50 = alt.Chart(data).mark_area(color='#08519c', opacity = .5).encode(
    x=alt.X('x',axis=alt.Axis(title='Day')),
    y=alt.Y('q_25',axis=alt.Axis(title='cfs')),
    y2 = 'q_75',
    #color=alt.Color("p50", legend=alt.Legend(title=''))

)

median = alt.Chart(data).mark_line(color = '#08306b').encode(
    x='x',
    y='q_50',
    #opacity=alt.Color("median", legend=alt.Legend(title=''))
)

perc_90 + perc_50 + median

Out of curiosity is there a reason why altair does not allow for custom legends? Thanks, I really love the work so far.

domoritz commented 4 years ago

@jetilton Vega-Lite supports custom legends (and so does Altair). You may need to modify the scale domain and range as in https://vega.github.io/vega-lite/examples/stacked_bar_weather.html. If you have a smaller example, I can give more feedback.

footfalcon commented 4 years ago

@domoritz Would you mind taking a look at this small example? I am aiming to use a selector to toggle different layered time-series but can't figure out how to generate a proper legend. This example takes the stock price dataset and I added a dummy 'Price-Earnings' ratio to layer onto the plot, and then use another single axis plot to dashboard-toggle which stock to display. The legend I want to display should identify the 'Price' and 'PE' series instead of the symbols. I understand that the data probably has to be rearranged somehow, and it may not be practical/possible, in which case, is there a way to manually create/label a legend/textbox for this use case? Thanks in advance!

    import altair as alt
    from vega_datasets import data

    # Testing to get a legend...
    stockdata = data.stocks()
    stockdata['pe'] = stockdata['price'] / 10

    selector = alt.selection_single(
        fields=['symbol'], 
        empty='all',
        init={'symbol': 'AAPL'}
    )

    legend = alt.Chart(stockdata).mark_square(size=150).encode(
        y=alt.Y(
            'symbol:N',
            axis=alt.Axis(domain=False, ticks=False, orient='right'), title=None
        ),
        color=alt.condition(selector, 'symbol:N', alt.value('gainsboro'), legend=None)
    ).add_selection(
        selector
    )

    price = alt.Chart(stockdata).mark_line(point=True).encode(
        x='date:T',
        y='price:Q',
        color='symbol:N',
        #size='pe:Q'
    )

    pe = alt.Chart(stockdata).mark_bar().encode(
        x='date:T',
        y='pe:Q',
        color='symbol:N'
    )

    legend | (price + pe).add_selection(
                            selector
                        ).transform_filter(
                            selector
                        )

image

jakevdp commented 4 years ago

For one thing, it's now possible to make native legends interactive:

import altair as alt
from vega_datasets import data

stockdata = data.stocks()
stockdata['pe'] = stockdata['price'] / 10

selector = alt.selection_single(
    fields=['symbol'], 
    empty='all',
    init={'symbol': 'AAPL'},
    bind='legend'
)

price = alt.Chart(stockdata).mark_line(point=True).encode(
    x='date:T',
    y='price:Q',
    color='symbol:N',
    opacity=alt.condition(selector, alt.value(1), alt.value(0))
).add_selection(
    selector
)

pe = alt.Chart(stockdata).mark_bar().encode(
    x='date:T',
    y='pe:Q',
    color='symbol:N'
).transform_filter(
    selector
)

price + pe

visualization - 2020-02-27T054454 350

Beyond that, it's not clear to me how you want your legend to be different than what is shown. Both layers have a shared color encoding that is correctly reflected in the legend.

footfalcon commented 4 years ago

Hi Jake - thanks for your reply. I am aware of native legend interactivity (which is great). I should probably have mentioned that I am new to Altair and exploring its possibilities. In this case, I am trying to see how far I can take it as a mini-dashboard. The reason I want to use the selector the way I have it is that, in my use case:

  1. it is a long list of countries (which would get truncated as a native legend), and
  2. I want it to control several more separate plots (that would all be filtered by country selection), and
  3. I want the flexibility to control the layout.

Also, while your plot is effectively the same as mine, and the native legend does identify the stock correctly by color, it does not clearly show which series is the stock price and which is the stock PE. What I am hoping to do by creating the pseudo-legend is keep that stock identity, but also be able to display a legend which says the mark_line series is the price, and the mark_bar series is the PE.

This may not be possible, in which case, is it possible to create something like a text box to place manually on the chart? I will actually be using a different color for the line and the bars (which will remain constant for each stock, e.g. price == red; PE == gray), so I could color-code the labels in a text box to convey that information.

    stockdata = data.stocks()
    stockdata['pe'] = stockdata['price'] / 10

    selector = alt.selection_single(
        fields=['symbol'], 
        empty='all',
        init={'symbol': 'AAPL'}
    )

    legend = alt.Chart(stockdata).mark_square(size=150).encode(
        y=alt.Y(
            'symbol:N',
            axis=alt.Axis(domain=False, ticks=False, orient='right'), title=None
        ),
        color=alt.condition(selector, alt.value('firebrick'), alt.value('gainsboro'), legend=None)
    ).add_selection(
        selector
    )

    price = alt.Chart(stockdata).mark_line(point=True).encode(
        x='date:T',
        y='price:Q',
        color=alt.value('firebrick'),
        #size='pe:Q'
    )

    pe = alt.Chart(stockdata).mark_bar().encode(
        x='date:T',
        y='pe:Q',
        color=alt.value('gray')
    )

    legend | (price + pe).add_selection(
                            selector
                        ).transform_filter(
                            selector
                        )

image

Here's a very-work-in-progress snapshot of what I am trying to do...

image

domoritz commented 4 years ago

Unfortunately, I don't have time right now to look at anything but minimal examples that demonstrate a specific issue.

footfalcon commented 4 years ago

@domoritz No problem, I will keep exploring. Am really impressed with Altair!

jakevdp commented 4 years ago

You could do something like this:

import altair as alt
from vega_datasets import data

stockdata = data.stocks()
stockdata['pe'] = stockdata['price'] / 10

selector = alt.selection_single(
    fields=['symbol'], 
    empty='all',
    init={'symbol': 'AAPL'},
    bind='legend'
)

price = alt.Chart(stockdata).mark_line(point=True).encode(
    x='date:T',
    y='price:Q',
    color='symbol:N',
    opacity=alt.condition(selector, alt.value(1), alt.value(0))
).add_selection(
    selector
)

pe = alt.Chart(stockdata).transform_calculate(
    name='"PE Ratio"'  
).mark_bar().encode(
    x='date:T',
    y='pe:Q',
    color=alt.Color('name:N', scale=alt.Scale(scheme='greys'), legend=alt.Legend(title=None))
).transform_filter(
    selector
)

(price + pe).resolve_scale(color='independent')

visualization (63)

The grammar offers a lot of possibilities for customizing legends and scales, depending on exactly what you want to do.

footfalcon commented 4 years ago

Thanks, I will give it a try...

RobbyJS commented 4 years ago

Hello, since it has been some time since this question was asked, I wanted to see if there are any updates: is there a way of doing this (adding a legend when there is only one group of data inside the graph) that doesn't involve adding a column to the data and adding an additional property to the graph?

Is it possible to make a legend for such a chart

import altair as alt
import numpy as np
import pandas as pd

x = np.arange(100)
data = pd.DataFrame({'x': x,
                     'sin(x)': np.sin(x / 5),
                     'data': np.sin(x / 5) + 0.3*np.random.rand(100)})

line = alt.Chart(data).mark_line(strokeWidth=6, color='orange').encode(
    x='x',
    y='sin(x)'
)

point = alt.Chart(data).mark_point(color='black').encode(
    x='x',
    y='data'
)
(line + point)

image

jakevdp commented 4 years ago

No, there is still no way to add a legend without specifying an encoding that the legend will represent.

essafik commented 4 years ago

Are there any plans to have legends not based on a label? While in many cases it is easy to add a column, it is not always practical, for example when you have simulation data involving many parameters and want to compare results from different simulations on the same plot.

jakevdp commented 4 years ago

What do you mean by “legend not based on a label”? How do you imagine specifying what the legend will contain?

essafik commented 4 years ago

Before switching to Altair, I was doing plots with Mathematica, where you can simply specify your legend within the plot by using the option PlotLegend ->{"line"}. Even with matplotlib you can specify the legend with the label option, as in plot(x, y, label="line"). Is something like that planned for Altair, or even possible with Vega-Lite?

jakevdp commented 4 years ago

Yes, in newer versions of vega-lite you can set encodings to a constant datum value, which will be used to populate the legend. Altair doesn't yet support this, though.

In Altair it would probably look something like this (Note that this does not work in the current release):

alt.Chart(data).mark_line().encode(
  x='x',
  y='y',
  color=alt.datum("My Line")
)

gustavz commented 4 years ago

When will this feature be available? As I understand plotting libraries, it's crucial.

jakevdp commented 4 years ago

when will this feature be available?

What specifically are you asking about?

dsandber commented 4 years ago

What specifically are you asking about?

I believe @gustavz was asking about the ability to do "color=alt.datum("My Line")". Either way, I'd like to know also!

Also, once that functionality is supported, how can the color (like "orange") be specified as well?

jakevdp commented 4 years ago

You can currently specify a color like "orange" using color=alt.value("orange")

dsandber commented 4 years ago

@jakevdp yeah, the question is once the functionality described by @gustavz is implemented, so that a legend item can be specified by doing "color=alt.datum("My Line")", then how can the color also be specified since the "color" was set to "My Line".

jakevdp commented 4 years ago

You can define the color encoding's scale in the normal way; i.e. scale=alt.Scale(domain=["My Line"], range=["orange"])

NoName115 commented 3 years ago

I would like to add my solution to this issue, as I was struggling a lot to create a "custom legend" for my charts. My problem was that I had Chart(data).mark_line() and then a transform_loess created from that chart, and I wanted to show that one line contains the exact measured values and the other is smoothed. I used an approach from https://github.com/altair-viz/altair/issues/2430. My result is below:

image

cjw296 commented 3 years ago

In Altair it would probably look something like this (Note that this does not work in the current release):

@jakevdp - what still needs to happen for this to work in a released version of Altair?

joelostblom commented 3 years ago

@cjw296 You can follow the discussion of updating Altair to the recent Vega-Lite versions in this issue https://github.com/altair-viz/altair/issues/2425

wangjiawen2013 commented 1 year ago

Hi, sometimes the groups are unbalanced: there is a lot more data in one group and only a little in another. In this case, the small groups are covered by the large group. For example, in the following figure, the blue points are obscured by the orange points. We want to highlight the blue points, but we can no longer see them. Is there any way to adjust the plot order?

image