vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Irregularity between line and area plot when coloring different sections for a discrete time series data #2281

Open armsp opened 4 years ago

armsp commented 4 years ago

I wanted to show the recent time in the data using a different color for an area chart with a dark line on top. Since line and area marks are considered single entities I used transform_calculate to essentially group the data into 2 groups.

Using the stocks data I have the following observations -

import altair as alt
from vega_datasets import data

source = data.stocks()

(
    alt.Chart(source, width=600)
    .mark_area(interpolate='step-after', line=True, fillOpacity=0.6)
    # Drop MSFT and keep only the rows after 2005-01-01
    .transform_filter(
        (alt.datum.symbol != 'MSFT')
        & (alt.datum.date > alt.expr.toDate('2005-01-01'))
    )
    # Flag rows after 2009-01-01 so they form a separate color group
    .transform_calculate(recent=alt.datum.date > alt.expr.toDate('2009-01-01'))
    .encode(
        x='date',
        y='price',
        color='recent:N',
        facet=alt.Facet('symbol', columns=2),
    )
)

This gives the following - [image: nooo]

However, the stroke lines are not exactly what I wanted (orange strokes in the blue region and vice versa, although that is explainable), so I decided to layer a line on top of the area as follows -

area = (
    alt.Chart(source, width=600)
    .mark_area(interpolate='step-after', fillOpacity=0.6)
    .transform_filter(
        (alt.datum.symbol != 'MSFT')
        & (alt.datum.date > alt.expr.toDate('2005-01-01'))
    )
    .transform_calculate(recent=alt.datum.date > alt.expr.toDate('2009-01-01'))
    .encode(
        x='date',
        y='price',
        color='recent:N',
    )
)

line = area.mark_line(interpolate='step-after').encode(
    color='recent:N',
)

(area+line).facet(facet = alt.Facet('symbol')).configure_facet(columns=2)

This gives the following - [image: gap]

This is more like what I want, but if you look closely you will notice that at the color transition the stroke (line chart) has a gap.

That is actually easy to explain: the recent group only starts at 2009-02-01, the first data point after the 2009-01-01 cutoff, and the two groups are drawn as separate marks, so no line segment covers the interval between 2009-01-01 and 2009-02-01.

I would expect the same behavior from the area chart too, but its area seems to extend beyond that point.

Ideally I want the line to behave the same way so that there are no discontinuities. Is there a way to make that happen? An understanding of why the area behaves this way would be helpful too.

jakevdp commented 4 years ago

I don't think there is any easy way to deal with this, unfortunately. I've seen questions related to step interpolation edges come up in the Vega-Lite forums, and never seen an answer that avoids adding edge-points to the underlying data. Please let us know if you figure out a good way around it.
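
For anyone landing here, the "adding edge-points to the underlying data" approach mentioned above could look roughly like the sketch below. It is only an illustration and not an Altair or Vega-Lite feature: the helper name add_edge_points, the DEMO symbol, and the prices are made up, and the recent flag is computed in plain Python instead of with transform_calculate. The idea is to copy the first recent row of each symbol into the older group, so that with step-after interpolation the older line runs up to the boundary date and the recent line picks up from the same point.

```python
from datetime import date
from itertools import groupby


def add_edge_points(rows, cutoff):
    """Tag each row with `recent` and add one bridging row per symbol.

    Hypothetical helper: `rows` is a list of dicts with the keys
    'symbol', 'date' and 'price'; `cutoff` is a datetime.date.
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r["symbol"], r["date"]))
    for _, group in groupby(ordered, key=lambda r: r["symbol"]):
        previous = None
        for row in group:
            tagged = dict(row, recent=row["date"] > cutoff)
            crossed = previous is not None and tagged["recent"] and not previous["recent"]
            if crossed:
                # Duplicate the first recent row into the older group so the
                # older step line extends to the boundary instead of stopping short.
                out.append(dict(tagged, recent=False))
            out.append(tagged)
            previous = tagged
    return out


# Made-up monthly prices, only to show the shape of the result.
rows = [
    {"symbol": "DEMO", "date": date(2008, 12, 1), "price": 10.0},
    {"symbol": "DEMO", "date": date(2009, 1, 1), "price": 12.0},
    {"symbol": "DEMO", "date": date(2009, 2, 1), "price": 11.0},
    {"symbol": "DEMO", "date": date(2009, 3, 1), "price": 15.0},
]
padded = add_edge_points(rows, cutoff=date(2009, 1, 1))
for r in padded:
    print(r["date"], r["price"], r["recent"])
```

The padded rows would then be charted with color='recent:N' directly, without the calculate transform. One caveat: area marks are stacked by default when color is encoded, so the duplicated boundary date would be stacked in the area layer. The padded rows are probably best used for the line layer only, or together with alt.Y('price', stack=None) on the area.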