vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Faceted map example chart #1711

Open: palewire opened this issue 4 years ago

palewire commented 4 years ago

Got this request from a colleague

mattijn commented 4 years ago

Try this:

import altair as alt
from vega_datasets import data

states = alt.topo_feature(data.us_10m.url, 'states')
source = data.income.url

alt.Chart(source).mark_geoshape().encode(
    shape=alt.Shape(field='geo', type='geojson'),
    color='pct:Q',
    column='group:N',
    tooltip=['name:N', 'group:N', 'pct:Q']
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(data=states, key='id'),
    as_='geo'
).properties(
    width=75,
    height=150
).project(
    type='albersUsa'
)

I noticed there is no shorthand for type='geojson' (otherwise you could write something like shape='geo:G'). It's also not mentioned in the Altair docs, although it is in the Vega-Lite docs.

irisslee commented 4 years ago

Here's an example using the LA riots sample dataset:


import altair as alt
from vega_datasets import data

df = data.la_riots()

n = alt.topo_feature('https://gist.githubusercontent.com/irisslee/70039051188dac8f64e14182b5a459a9/raw/2412c45551cff577f7b10604ca523bd3f4dd31d3/countytopo.json', 'county')

LAbasemap = alt.Chart(n).mark_geoshape(
    fill='lightgray',
    stroke='white'
).properties(width=400, height=400).project('mercator')

points = alt.Chart().mark_circle().encode(
    longitude='longitude:Q',
    latitude='latitude:Q',
    size=alt.value(15),
    color='gender:N'
)

alt.layer(LAbasemap, points, data=df).facet('gender:N')


jakevdp commented 4 years ago

That's a nice example of the mechanics of a faceted map, but I think for this particular dataset the visualization would be more effective without splitting it across facets.

palewire commented 4 years ago

What do you see as an ideal example of a faceted map for the gallery?

(In reply to Jake Vanderplas's comment of Thu, Oct 3, 2019, 8:21 PM.)

jakevdp commented 4 years ago

I haven't been able to come up with a good example.

mattijn commented 4 years ago

I already added one in https://github.com/altair-viz/altair/pull/1714.

palewire commented 4 years ago

In news graphics, the most common case for a faceted map is when you want to create a set of "mini multiples" that compare quantitative values on a shared scale across a set of competing nominative values.

A current example would be mapping the locations of campaign donors across America for the 20+ Democratic presidential candidates.

If you want something in that ballpark, I think we should look for a sample 50-state dataset that has a nominative facet where the different categories show some variety across the country.

(In reply to mattijn's comment of Thu, Oct 3, 2019, 10:50 PM.)

mattijn commented 4 years ago

I like this one as well: https://www.ryansleeper.com/a-tale-of-50-cities-population-changes-of-the-50-largest-us-cities-since-1790/


palewire commented 4 years ago

I think facets by time series segment or by a quantitative bracket are interesting, but I'd wager that both are much less common than charts that facet by a nominative category.

mattijn commented 4 years ago

What does a facet by quantitative data look like? Although years can be a quantitative data type as well, aren't they used as nominative categories here?

import altair as alt
from vega_datasets import data

countries = alt.topo_feature(data.world_110m.url, 'countries')
source = 'https://raw.githubusercontent.com/mattijn/datasets/master/cities_prediction_population.csv'

base = alt.Chart(countries).mark_geoshape(
    fill='lightgray',
    stroke='white',
    strokeWidth=0.2
).properties(width=300, height=200).project('naturalEarth1')

cities = alt.Chart().mark_circle().encode(
    latitude='lat:Q',
    longitude='lon:Q',
    size=alt.Size(
        'population:Q',
        scale=alt.Scale(range=[0, 1000]),
        legend=alt.Legend(title="Population (million)")
    ),
    fill=alt.value('green'),
    stroke=alt.value('white'),
    tooltip=['city:N', 'population:Q']
)

alt.layer(base, cities, data=source).facet(
    facet='year:N',
    columns=2,
    title='The 20 Most Populous Cities in the World by 2100'
)

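[image: https://user-images.githubusercontent.com/5186265/66219935-5bd65200-e6cc-11e9-9314-e858a74efd4a.png]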

Based on https://www.visualcapitalist.com/animated-map-worlds-populous-cities-2100/

palewire commented 4 years ago

Perhaps I am not using the term nominative correctly, but in the example you give, I would say you are still grouping an ordinal time series at the end of the day.

The result is an example that is slightly more complex, and less common, than one where the dataset already has a simple categorical column, like political candidate in my earlier example or gender in the one given by Iris Lee.

(In reply to mattijn's comment of Fri, Oct 4, 2019, 8:32 AM.)

mattijn commented 4 years ago

Yeah, my example is more ordinal than nominal.

palewire commented 4 years ago

In my opinion, the best Altair examples import from vega_datasets and do not require any transformation of data prior to plotting.

With those requirements, I'm not sure there's a suitable dataset in the current example list other than the LA riots dataset used by @irisslee. However, that set may require the import of outside geographies for the base map, something I think we should also aim to avoid.

Unless we can find a good candidate with the examples, or solve the issue of the base map for the riots data, I think we should consider nominating a new example dataset for vega_datasets to document this relatively common news chart.