vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

mark_geoshape encoding by sum of field using color returns blank chart #2052

Open informatica92 opened 4 years ago

informatica92 commented 4 years ago

Hi all, I am having trouble encoding a mark_geoshape chart with a color that represents the sum of a field in a geopandas dataframe. The structure of the dataframe is: image

In particular, if I build the chart using only the 'totale_casi' field, the chart "works" and shows me the last value for that field:

alt.Chart(gdf_merged_tmp).mark_geoshape(
    stroke='black',
    strokeWidth=0.5
).encode(
    color=alt.Color('totale_casi')
).properties(
    width=400,
    height=500
)

and the result is: image

Now, if I instead encode the color by the sum of nuovi_totale_casi:

alt.Chart(gdf_merged_tmp).mark_geoshape(
    stroke='black',
    strokeWidth=0.5
).encode(
    color=alt.Color('sum(nuovi_totale_casi):Q')
).properties(
    width=400,
    height=500
)

the result is: image

It seems that it is able to calculate the sum (the value shown in the legend is correct), but the chart is empty.

How can I solve this?

mattijn commented 4 years ago

I can have a look if you have a minimum working example including the data.

informatica92 commented 4 years ago

gdf_merged_tmp.zip

Hi mattijn, I created a zip file for you with all the data you need to replicate the issue I am referring to. Just unzip the file and execute the following code:

import geopandas as gpd
import altair as alt

gdf_merged_tmp = gpd.read_file('gdf_merged_tmp.shp')
print(type(gdf_merged_tmp))
print(gdf_merged_tmp)

alt.Chart(gdf_merged_tmp).mark_geoshape(
    stroke='black',
    strokeWidth=0.5
).encode(
    color=alt.Color('sum(nuovi):Q')
).properties(
    width=400,
    height=500
) # the blank chart

alt.Chart(gdf_merged_tmp).mark_geoshape(
    stroke='black',
    strokeWidth=0.5
).encode(
    color=alt.Color('totale')
).properties(
    width=400,
    height=500
) # the chart that works

mattijn commented 4 years ago

You have to include a groupby on both type and geometry in your aggregation as this is required to visualize GeoJSON features in Vega/Vega-Lite/Altair.

I don't think this is possible using a shorthand aggregation encoding, so you will need a transform_aggregate function.
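To illustrate why the missing groupby leaves the chart blank, here is a minimal plain-Python sketch of how an aggregate step behaves: the output rows keep only the groupby fields plus the aggregated value, so anything not grouped on (including the geometry) is dropped. The helper aggregate_sum and the sample rows are hypothetical stand-ins, not Altair or Vega-Lite API, and the geometries are placeholder strings rather than real GeoJSON objects.

```python
from collections import defaultdict


def aggregate_sum(rows, field, groupby, as_name):
    """Mimic an aggregate step: each output row holds only the groupby
    fields and the summed field; every other field is dropped."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[name] for name in groupby)
        totals[key] += row[field]
    return [
        {**dict(zip(groupby, key)), as_name: total}
        for key, total in totals.items()
    ]


# Hypothetical rows: two records for one shape, one record for another.
rows = [
    {"type": "Feature", "geometry": "polygon_A", "nuovi": 3},
    {"type": "Feature", "geometry": "polygon_A", "nuovi": 4},
    {"type": "Feature", "geometry": "polygon_B", "nuovi": 5},
]

# No groupby: a single row with the grand total and no geometry to draw.
collapsed = aggregate_sum(rows, "nuovi", [], "sum_nuovi")

# Grouping on type and geometry: one row per shape, geometry preserved.
grouped = aggregate_sum(rows, "nuovi", ["type", "geometry"], "sum_nuovi")
```

The collapsed case matches the symptom in the question: the legend can show a correct total while there is no shape left to render.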

Complete example:

import geopandas as gpd
import altair as alt
from shapely.ops import orient # version >=1.7a2

gdf = gpd.read_file(r"gdf_merged_tmp.shp")
gdf.geometry = gdf.geometry.simplify(0.1)

# just to be sure: apply left-hand-rule as winding order
# see https://altair-viz.github.io/user_guide/data.html#winding-order
gdf.geometry = gdf.geometry.apply(orient, args=(-1,))

# visualize
alt.Chart(gdf).mark_geoshape(
    stroke='black',
    strokeWidth=0.5
).encode(
    color='sum_nuovi:Q'
).transform_aggregate(
    sum_nuovi='sum(nuovi)',
    groupby=["type","geometry"]
)

image

(I included some geometry simplification and explicitly applied the left-hand rule. This may not always be necessary, but it is good to know anyway.)
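For anyone who wants to check the winding order of a ring by hand, a rough sketch using the shoelace formula is below. It treats coordinates as planar (x to the right, y up), which is an assumption that ignores spherical effects, and the helper names signed_area and is_clockwise are made up for illustration. Calling Shapely's orient with a sign of -1, as in the example above, requests clockwise exterior rings.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring of (x, y) pairs.

    Positive for counter-clockwise rings, negative for clockwise rings,
    assuming x points right and y points up.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def is_clockwise(ring):
    """Return True when the ring is wound clockwise."""
    return signed_area(ring) < 0


# A unit square listed counter-clockwise, and the same square reversed.
ccw_square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
cw_square = list(reversed(ccw_square))
```

If is_clockwise returns False for an exterior ring, reorienting the geometry as shown in the complete example is probably worth trying before looking for other causes.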

jakevdp commented 4 years ago

You might be able to force the "geometry" into the groupby by passing it in the detail encoding.

This looks like a vega-lite bug to me: the aggregate-in-encoding should recognize that geometry cannot be dropped for geoshape.

informatica92 commented 4 years ago

Thank you guys, I tried the approach suggested by mattijn and it works. I had already tried something similar, but I never grouped by "type" as well (it is not a field in the dataframe, so I would never have thought of it). I also found "geometry.simplify(0.1)" very useful because it makes the whole chart much lighter.

I agree with jakevdp, by the way: it sounds like a bug to me too.

mattijn commented 4 years ago

Yeah, I agree. You could never have known to include the field type in the groupby.