vega / altair

Declarative statistical visualization library for Python
https://altair-viz.github.io/
BSD 3-Clause "New" or "Revised" License

Adding an example to the "Maps" section that shows trajectories of hurricanes #3092

Open thomascamminady opened 1 year ago

thomascamminady commented 1 year ago

I wonder whether it would help to add another example to the "Maps" section of the examples that illustrates how geo data, especially trajectories, can be visualized. I think the only other example that illustrates this is the London Tube Lines example, and I actually missed it at first because it is in the "Case Studies" section.

Here's an interactive version of what I hope is a useful example that could be added.

[Image: hurricanes]

And here's the source code.

import altair as alt
from vega_datasets import data

# `df` is the preprocessed hurricane DataFrame
legend_selection = alt.selection_point(fields=["Year"], bind="legend")

storms = (
    alt.Chart(df)
    .mark_trail()
    .encode(
        longitude=alt.Longitude("Longitude:Q"),
        latitude=alt.Latitude("Latitude:Q"),
        color=alt.Color("Year:N"),
        detail=alt.Detail("ID:N"),
        size=alt.Size("Maximum Wind:Q").scale(range=(0, 5)),
        opacity=alt.condition(legend_selection, alt.value(1.0), alt.value(0.1)),
    )
    .add_params(legend_selection) # of course we could also skip interactivity
)

map_background = alt.Chart(
    alt.topo_feature(data.world_110m.url, feature="countries")
).mark_geoshape(stroke="white", strokeWidth=2, color="lightgray")

chart = (
    alt.layer(map_background, storms)
    .project(translate=[1000, 600], scale=500)
    .properties(width=700, height=700)
)

chart

Now there's one caveat: the data I use is taken from Kaggle, although the original data comes from the National Hurricane Center and is published under the CC0 license. I downloaded the data from Kaggle and performed some post-processing; the steps are shown below.

# data from https://www.kaggle.com/datasets/noaa/hurricane-database
import polars as pl

df = (
    pl.read_csv("./atlantic.csv")
    .with_columns(
        pl.col("Date")
        .cast(str)
        .apply(lambda s: s[:4] + "-" + s[4:6] + "-" + s[6:])
        .str.strptime(pl.Date, fmt="%Y-%m-%d"),
        pl.col("Latitude").apply(
            lambda s: float(str(s[:-1])) if s[-1] == "N" else -float(str(s[:-1]))
        ),
        pl.col("Longitude").apply(
            lambda s: -float(str(s[:-1])) if s[-1] == "W" else float(str(s[:-1]))
        ),
    )
    .with_columns(
        pl.struct(["Date", "Time"])
        .apply(lambda s: f"""{s["Date"]}-{s["Time"]//100:02d}""")
        .str.strptime(pl.Datetime, fmt="%Y-%m-%d-%H")
        .alias("Datetime")
    )
    .with_columns(pl.col("Maximum Wind").cast(float))
    .with_columns(pl.col("Date").dt.year().alias("Year"))
    .with_columns(
        (pl.col("Datetime") - pl.col("Datetime").first())
        .over("ID")
        .dt.hours()
        .alias("Age")
    )
    .sort("ID", "Datetime")
    .filter(pl.col("Date").dt.year() > 2010)
    .to_pandas()
)
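The hemisphere handling in the Latitude and Longitude steps above can be pulled out into a small pure-Python helper, which makes it easier to check in isolation. This is only a minimal sketch, assuming the raw values look like "28.0N" or "94.8W" (the format the lambdas above imply); `parse_coordinate` is a hypothetical name and not part of the original code.

```python
def parse_coordinate(raw: str) -> float:
    """Convert a hemisphere-suffixed coordinate such as "28.0N" to signed degrees.

    Assumption: the last character is one of N, S, E, W and the rest is a number.
    """
    raw = raw.strip()
    value, hemisphere = float(raw[:-1]), raw[-1].upper()
    if hemisphere in ("N", "E"):
        return value  # north and east are positive
    if hemisphere in ("S", "W"):
        return -value  # south and west are negative
    raise ValueError(f"unknown hemisphere suffix in {raw!r}")


print(parse_coordinate("28.0N"))  # 28.0
print(parse_coordinate("94.8W"))  # -94.8
```

Unlike the lambdas above, this version raises on an unexpected suffix instead of silently treating it as east or south.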

If there's some interest in this example, maybe adding this data to the vega_datasets repository might be an option? I am happy to pursue this further if the route via vega_datasets would make sense.

Regardless of whether or not this example will be added, I do think an example that shows how to plot routes (initially, I wanted to plot my running routes) would help.

jonmmease commented 1 year ago

Hi @thomascamminady, this is a really nice example! And I agree that it fills a gap in our current gallery.

I'll defer to @mattijn and @joelostblom on this, but I think our policy has been to only pull datasets from the vega-datasets package, so that may indeed be the next step. So I would recommend opening a PR in https://github.com/vega/vega-datasets/ with your example above as motivation and:

Once that's merged and released we would need to release the Python wrapper at https://github.com/altair-viz/vega_datasets.

cc @domoritz and @arvind as maintainers of vega-datasets.

thomascamminady commented 1 year ago

Thanks for the feedback! I'll have a look at vega-datasets and update this thread once the data has made it in.

joelostblom commented 1 year ago

Neat example! I think it would be great to include this (and it could maybe be a great example for temporal animation too, once that is added to Vega-Lite). I agree that it would be good to have this added to the vega_datasets package, although I am not sure whether the Python version is currently up to date, so there might be some work required on that too. A possible alternative would be to serve it somewhere on a public URL. This is less preferred, but we do have at least one example that loads from an external URL, and it is in the geo viz section.

domoritz commented 1 year ago

I’d be happy to add this dataset to Vega datasets but then it would be great to add an example to the Vega-Lite gallery as well.

thomascamminady commented 1 year ago

Okay then I'll proceed with the following steps:

I hope to get to this by the weekend :)

mattijn commented 1 year ago

One caveat, see https://github.com/altair-viz/altair/pull/2310#issuecomment-705241724.

To update the vega_datasets package that is used in Altair, a new release of this package needs to be published on PyPI (and thereafter on conda).

I would love to assist with this, but I don't have publish rights on PyPI for these Altair subpackages. Currently @jakevdp is the sole maintainer of https://pypi.org/project/vega-datasets/.