vega / vega-datasets

Common repository for example datasets used by Vega-related projects

Add one dataset for the sports fans #64

Closed: RandomFractals closed this issue 4 years ago

RandomFractals commented 5 years ago

I suggest this dataset of top goal-scorers among footballers since 1980:

https://johnburnmurdoch.github.io/projects/goal-lines/all-comps/

domoritz commented 5 years ago

We can add this if you send a PR and also add examples to Vega or Vega-Lite.

RandomFractals commented 5 years ago

I knew this was coming, @domoritz! There's only so much time in my 24/7, too. I'm still twisting my mind over how to add nimble Arrow data support to the VegaViewer VS Code extension using your libs, examples, and these datasets.

Can someone else pick up the slack? I do think it would be cool to have a few sports datasets in this repo, but I'm not a sports fan.

Let's keep this open for now and see if any passionate devs are willing to pitch in.

domoritz commented 5 years ago

Yep, same problem here with only a few hours in the day. Let's leave this open and mark it as "help wanted".

RandomFractals commented 5 years ago

@domoritz sounds good! I'll do one better for you guys:

I woke up this morning thinking we could use a simple Vega datasets preview JS notebook :)

https://observablehq.com/@randomfractals/vega-datasets

(Screenshot: vega-datasets notebook)

I'll leave it up to you guys whether to add this Vega datasets preview Observable notebook to your editor or to this repo's README.md.

This notebook can serve as a supplemental tool for the online Vega editor and for examples that use these data sources, since it loads and scrolls through data much faster than GitHub and the Vega editor do.

I might add something similar to https://github.com/RandomFractals/vscode-vega-viewer as a split panel in the Vega chart preview in the next major release.

cc @kanitw @arvind & @jheer

Cheers! 🤗

eitanlees commented 4 years ago

I messed around with the data behind the linked example and came up with this:

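import altair as alt

# Cumulative goal tallies from John Burn-Murdoch's goal-lines project
URL = "https://raw.githubusercontent.com/johnburnmurdoch/johnburnmurdoch.github.io/master/projects/goal-lines/all-comps/smallData.csv"

# Players to highlight; everyone else is drawn as a thin grey line
picks = [
    'Lionel Messi',
    'Cristiano Ronaldo',
    'Alan Shearer',
    'Jürgen Klinsmann',
]

# Shared base: one line per player (detail=name), goals over time
base = alt.Chart(URL).mark_line(
    color='lightgrey', strokeWidth=1
).encode(
    x=alt.X('date:T', axis=alt.Axis(title='Date')),
    y=alt.Y('G:Q', axis=alt.Axis(title='Total Goals')),
    detail='name:N',
    tooltip=['name:N', 'maxG:Q'],  # a list of fields goes to tooltip=, not alt.Tooltip(...)
).properties(width=608, height=342)

# Highlighted layer: only the picked players, colored and thicker
highlight = base.transform_filter(
    alt.FieldOneOfPredicate(field='name', oneOf=picks)
).encode(
    color=alt.Color('name:N',
                    legend=alt.Legend(title='Player Name'),
                    scale=alt.Scale(scheme='set1')),
    size=alt.value(4),  # for line marks, size maps to stroke width
)

# Background layer: every player NOT in picks
others = base.transform_filter(
    {'not': alt.FieldOneOfPredicate(field='name', oneOf=picks)}
)

# `others + highlight` already builds a layered chart (grey lines drawn first)
(others + highlight).configure_legend(
    titleFontSize=15,
    labelFontSize=12,
    symbolStrokeWidth=4,
).configure_axis(grid=False)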

(Image: resulting visualization)
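The chart splits the data into two layers with complementary filters: FieldOneOfPredicate keeps rows whose name is in picks, and wrapping it in {'not': ...} keeps everything else, so each row lands in exactly one layer. A minimal plain-Python sketch of that partition; the one_of/negate helpers and the records are hypothetical, for illustration only, and are not rows from smallData.csv:

```python
# Plain-Python mirror of the two transform_filter predicates in the chart.
# The records below are hypothetical stand-ins, not real data.
picks = ['Lionel Messi', 'Cristiano Ronaldo', 'Alan Shearer', 'Jürgen Klinsmann']

records = [
    {'name': 'Lionel Messi'},
    {'name': 'Alan Shearer'},
    {'name': 'Hypothetical Player'},
]


def one_of(field, values):
    """Hypothetical helper mimicking Vega-Lite's {"field": ..., "oneOf": [...]}."""
    return lambda row: row[field] in values


def negate(pred):
    """Hypothetical helper mimicking Vega-Lite's {"not": ...}."""
    return lambda row: not pred(row)


in_picks = one_of('name', picks)
highlight = [r for r in records if in_picks(r)]
others = [r for r in records if negate(in_picks)(r)]

print([r['name'] for r in highlight])  # ['Lionel Messi', 'Alan Shearer']
print([r['name'] for r in others])     # ['Hypothetical Player']
```

Because the two predicates are exact complements, adding the layers never double-draws or drops a player.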

RandomFractals commented 4 years ago

Nice! I forgot we had this convo with @domoritz.

What's alt, and how do I stick it in a JS notebook? Just kidding!

eitanlees commented 4 years ago

So I'll gladly add this dataset to the repo, but first I figured we should ping @johnburnmurdoch to ask permission.

It comes from: https://github.com/johnburnmurdoch/johnburnmurdoch.github.io/blob/master/projects/goal-lines/all-comps/smallData.csv

Here's a look:

id,name,date,age,mins,played,G,NPG,maxG,maxNPG,G90,NPG90
1,Lionel Messi,2004-10-16,17.314,7,1,0,0,600,532,0.98,0.87
1,Lionel Messi,2004-12-11,17.467,196,6,0,0,600,532,0.98,0.87
1,Lionel Messi,2005-10-01,18.272,354,12,1,1,600,532,0.98,0.87
1,Lionel Messi,2005-11-19,18.407,677,18,2,2,600,532,0.98,0.87
...

If I do include the dataset, do we want all the columns or just name, date, age, mins, played, G, NPG?

Also, is CSV or JSON preferred?
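If only that subset is kept, trimming the file is a small transformation. A minimal stdlib sketch, assuming the seven columns listed above and using the preview rows as input; subset_columns is a hypothetical helper, not part of the vega-datasets tooling:

```python
import csv
import io

# The four data rows shown in the preview (the "..." tail is not included).
SAMPLE = """id,name,date,age,mins,played,G,NPG,maxG,maxNPG,G90,NPG90
1,Lionel Messi,2004-10-16,17.314,7,1,0,0,600,532,0.98,0.87
1,Lionel Messi,2004-12-11,17.467,196,6,0,0,600,532,0.98,0.87
1,Lionel Messi,2005-10-01,18.272,354,12,1,1,600,532,0.98,0.87
1,Lionel Messi,2005-11-19,18.407,677,18,2,2,600,532,0.98,0.87
"""

# Assumed subset, taken from the question above
KEEP = ['name', 'date', 'age', 'mins', 'played', 'G', 'NPG']


def subset_columns(text, keep):
    """Return CSV text containing only the `keep` columns, in that order."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    # extrasaction='ignore' silently drops columns not listed in `keep`
    writer = csv.DictWriter(out, fieldnames=keep,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


trimmed = subset_columns(SAMPLE, KEEP)
print(trimmed.splitlines()[1])  # Lionel Messi,2004-10-16,17.314,7,1,0,0
```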

RandomFractals commented 4 years ago

I'd say add that small-data CSV from the source and forgo the JSON route, since it's too verbose :)

Maybe just link to the medium-data CSV in that repo's comps directory for others to explore at will?
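To make the verbosity point concrete: a JSON array of records repeats every column name in every row, while CSV states them once in the header. A quick stdlib sketch that measures both encodings for the preview rows (values stay strings, as csv.DictReader returns them; converting numbers would trim the JSON somewhat, but the repeated keys remain):

```python
import csv
import io
import json

# The four data rows shown in the preview earlier in the thread.
SAMPLE = """id,name,date,age,mins,played,G,NPG,maxG,maxNPG,G90,NPG90
1,Lionel Messi,2004-10-16,17.314,7,1,0,0,600,532,0.98,0.87
1,Lionel Messi,2004-12-11,17.467,196,6,0,0,600,532,0.98,0.87
1,Lionel Messi,2005-10-01,18.272,354,12,1,1,600,532,0.98,0.87
1,Lionel Messi,2005-11-19,18.407,677,18,2,2,600,532,0.98,0.87
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# JSON array of records: all 12 keys are written out again for each row
as_json = json.dumps(rows)

csv_bytes = len(SAMPLE.encode('utf-8'))
json_bytes = len(as_json.encode('utf-8'))

print(csv_bytes, json_bytes)
print(json_bytes > csv_bytes)  # True
```

The gap grows with row count, which matters for a file with one row per player per match.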

RandomFractals commented 4 years ago

@eitanlees also, can you add my vega-datasets JS notebook to the docs in this repo while you're at it?

It's what prompted me to log this issue in the first place, back in May :)

And I haven't seen a better one on Observable HQ for exploring the raw Vega datasets yet...

https://observablehq.com/@randomfractals/vega-datasets

domoritz commented 4 years ago

Let's add the notebook in a separate pull request.

RandomFractals commented 4 years ago

that's fine. thanks!

eitanlees commented 4 years ago

closed by https://github.com/vega/vega-datasets/pull/161