joemgon / worldcup2018

Awesome Stats

Updates: Rankings and Historical Results #9

Open michaelpawlus opened 6 years ago

michaelpawlus commented 6 years ago

I added two scripts and three CSVs.

The first is a fairly straightforward scrape of the current FIFA World Rankings table.

The second was a bit more of a challenge, but I put together a table of results from the last five Cups.

Check it out if you get a minute and see if you spot any errors or anything.

Also, Gonzales, I need your help with a subjective question: what constitutes an upset?

Right now, I have it defined as any time the team with the worse FIFA ranking (the numerically higher rank) either wins or draws against the team with the better ranking.
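For reference, here is a minimal sketch of that working definition in Python. It is not the script in the repo; the `Match` fields, the `draws_count` switch, and the `min_rank_gap` threshold are all hypothetical names, added only so the open questions (draws, rank gap) become parameters we can flip.

```python
from dataclasses import dataclass


@dataclass
class Match:
    # Hypothetical record layout, not the columns of the actual CSVs.
    rank_a: int   # FIFA rank of team A (1 = best)
    rank_b: int   # FIFA rank of team B
    goals_a: int
    goals_b: int


def is_upset(match, draws_count=True, min_rank_gap=1):
    """Return True if the worse-ranked team wins (or draws, if draws_count)."""
    # Ignore matches where the ranks are closer than the chosen gap.
    if abs(match.rank_a - match.rank_b) < min_rank_gap:
        return False

    # The underdog is the team with the numerically higher rank.
    if match.rank_a > match.rank_b:
        underdog_goals, favorite_goals = match.goals_a, match.goals_b
    else:
        underdog_goals, favorite_goals = match.goals_b, match.goals_a

    if underdog_goals > favorite_goals:
        return True
    return draws_count and underdog_goals == favorite_goals
```

With `draws_count=True` this matches the current definition; setting it to `False` counts only outright wins, and raising `min_rank_gap` ignores games between closely ranked teams.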

Are draws upsets? Does the gap between ranks matter? Does the score matter? Does this really matter at all? I'm not totally sure, but as I mentioned, I have two ideas for attacking this problem.

  1. The traditional method: we train on the results from 1998, 2002, 2006, and 2010, test on 2014 to build a model, and then apply it to the 2018 data. I'll add that NCAA script soon to facilitate this.

  2. We start with a prior, say that there will be 14 upsets, use all that FIFA data to build some power scores, and then use the power score differentials to pick the 14 upsets. In this case, we need a good working definition of an upset.
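To make the two ideas concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: the `year` key, the `power` dictionary, and especially the rule that the fixtures with the smallest power score gap are the likeliest upsets. How the power scores themselves get built from the FIFA data is left open.

```python
def split_by_year(matches, test_year=2014):
    """Idea 1: train on Cups before test_year, hold out test_year itself."""
    train = [m for m in matches if m["year"] < test_year]
    test = [m for m in matches if m["year"] == test_year]
    return train, test


def pick_upsets(fixtures, power, n_upsets=14):
    """Idea 2: pick the n_upsets fixtures with the smallest power score gap.

    fixtures is a list of (team_a, team_b) pairs and power maps each team
    to a hypothetical power score. Treating the closest matchups as the
    most likely upsets is an assumption, not something we have validated.
    """
    def gap(fixture):
        team_a, team_b = fixture
        return abs(power[team_a] - power[team_b])

    # sorted() is stable, so fixtures with equal gaps keep their input order.
    return sorted(fixtures, key=gap)[:n_upsets]
```

The prior of 14 is just the default for `n_upsets`, so it is easy to swap in a different number once we settle on the upset definition.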

Alright, enough rambling. Feels good to get these done.

joemgon commented 6 years ago

Hey Mike, thanks for posing this question, let me ruminate on it. Culturally, I think an upset is one where a lower-ranked team beats a higher-ranked team. In that case, the gap between ranks matters. However, I do not think of a draw as an upset, mainly because a draw tells me that every team at this level has a chance: a chance to win by a single goal. A shot on goal is a chance, a chance to win. To me, that is the spirit of the game. So how would we weight a draw then? Differently from an upset? Maybe.

I really like your second approach to this, mainly because it is untraditional, but let's not get ahead of ourselves. Here I am thinking about all the underrated teams that have made it here and the circumstances that have led to their participation in the competition. Is there something bigger there? Say teams like Iceland have made it through based on a culturally relevant phenomenon in their country, or an infusion of resources by the government or otherwise to make it so. Or did teams get in on a fluke (think US participation in the 1990 Cup) and then create upsets by virtue of their participation?

I'll get back to this tonight and review the new files. I have some car work to do and will give this the attention it needs this evening.

michaelpawlus commented 6 years ago

Those are all good points, and yes, I like the idea of trying to work in some of this extra data that is outside the bounds of pure soccer stats.

joemgon commented 6 years ago

Hey Mike, yeah, if we had more time I would like to see us build out something that takes into account some of the issues we have brought up here. Is it possible to go with route 1? In looking at upsets, they fell by a third from previous modern years, so we could use 14 upsets with power scores, and it will be interesting to see how many upsets occur this year. Defining upsets by rankings makes sense for this scenario (considering time constraints), and let's consider ties an upset.