sprx97 / JCS

Add Vegas Lines #4

Closed jkeena88 closed 7 years ago

jkeena88 commented 8 years ago

I'll find the best source I can for you. If you want to share your aliasing file with me I can maybe update it for the new source if need be.

sprx97 commented 8 years ago

Yeah I just gave you write access on it. It's located at /var/www/html/JCSrankings/DatabaseGeneration/TeamAliases.csv.

My guess is the lines will mostly use 2-4 letter abbreviations like ND, MISS, or OSU, although they'll need to differentiate between Oregon State and Ohio State or we'll be in trouble... I figure most of these places will use unique identifiers in some way or another.
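A quick way to catch the Oregon State / Ohio State problem is to check the alias data for abbreviations that map to more than one team. This is only a sketch: the two-column `team,alias` layout, the sample rows, and the `find_ambiguous_aliases` helper are assumptions for illustration, not the actual format of TeamAliases.csv.

```python
import csv
import io
from collections import defaultdict


def find_ambiguous_aliases(rows):
    """Return {alias: [teams]} for every alias shared by more than one team.

    `rows` is an iterable of (team, alias) pairs. This layout is an
    assumption; the real TeamAliases.csv may be organized differently.
    """
    teams_by_alias = defaultdict(set)
    for team, alias in rows:
        teams_by_alias[alias.strip().upper()].add(team.strip())
    return {
        alias: sorted(teams)
        for alias, teams in teams_by_alias.items()
        if len(teams) > 1
    }


# Made-up sample rows, only to show the collision check.
SAMPLE_CSV = """Ohio State,OSU
Oregon State,OSU
Notre Dame,ND
Ole Miss,MISS
"""

rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
ambiguous = find_ambiguous_aliases(rows)
print(ambiguous)  # {'OSU': ['Ohio State', 'Oregon State']}
```

Any alias that shows up here would need a source-specific identifier before the lines can be matched to games.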

sprx97 commented 7 years ago

So I found an interesting way I might be able to nail line, attendance, and location all at once (for post-2000 games at least).

http://www.espn.com/college-football/game?gameId=400869118

Check out the "Game Information" pane in the bottom left... It seems to have everything we need, so it's just a matter of reconciling this with the data from sports-ref. I'm still a bit afraid of pulling from ESPN because they update/modernize their site so often (it feels like every year). But for just peripheral data it might not be terrible.

sprx97 commented 7 years ago

Actually it looks like that data is only fully available from 2012 onwards... better than nothing, but a more complete source would be nice.

sprx97 commented 7 years ago

Here's another site, but it also looks like it starts in 2012.

http://www.donbest.com/ncaaf/odds/20120922.html

This also looks promising but I haven't explored the site yet: http://www.oddsshark.com/ncaaf/database

jkeena88 commented 7 years ago

Yeah I've been having some trouble finding a good one. Donbest was the most promising I had found so far. That ESPN data would be awesome, but I understand the difficulty of working with them, especially with a page like that, which looks like it might change design relatively frequently.

In any case, getting older data isn't as important as other stuff (though it's always nice). I don't use lines in my model, I just like to compare my model to Vegas.

That OddsShark website looks freaking awesome... but I'm not sure how easy it will be to get the info we need with the interface they provide. For instance, you have to specify a team to search for, you can't pick a specific year (just x most recent games), and you're limited to 30 results. If there's a more flexible way to work with their database it would be perfect but I haven't found that yet.

sprx97 commented 7 years ago

Unfortunately I don't think there's a way to work with OddsShark unless we can find a schedule list... I think ESPN would be easier to get than Donbest, and I guess we have the sunshine forecast info for earlier dates, right?

jkeena88 commented 7 years ago

This site goes back further than the others (at least to 2011, I didn't check exactly how much further): http://www.sportsbookreview.com/betting-odds/college-football/?date=20131019

The only thing is that there is no aggregate line as far as I can find, it just lists a bunch of different lines. Which is great extra info and they're all usually really similar but I don't know the easiest or best way for you to handle that.

But if ESPN is easier anyway stick with that.
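Since the sportsbookreview URL above carries the date as a `YYYYMMDD` query parameter, the pages for a backfill could be enumerated roughly like this. It's a sketch that only builds URLs and doesn't fetch or parse anything; the `odds_url` and `odds_urls` names are hypothetical, and it assumes every date uses the same URL pattern as the example.

```python
from datetime import date, timedelta

# Base of the URL posted above; assumed to be the same for every date.
BASE_URL = "http://www.sportsbookreview.com/betting-odds/college-football/"


def odds_url(day):
    """Build the odds page URL for a single date (YYYYMMDD query param)."""
    return f"{BASE_URL}?date={day:%Y%m%d}"


def odds_urls(start, end):
    """Yield one URL per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        yield odds_url(day)
        day += timedelta(days=1)


print(odds_url(date(2013, 10, 19)))
```

In practice it would probably make sense to only request dates that already have games in the database rather than every day in the range.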

sprx97 commented 7 years ago

This one actually wouldn't be bad to scrape... Should we just pick a book and use it as our official line? Most of them are within a point or two of each other anyway. Is there any book that is "most popular" or "most official"? Or do we want to manually aggregate (a bit more work, but should be 100% doable)?

jkeena88 commented 7 years ago

I haven't been able to find a good answer to most popular or official. I know that ESPN uses an aggregate, for what that's worth, so if that's possible for you to do, that may be best. Especially since that could hopefully be done in a way that will always work (past and future) even if a particular book stops doing lines. I don't know if this would be the case, but if it's significantly easier, then the aggregate doesn't need to include ALL lines listed on the site, just the first few.

On another note, can you set this up to pull the lines after the games are over? Then they're locked in. Looks like you don't add games until after they've been played anyway, I just wanted to make a note of this.

jkeena88 commented 7 years ago

In case it matters, if you want to use sportsbookreview just for past games and ESPN going forward I'd be fine with that. Up to you.

sprx97 commented 7 years ago

Shouldn't be too hard to do the aggregate. Is that literally just average the N lines and round it to half-points? Never seen a 3.25 line or whatever. I'll probably just stick with the sportsbookreview, although not sure how permanent the site itself is.... if it disappears a few years down the road that'd suck.

And yes - it'll be all done after the games are completed.

jkeena88 commented 7 years ago

Yeah I think you're right, round to a half point after averaging all the lines
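For reference, the aggregate described above (average the N lines, then round to the nearest half point) could look something like this. It's a minimal sketch, not the actual implementation: the `aggregate_line` name is made up, and sending exact ties like 3.25 away from zero is an assumption, since the thread doesn't say how ties should break.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean


def aggregate_line(lines):
    """Average the books' spreads and round to the nearest half point.

    Returns None when no book posted a line for the game.
    """
    if not lines:
        return None
    avg = mean(lines)
    # Double the average, round to a whole number, then halve it.
    # ROUND_HALF_UP in `decimal` rounds ties away from zero (assumption:
    # the thread does not specify a tie-breaking rule).
    doubled = (Decimal(str(avg)) * 2).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    return float(doubled) / 2


print(aggregate_line([-3, -3.5, -3.5, -4]))  # -3.5
print(aggregate_line([3, 3.5]))              # 3.5 (3.25 rounds away from zero)
```

Python's built-in `round()` rounds ties to the nearest even number, so `round(6.5)` gives 6; that's why the sketch uses `decimal` instead.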

sprx97 commented 7 years ago

BOOM got'em. All the lines for 2016 are filled in so far... still need to backfill older years and set it up to get them each night though (test run for the nightly tonight).

A few were missing, but they're mostly games involving G5 vs FCS teams or lower tier than that. We can go back manually if any important ones are missing. It also appears that Hawaii is rarely given lines... IDK why that is, maybe time zones?