rdpharr / project_notes

Notes about my data projects
https://rdpharr.github.io/project_notes/
Apache License 2.0

MLB Power Rankings and Casino Odds | rdpharr’s projects #4

Open utterances-bot opened 3 years ago

utterances-bot commented 3 years ago

MLB Power Rankings and Casino Odds | rdpharr’s projects

Part 3 - adding power rankings and odds into the MLB prediction model

https://rdpharr.github.io/project_notes/baseball/webscraping/elo/trueskill/glick/2020/09/22/power-rankings-and-casino-odds.html

kevbarroga commented 3 years ago

In Part 4, when re-running Part 2 to pull more data going back to 2013, how did you deal with this error while prepping and cleaning the odds data: `ParserError: hour must be in 0..23: 2015-04-25 24:59:59`?

rdpharr commented 3 years ago

That's a quirk in the data from covers.com. I think everyone who comes back to this script a second time runs into it, so thanks for commenting.

The way I've dealt with it is by rolling the time back an hour with a string replace. Something to the effect of `odds.date = odds.date.str.replace('24:59', '23:59')`.
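A minimal sketch of that replace workaround, assuming the scraped odds sit in a pandas DataFrame named `odds` with a string `date` column. The sample timestamps are made up for illustration and are not real covers.com rows.

```python
import pandas as pd

# Hypothetical sample rows; the second mimics the bad "24:59:59" timestamp.
odds = pd.DataFrame({
    "date": ["2015-04-24 19:05:00", "2015-04-25 24:59:59"],
})

# Roll the impossible hour back to 23:59 so the parser accepts it.
# regex=False makes this a plain substring replacement.
odds["date"] = odds["date"].str.replace("24:59", "23:59", regex=False)

# Parse to datetimes, then keep only the calendar date.
odds["date"] = pd.to_datetime(odds["date"]).dt.date
print(odds)
```

Because only the calendar date is kept afterward, shifting 24:59 to 23:59 leaves the game on the same day it was listed.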

It might be more proper to drop those data points instead. You can do that by changing `odds.date = pd.to_datetime(odds.date).dt.date` to `odds.date = pd.to_datetime(odds.date, errors='coerce').dt.date`. The unparseable timestamps become "NaT" (not a time), and that NaT will prevent those rows from being merged into the main dataframe.
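The `errors='coerce'` variant can be sketched the same way. Here `games` is a hypothetical stand-in for the main dataframe; both frames, their column names (`home_team`, `home_odds`), and the merge keys are assumptions for illustration, not the columns used in the original notebooks.

```python
import pandas as pd

# Hypothetical odds rows; the second has the unparseable "24:59:59" time.
odds = pd.DataFrame({
    "date": ["2015-04-24 19:05:00", "2015-04-25 24:59:59"],
    "home_team": ["SEA", "OAK"],
    "home_odds": [-120, 105],
})

# Hypothetical main dataframe of games, keyed by calendar date and home team.
games = pd.DataFrame({
    "date": pd.to_datetime(["2015-04-24", "2015-04-25"]).date,
    "home_team": ["SEA", "OAK"],
})

# errors="coerce" turns the bad timestamp into NaT instead of raising.
odds["date"] = pd.to_datetime(odds["date"], errors="coerce").dt.date

# The NaT row matches no game date, so a left merge leaves its odds as NaN.
merged = games.merge(odds, on=["date", "home_team"], how="left")
print(merged)
```

With the default inner merge the unmatched game would be dropped entirely instead of carrying NaN odds; which behavior is preferable depends on how the rest of the model handles missing odds.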

Hope that helps!

kevbarroga commented 3 years ago

Thanks for the quick reply! I'll try that out.

reentercaptcha commented 1 year ago

Is there any way to get the code ready to run?