vdavez / presidents

Data from http://blog.letteddywin.com

Attendance #1

Open vdavez opened 8 years ago

vdavez commented 8 years ago

Does attendance affect the winner?

vdavez commented 8 years ago

This looks like a simple inner join. First, download the data set from Baseball Reference (e.g., http://www.baseball-reference.com/teams/WSN/2014-schedule-scores.shtml).

Then the data needs a little cleanup: delete the first row and add the year to the date field.

Finally, we load the data:

season2014 = read.csv('teams_WSN_2014-schedule-scores_team_schedule.csv')
season2014$Date = as.Date(as.character(season2014$Date),format="%A %b %d %Y") # Get date in ISO format

vdavez commented 8 years ago

Turns out that there are weird dummy rows that need to be eliminated. Once that's cleaned, rewrite to CSV. Code updated:

season2014 = read.csv('teams_WSN_2014-schedule-scores_team_schedule.csv')
season2014$Date = as.Date(as.character(season2014$Date),format="%A %b %d %Y") # Get date in ISO format
season2014_cleaned = subset(season2014, season2014$Opp != "Opp") # Get rid of the dummy rows (repeated header lines)
write.csv(season2014_cleaned, file="season2014.csv")
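
For the inner join itself, here is a rough sketch of how it might look with base R's merge(). Everything in it is an assumption for illustration: the two data frames are made-up stand-ins, and the column names (Date, Winner, Attendance) are hypothetical and may not match the real race data or the Baseball Reference export.

```r
# Made-up stand-in for the race results (hypothetical columns and values)
races <- data.frame(
  Date   = as.Date(c("2014-04-04", "2014-04-05", "2014-04-06")),
  Winner = c("Teddy", "Abe", "George"),
  stringsAsFactors = FALSE
)

# Made-up stand-in for the cleaned schedule (hypothetical columns and values)
schedule <- data.frame(
  Date       = as.Date(c("2014-04-04", "2014-04-05", "2014-04-08")),
  Attendance = c(30000, 25000, 20000)
)

# merge() defaults to all = FALSE, so only dates present in both
# data frames survive: an inner join on Date
joined <- merge(races, schedule, by = "Date")

# Average attendance per race winner, as a first look at the question
by_winner <- aggregate(Attendance ~ Winner, data = joined, FUN = mean)
print(by_winner)
```

Dates that appear in only one of the two data frames are dropped, so it may be worth comparing nrow(joined) against nrow(races) to see how many races failed to match a game.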