Open vdavez opened 8 years ago
This appears to be a simple inner join effort. First, download the data set from Baseball Reference (e.g., http://www.baseball-reference.com/teams/WSN/2014-schedule-scores.shtml).
Then the data needs a little cleanup, including deleting the first row and adding the year to the date field.
Finally, we load the data:
season2014 = read.csv('teams_WSN_2014-schedule-scores_team_schedule.csv')
season2014$Date = as.Date(as.character(season2014$Date),format="%A %b %d %Y") # Get date in ISO format
It turns out the file contains weird dummy rows (rows where the Opp column just says "Opp") that need to be eliminated. Once those are removed, write the result back to CSV. Updated code:
season2014 = read.csv('teams_WSN_2014-schedule-scores_team_schedule.csv')
season2014$Date = as.Date(as.character(season2014$Date),format="%A %b %d %Y") # Get date in ISO format
season2014_cleaned = subset(season2014, season2014$Opp != "Opp") # Get rid of the dummy rows
write.csv(season2014_cleaned, file="season2014.csv")
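For anyone who wants to try the cleanup without downloading the file, here is a minimal sketch of the same steps on a tiny inline data frame. The sample rows, the `raw` and `games` names, and the assumption that the date column arrives without a year (e.g. "Monday Mar 31") are all hypothetical; the real export may be formatted differently. Note that `%A` and `%b` depend on the locale, so this assumes English day and month names.

```r
# Hypothetical sample rows standing in for the downloaded CSV.
# The middle row mimics a dummy row where every field repeats its column name.
raw <- data.frame(
  Date = c("Monday Mar 31", "Date", "Wednesday Apr 2"),
  Opp  = c("NYM", "Opp", "NYM"),
  stringsAsFactors = FALSE
)

# Drop the dummy rows first, so only real games remain.
games <- subset(raw, Opp != "Opp")

# Append the season year in code instead of editing the file by hand,
# then parse the string into a Date (which prints in ISO format).
games$Date <- as.Date(paste(games$Date, 2014), format = "%A %b %d %Y")

# row.names = FALSE keeps R's row index out of the output file.
out_file <- tempfile(fileext = ".csv")
write.csv(games, file = out_file, row.names = FALSE)
```

Filtering before parsing means the dummy rows never reach `as.Date`, and adding the year with `paste` saves one manual edit of the file.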
Does attendance affect the winner?