maxblee closed 4 years ago
A few comments/questions:
- Is it better to use Data Golf's True Strokes Gained (same as regular strokes-gained adjusted for field strength), or conduct our own adjustments for field strength?
- What are the most important covariates?
- Would weather be a consideration?
- Do we need to collect data on course length or other course characteristics as a covariate?
- I found a useful article on scraping data here. Instead of scraping the data directly in our code, it recommends downloading the HTML files. I am working on downloading the 2019 HTML files from PGA for each SG category and the money list for each tournament and storing them in the raw data folder. I'll start on code to scrape them after that.
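A minimal sketch of that download-first workflow, in case it helps: save each page to the raw data folder once, then skip the network on later runs. The folder name `data/raw`, the file-naming scheme, and the helper names `raw_path` and `fetch_once` are all assumptions for illustration, not anything already in the repo.

```python
from pathlib import Path
from urllib.request import urlopen

RAW_DIR = Path("data/raw")  # assumed location of the raw data folder


def raw_path(stat, year, tournament_id, raw_dir=RAW_DIR):
    """Build a predictable file path: <raw_dir>/<year>/<stat>_<tournament_id>.html."""
    return Path(raw_dir) / str(year) / f"{stat}_{tournament_id}.html"


def fetch_once(url, dest, fetch=None):
    """Download url to dest unless dest already exists; return dest."""
    dest = Path(dest)
    if dest.exists():
        return dest  # already downloaded, so no second request is made
    if fetch is None:
        # Default fetcher; a different one can be passed in for testing.
        fetch = lambda u: urlopen(u, timeout=30).read()
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(fetch(url))
    return dest
```

Keeping the raw HTML on disk means the parsing code can be rerun and debugged without hitting the site again.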
Option for bulk scraping by tournament: https://github.com/zachwill/golf/blob/master/pga.py
Another resource on scraping PGA with Beautiful Soup: https://brianchesley.wordpress.com/2014/11/18/dissecting-the-tiger-woods-effect-with-beautiful-soup-and-pandas-pt-1/
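For a rough idea of what parsing a saved page might look like, here is a sketch that turns an HTML stats table into a list of dicts. It uses the standard library's `html.parser` as a stand-in for Beautiful Soup so it runs with no extra installs; the class name `TableRows`, the function `parse_table`, and the assumption that the first row of the table is the header are all hypothetical, and the real PGA pages may be laid out differently.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per body row, keyed by the first (header) row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]
```

All values come back as strings, so numeric columns would still need converting before analysis.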
I added the Python file to let us acquire the data. Could someone help me identify which stats we need to get (preferably by sending me links) and which years we need, or are able, to get them for?
Thanks, Max! I think for our first pass we need the 2004-2019 official money list, SG: Putting, and SG: Off-the-Tee.
Would it be helpful for me to make a table mapping all the tournament names to IDs? It looks like that's an input the script needs.
No, the scraper already collects that information. I can adapt it slightly so it stores that information, too, if we wind up needing it, e.g. to join data. But thanks!
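If we do wind up storing the tournament IDs, a join across the stats could look roughly like the sketch below. The column names `tournament_id` and `player` and the function `join_on_tournament` are assumptions for illustration; the scraper's actual output fields may differ.

```python
def join_on_tournament(money_rows, sg_rows, key=("tournament_id", "player")):
    """Inner-join two lists of dicts on the given key columns."""
    # Index one side by its key so each lookup is a single dict access.
    index = {tuple(r[k] for k in key): r for r in sg_rows}
    joined = []
    for row in money_rows:
        match = index.get(tuple(row[k] for k in key))
        if match is not None:
            # On a column-name clash, the money-list value wins.
            joined.append({**match, **row})
    return joined
```

Rows that appear in only one of the two inputs are dropped, which is worth keeping in mind if a player is missing from one of the stat pages.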
Before we perform any analysis, we need to acquire data for our project. This may mean finding one or more sources with good golf data (ideally at the level of individual tournaments), or it may require scraping. Essentially, there are two parts to it:
Here's the data we need (from my understanding; please correct if I'm wrong):