Closed by jdellape 2 years ago
Begin with season range: 2019-2021.
Research Update: Have not found an approach other than web scraping thus far.
The best scraping approach seems to be the Scrapy library: https://scrapy.org/
YouTube video on how to set up a Scrapy project in VS Code: https://www.youtube.com/watch?v=s4jtkzHhLzY
Gathered player id information from pro-football-reference.com using the scrapy program in this repo: https://github.com/jdellape/pro-football-scrapy
Obtained the player names, unique base URLs, and positions for fantasy years 2019-2021. The players.json file should contain one record per player for each year in which that player earned fantasy points within the 2019-2021 time frame.
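To make the "one record per player per year" layout concrete, here is a rough sketch of how such a file could be grouped back into a list of seasons per player. The field names (`name`, `base_url`, `position`, `year`) and the sample records are placeholders made up for illustration, not the actual schema of players.json.

```python
import json
from collections import defaultdict

# Hypothetical sample mirroring the described layout: a player appears once
# per season in which they earned fantasy points. Field names are assumed.
SAMPLE_PLAYERS_JSON = """
[
  {"name": "Player A", "base_url": "/players/A/AaaaAa00", "position": "QB", "year": 2019},
  {"name": "Player A", "base_url": "/players/A/AaaaAa00", "position": "QB", "year": 2020},
  {"name": "Player B", "base_url": "/players/B/BbbbBb00", "position": "WR", "year": 2021}
]
"""


def seasons_by_player(records):
    """Map each unique base URL to the sorted list of seasons it appears in."""
    seasons = defaultdict(set)
    for record in records:
        seasons[record["base_url"]].add(record["year"])
    return {url: sorted(years) for url, years in seasons.items()}


players = json.loads(SAMPLE_PLAYERS_JSON)
player_seasons = seasons_by_player(players)
```

Keying on the base URL rather than the name avoids merging two different players who happen to share a name.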
Next step: begin scraping stat information by player by year. This will likely be a time-consuming task for the program due to the number of pages being scraped. Start by trying to isolate the desired data for one individual player.
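One way to start with a single player is to build that player's per-season page URLs first and feed only those to the scraper. The sketch below assumes a `/gamelog/<year>/` URL pattern under the player's base URL and uses a made-up player path; both are assumptions to check against the site, and `gamelog_url` / `urls_for_player` are hypothetical helper names.

```python
SITE = "https://www.pro-football-reference.com"


def gamelog_url(base_url, year):
    """Build a per-season page URL for one player (URL pattern is assumed)."""
    path = base_url.strip("/")
    if path.endswith(".htm"):
        path = path[: -len(".htm")]  # drop the page suffix before appending
    return f"{SITE}/{path}/gamelog/{year}/"


def urls_for_player(base_url, years):
    """Return one URL per season for a single player."""
    return [gamelog_url(base_url, year) for year in years]


# Placeholder player path; seasons 2019 through 2021 inclusive.
one_player_urls = urls_for_player("/players/A/AaaaAa00.htm", range(2019, 2022))
```

Since the full run touches many pages, Scrapy's `DOWNLOAD_DELAY` setting can be used to space out requests once this is extended to all players.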
Initial data output files (JSON) can be found here: https://github.com/jdellape/pro-football-scrapy/tree/main/data/output. These should contain essentially all of the raw stat data for the 2019 through 2021 regular seasons for the following positions: QB, RB, WR, TE. Next step will be to explore the raw output and begin cleaning and reshaping the data for the Streamlit app.
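As a starting point for the cleaning step, here is a minimal sketch that converts scraped string values to numbers and flattens nested records into one row per player per game. The record layout (`player`, `year`, `position`, and a `games` list) and the stat names are assumptions for illustration; the real output files may be shaped differently.

```python
def to_number(value):
    """Convert a scraped string to int or float; blanks become None."""
    if value is None:
        return None
    text = str(value).strip().replace(",", "")
    if text == "":
        return None
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return value  # leave non-numeric text (e.g. team codes) as is


def flatten(raw_records, id_fields=("player", "year", "position")):
    """Turn nested per-player records into one flat row per game."""
    rows = []
    for record in raw_records:
        ids = {key: record.get(key) for key in id_fields}
        for game in record.get("games", []):
            row = dict(ids)
            row.update({key: to_number(val) for key, val in game.items()})
            rows.append(row)
    return rows


# Hypothetical raw record; real field names may differ.
raw = [
    {
        "player": "Player A",
        "year": 2019,
        "position": "QB",
        "games": [
            {"week": "1", "opp": "XYZ", "pass_yds": "312", "pass_rating": "98.7", "rush_yds": ""}
        ],
    }
]
rows = flatten(raw)
```

Flat rows like these can be loaded straight into a table structure, which tends to be the easiest shape to filter and chart in an app.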
Have code and approach in place to easily get data from websites now.
Scraping approach inspiration: https://stmorse.github.io/journal/pfr-scrape-python.html
Is there an API option available?