jdellape / streamlit-nfl

Project for exploring NFL stats through a Streamlit application

obtain raw player data #5

Closed jdellape closed 2 years ago

jdellape commented 2 years ago

Scraping approach inspiration: https://stmorse.github.io/journal/pfr-scrape-python.html

Is there an API option available?

jdellape commented 2 years ago

Begin with the season range 2019-2021.

jdellape commented 2 years ago

Research update: I have not found an approach other than web scraping so far.

The best scraping approach seems to be the Scrapy library: https://scrapy.org/

YouTube video on how to set up a Scrapy project in VS Code: https://www.youtube.com/watch?v=s4jtkzHhLzY

jdellape commented 2 years ago

Gathered player ID information from pro-football-reference.com using the Scrapy project in this repo: https://github.com/jdellape/pro-football-scrapy

jdellape commented 2 years ago

Obtained the player names, unique base URLs, and positions for fantasy years 2019-2021. The players.json file repeats a player's information for each year in which that player earned fantasy points within the 2019-2021 time frame.
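Because players.json holds one record per player per year, it can help to collapse it to one entry per unique base URL before scraping. A minimal sketch, assuming hypothetical field names (`name`, `url`, `position`, `year`) that may not match the actual file:

```python
import json

# Hypothetical records: the field names and values below are assumptions,
# not the actual players.json schema.
SAMPLE = """
[
  {"name": "Player A", "url": "/players/A/PlayA00.htm", "position": "QB", "year": 2019},
  {"name": "Player A", "url": "/players/A/PlayA00.htm", "position": "QB", "year": 2020},
  {"name": "Player B", "url": "/players/B/PlayB00.htm", "position": "WR", "year": 2021}
]
"""


def group_by_player(records):
    """Collapse per-year records into one entry per unique base URL."""
    players = {}
    for rec in records:
        entry = players.setdefault(
            rec["url"],
            {"name": rec["name"], "position": rec["position"], "years": []},
        )
        # Guard against the same player/year appearing twice.
        if rec["year"] not in entry["years"]:
            entry["years"].append(rec["year"])
    for entry in players.values():
        entry["years"].sort()
    return players


players = group_by_player(json.loads(SAMPLE))
for url, info in players.items():
    print(url, info["name"], info["position"], info["years"])
```

Keying on the base URL rather than the name avoids merging two different players who share a name.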

Next step: begin scraping stat information by player and by year. This will likely be time-consuming for the program because of the number of pages to scrape. Start by trying to isolate the desired data for a single player.
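One way to prototype the single-player step offline is to parse a saved copy of the page before wiring it into a spider. The sketch below uses Python's standard-library `html.parser` rather than Scrapy selectors so that it runs without a network connection. The sample markup, and the assumption that each cell carries a `data-stat` attribute naming its column, are hypothetical and should be checked against the real page.

```python
from html.parser import HTMLParser

# Hypothetical markup: a stats table whose cells carry a data-stat attribute.
SAMPLE_HTML = """
<table id="stats">
  <tbody>
    <tr><th data-stat="year_id">2018</th><td data-stat="g">14</td><td data-stat="pass_yds">3500</td></tr>
    <tr><th data-stat="year_id">2019</th><td data-stat="g">16</td><td data-stat="pass_yds">4000</td></tr>
    <tr><th data-stat="year_id">2020</th><td data-stat="g">15</td><td data-stat="pass_yds">4200</td></tr>
  </tbody>
</table>
"""


class StatTableParser(HTMLParser):
    """Collect each table row as a dict keyed by the cell's data-stat value."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # dict for the row currently being read
        self._stat = None  # data-stat name of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag in ("td", "th") and self._row is not None:
            self._stat = dict(attrs).get("data-stat")

    def handle_data(self, data):
        if self._stat and data.strip():
            self._row[self._stat] = self._row.get(self._stat, "") + data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._stat = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def parse_player_rows(html, seasons):
    """Return only the rows whose year_id is one of the wanted seasons."""
    parser = StatTableParser()
    parser.feed(html)
    return [row for row in parser.rows if row.get("year_id") in seasons]


rows = parse_player_rows(SAMPLE_HTML, {"2019", "2020", "2021"})
for row in rows:
    print(row)
```

Values stay as strings here. Converting them to numbers is left to the cleaning step.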

jdellape commented 2 years ago

Initial data output files (JSON) can be found here: https://github.com/jdellape/pro-football-scrapy/tree/main/data/output. These files should contain essentially all of the raw stat data for the 2019 through 2021 regular seasons for the following positions: QB, RB, WR, TE. Next step: explore the raw output and begin cleaning and reshaping the data for the Streamlit app.
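As a rough illustration of the cleaning and reshaping step, the sketch below flattens nested per-player JSON into one row per player-season and converts scraped strings to numbers. The nesting and key names (`player`, `position`, `seasons`, `pass_yds`) are assumptions for illustration and may differ from the actual output files.

```python
import json

# Hypothetical raw output: the nesting and key names are assumptions.
RAW = json.loads("""
[
  {"player": "Player A", "position": "qb",
   "seasons": [{"year": "2019", "g": "16", "pass_yds": "4,000"},
               {"year": "2020", "g": "15", "pass_yds": ""}]}
]
""")


def to_number(value):
    """Convert a scraped string to int or float; return None if empty or unparseable."""
    text = value.replace(",", "").strip()
    if not text:
        return None
    try:
        return int(text)
    except ValueError:
        try:
            return float(text)
        except ValueError:
            return None


def flatten(raw):
    """Produce one flat dict per player-season."""
    rows = []
    for player in raw:
        for season in player["seasons"]:
            row = {
                "player": player["player"],
                "position": player["position"].upper(),
            }
            row.update({key: to_number(val) for key, val in season.items()})
            rows.append(row)
    return rows


table = flatten(RAW)
for row in table:
    print(row)
```

A list of flat dicts like `table` can be passed directly to `pandas.DataFrame(table)` if pandas is used for the app's data layer.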

jdellape commented 2 years ago

The code and approach are now in place to easily get data from websites.