Open TK2575 opened 1 year ago
Sorry, I commented inline instead of on the PR:
I think you're doing too much work here; there's an API:
pitch_df = pd.DataFrame(json.loads(requests.get('https://www.fangraphs.com/api/projections?stats=pit&type=steamer').content))
Thanks @blacktj, I didn't know that API endpoint existed in front of a paywall, that's great! I'm assuming there's some rate limit expectation we'll need to respect, like we do with Baseball Reference? I'll need to dig into this a bit.
It's non-public and buried in the client-side rendering of the table. I'm working on a PR for the prospects endpoint as well. I'm not sure whether it's rate limited, though; it's wide open. The risk I see is that they could lock it down.
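Since the rate limit is unknown, one cautious option is to space requests out on our side. Here is a minimal sketch using only the standard library. `ThrottledFetcher`, `http_get`, and the 3-second default interval are hypothetical choices for illustration, not anything Fangraphs publishes or this repo already provides:

```python
import json
import time
import urllib.request

# Endpoint quoted earlier in this thread; it is undocumented and may change.
PROJECTIONS_URL = "https://www.fangraphs.com/api/projections?stats=pit&type=steamer"


def http_get(url):
    """Fetch a URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


class ThrottledFetcher:
    """Hypothetical helper: keeps at least min_interval seconds between calls."""

    def __init__(self, fetch=http_get, min_interval=3.0,
                 clock=time.monotonic, sleep=time.sleep):
        # min_interval is an assumed, conservative value; the real limit is unknown.
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous request, if any

    def get_json(self, url):
        """Wait if the previous call was too recent, then fetch and parse JSON."""
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        return json.loads(self._fetch(url))
```

With this in place, the one-liner above would become something like `pd.DataFrame(ThrottledFetcher().get_json(PROJECTIONS_URL))`. The fetch, clock, and sleep functions are injectable so the throttling can be tested without hitting the site.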
I think I'll need to defer to this repo's maintainers as to which approach to take. There's precedent for scraping Fangraphs page source for other methods, though I don't know if that's because either a) we weren't aware of the API at the time or b) the API didn't/doesn't support those data. Querying the API would certainly be cleaner, but I'd be hesitant to move forward with a non-public API without some form of developer contract and/or buy-in from this repo's maintainers.
This is a web scraping repo, so I'm guessing we don't have a contract to pull the data from their actual website either? Is there a difference between grabbing it there and grabbing it from the API they use to render the table?
Introduces a function, with related tests and documentation, that captures player projection data from Fangraphs. Provides argument options to specify the projection source, position, league, and team. Extends the teamid lookup method to provide a Fangraphs (fg) team ID lookup, backed by a stored dictionary fixture, which is needed for team-level filtering.
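For discussion, here is a rough sketch of how the projection source and team options could map onto the API endpoint mentioned in the comments, if that route were chosen. Only the `stats` and `type` query keys appear in this thread. The `team` key, the `projection_url` name, and the shape of the `team_ids` dictionary are assumptions for illustration, not the PR's actual code:

```python
from urllib.parse import urlencode

# Base of the endpoint quoted in the comments (undocumented, may change).
BASE_URL = "https://www.fangraphs.com/api/projections"


def projection_url(source="steamer", stats="pit", team_abbrev=None, team_ids=None):
    """Build a projections URL (hypothetical helper).

    team_ids stands in for the stored dictionary fixture that maps a team
    abbreviation to its Fangraphs team ID.
    """
    params = {"stats": stats, "type": source}
    if team_abbrev is not None:
        lookup = team_ids or {}
        if team_abbrev not in lookup:
            raise KeyError(f"no Fangraphs team ID known for {team_abbrev!r}")
        # Assumption: the API accepts a "team" query key for filtering.
        params["team"] = lookup[team_abbrev]
    return f"{BASE_URL}?{urlencode(params)}"
```

Calling `projection_url()` with no arguments reproduces the URL from the comment above. Passing the lookup dictionary in as an argument keeps the URL builder independent of wherever the fixture is stored.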