jldbc / pybaseball

Pull current and historical baseball statistics using Python (Statcast, Baseball Reference, FanGraphs)
MIT License

Quality Starts Stat Missing? #50

Open dbalders opened 5 years ago

dbalders commented 5 years ago

Hey, when I call pitching_stats I get tons of data. The only stat I can see that is missing is Quality Starts. Do you know how I can get that stat? I see it on the Baseball Reference page for pitchers, but I'm not sure how to get it via the tool.

Thank you for your time.

ttaylor14 commented 5 years ago

I am also looking for Quality Starts.

LaSupp commented 4 years ago

I am in the same boat. I couldn't find Quality Starts for a pitcher, and when browsing the data on FanGraphs and Baseball Reference I didn't see this metric either. I assume another source would have to be scraped in order to get quality starts.
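If per-game pitching lines are easier to get than a QS leaderboard, QS can also be computed from the standard definition: a start in which the pitcher throws at least 6 innings and allows at most 3 earned runs. This is a minimal sketch, assuming game-log rows are dicts with hypothetical keys `Name`, `GS`, `IP` (in baseball notation, so 5.2 means 5 2/3 innings) and `ER`; it does not call any pybaseball function.

```python
from collections import Counter


def innings_to_outs(ip):
    """Convert baseball-notation innings (e.g. 5.2 = 5 2/3 IP) to outs recorded."""
    whole = int(ip)
    thirds = round((ip - whole) * 10)  # .1 -> 1 out, .2 -> 2 outs
    return whole * 3 + thirds


def is_quality_start(game):
    """A QS is a start with at least 6 innings (18 outs) and at most 3 earned runs."""
    # 'GS', 'IP' and 'ER' are hypothetical column names for one game-log row
    return game['GS'] == 1 and innings_to_outs(game['IP']) >= 18 and game['ER'] <= 3


def count_quality_starts(games):
    """Return a Counter mapping pitcher name -> number of quality starts."""
    counts = Counter()
    for game in games:
        if is_quality_start(game):
            counts[game['Name']] += 1
    return counts
```

Converting to outs avoids relying on the decimal part of the IP value, which counts thirds of an inning rather than tenths.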

blacktj commented 4 years ago

MLB has a scrape-able option:

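```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup


def get_quality_starts():
    """Collect (Name, QS) rows from MLB.com's QS leaderboard, sorted by QS descending."""
    qs_stop = 1
    page = 0
    frames = []

    while qs_stop != 0:
        url = 'https://www.mlb.com/stats/pitching/quality-starts?expanded=true&page={}'.format(page)
        # Download each page once and reuse the HTML for pandas and BeautifulSoup
        html = requests.get(url, timeout=30).text
        # The QS header includes the sort-icon text, hence the odd column name
        qs = pd.read_html(StringIO(html))[0]['caret-upcaret-downQS']
        table = BeautifulSoup(html, 'html.parser').find('table')
        # Player names are stored in the aria-label attribute of each player link
        list_names = [a['aria-label'] for a in table.find_all('a', 'bui-link')]
        temp_df = pd.DataFrame({'Name': list_names, 'QS': qs})
        frames.append(temp_df)
        qs_stop = temp_df['QS'].min()
        print(page, qs_stop)  # progress check
        page += 1

    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    return pd.concat(frames, ignore_index=True)
```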

This pages through MLB's stats leaderboard, which is pre-sorted by QS, until it reaches a page whose lowest value is 0 QS, then returns a clean DataFrame of name and QS count. It scales to a whole season (the current MLB leaderboard has 27 pages), but it could get slow with that many page requests.
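The stop rule can be tested without hitting MLB.com by separating it from the HTTP request. This is a sketch, assuming a hypothetical `fetch_page(page)` callable that returns a list of `(name, qs)` pairs for one leaderboard page. Unlike the loop above, it also stops on an empty page and caps the number of requests with `max_pages`; both are added safeguards, not part of the original approach.

```python
def collect_until_zero(fetch_page, first_page=0, max_pages=100):
    """Fetch leaderboard pages in order until a page's lowest QS is 0.

    fetch_page is a hypothetical callable: page number -> list of (name, qs).
    Stops early on an empty page (out of data) or after max_pages requests.
    """
    rows = []
    for page in range(first_page, first_page + max_pages):
        page_rows = fetch_page(page)
        if not page_rows:  # ran out of pages before reaching 0 QS
            break
        rows.extend(page_rows)
        # The leaderboard is sorted by QS, so a 0 on this page means we're done
        if min(qs for _, qs in page_rows) == 0:
            break
    return rows
```

In practice, `fetch_page` would wrap the `requests`/`read_html` logic from `get_quality_starts`, and a fake fetcher built from a dict of pages can stand in for it in tests.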

The risk is that we'd have to join this with FanGraphs data, which means matching on names. From a quick look, FanGraphs and MLB do not seem to use international (accented) characters in their dashboards (I checked Pablo Lopez and Jose Berrios).
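If the two sources ever disagree on accents, case, or spacing, normalizing names before joining makes the match more forgiving. This is a minimal sketch using only the standard library; `normalize_name` and `join_on_name` are hypothetical helpers, and the FanGraphs-style rows are assumed to be dicts with a `Name` key. Names alone can still collide (two players with the same name), so an ID-based join is safer when one is available.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents, and collapse whitespace for name matching."""
    # NFKD splits an accented letter into base letter + combining mark
    decomposed = unicodedata.normalize('NFKD', name)
    without_accents = ''.join(c for c in decomposed if not unicodedata.combining(c))
    return ' '.join(without_accents.lower().split())


def join_on_name(qs_rows, fg_rows):
    """Attach QS counts to FanGraphs-style rows by normalized name.

    qs_rows: iterable of (name, qs) pairs; fg_rows: list of dicts with a 'Name' key.
    Rows without a match get QS = None so misses are easy to spot.
    """
    qs_by_name = {normalize_name(name): qs for name, qs in qs_rows}
    return [dict(row, QS=qs_by_name.get(normalize_name(row['Name']))) for row in fg_rows]
```

With pandas, the same idea applies: add a normalized key column to both DataFrames and merge on it.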

The print statement is just for checks.

TheCleric commented 4 years ago

The playerid_reverse_lookup function can convert between MLB and FanGraphs IDs.

The MLB ID could be extracted from each player's URL in the href of the name column's link.
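A minimal sketch of pulling the ID out of the href, assuming player links end in a slug like `/player/first-last-123456` (the format and the example IDs below are assumptions for illustration, not verified against the live page). `mlbam_id_from_href` is a hypothetical helper; in `get_quality_starts` it could be applied to `a['href']` on the same `bui-link` anchors whose `aria-label` is already read.

```python
import re

# Assumed link shape: optional host, then '/player/<name-slug>-<digits>',
# with an optional trailing slash, query string, or fragment.
_PLAYER_ID_RE = re.compile(r'/player/(?:[^/?#]*-)?(\d+)/?(?:[?#].*)?$')


def mlbam_id_from_href(href):
    """Return the trailing numeric player ID from an mlb.com player URL, or None."""
    match = _PLAYER_ID_RE.search(href)
    return int(match.group(1)) if match else None
```

The resulting IDs could then be passed to `playerid_reverse_lookup(ids, key_type='mlbam')`, whose output includes a `key_fangraphs` column that can be joined against the `IDfg` column from `pitching_stats`, avoiding name matching altogether.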