outside-edge / python-espncricinfo

Python wrapper for the ESPNCricInfo JSON API
MIT License

added data scraping methods for player #48

Closed wally1002 closed 1 year ago

wally1002 commented 3 years ago

Hi, I have added some methods to the Player class for scraping a player's data, such as career averages, summary, and match-by-match data. This is my first open-source contribution, so I don't know if this is the right way to do it. I would really appreciate your feedback.

dwillis commented 3 years ago

@wally1002 Thanks very much for this! It would be ideal if the methods you added returned Python objects rather than CSV files. I think dictionaries would work well for this purpose. Would that be doable?
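
As a minimal sketch of the kind of return value meant here, assuming a scraped table arrives as a header row followed by data rows. The helper name rows_to_dicts and the sample values are hypothetical and not part of the library:

```python
def rows_to_dicts(rows):
    """Turn [header, row1, row2, ...] into a list of {column: value} dicts."""
    if not rows:
        return []
    header, *body = rows
    # zip pairs each column name with the cell in the same position
    return [dict(zip(header, row)) for row in body]


# Made-up sample data, only to show the shape of the result
sample = [
    ["Runs", "Wkts", "Ground"],
    ["54", "0", "Dambulla"],
    ["12", "1", "Delhi"],
]
records = rows_to_dicts(sample)
print(records[0]["Runs"])
```

Each record can then be read by column name instead of by position in the row.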

wally1002 commented 3 years ago

@dwillis

def get_data(self, file_name=None, match_format=11, data_type='allround', view='match'):

        """Get Player data match by match sorted by date

        Arguments:
            file_name {string}: File name to save data (default is built from player id, match format, data type and view)
            match_format {int}: Match format (default is 11): 1 = Test, 2 = ODI, 3 = T20I, 11 = All International, 20 = Youth Tests, 21 = Youth ODI
            data_type {string}: Data type (default is allround): allround, batting, bowling, fielding
            view {string}: View type (default is match): match, innings, cumulative, reverse_cumulative, series, tour, ground

        Return:
            None. The parsed rows are stored in self.data.
        """
        self.match_format = match_format
        self.data_type = data_type
        self.view = view
        self.file_name = file_name

        if self.file_name is None:
            self.file_name = f"{self.player_id}_{self.match_format}_{self.data_type}_{self.view}.csv"

        # Build the stats query URL for this player, format, data type and view.
        self.url = f"https://stats.espncricinfo.com/ci/engine/player/{self.player_id}.html?class={self.match_format};template=results;type={self.data_type};view={self.view}"
        html_doc = requests.get(self.url)
        soup = BeautifulSoup(html_doc.text, 'html.parser')
        # The code takes the fourth <table> on the page as the results table.
        table = soup.find_all("table")[3]
        # One list of cell strings per table row; the first row is the header.
        match_data = [tr.text.splitlines()[1:] for tr in table.find_all("tr")]
        # Rename the last header cell.
        match_data[0][-1] = 'match_id'
        self.data = {f'{self.view}_{self.match_format}_{self.data_type}': match_data}

Each time the method is called, it stores the result in the dictionary self.data under the key {self.view}_{self.match_format}_{self.data_type}. The keys become slightly complex, though: to access the data, the user has to build the key in exactly that format.

from espncricinfo.player import Player
kohli = Player('253802')
kohli.get_data(match_format=2)
# to access the data 
kohli.data['match_2_allround']

Would this approach work? If so, we could add a save argument so the user can choose to also write the data to a file when needed.
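
A rough sketch of what such a save option might look like, assuming the parsed rows are a list of lists with the header row first. The function name export_rows, its parameters, and the default file name are hypothetical and not existing library code:

```python
import csv


def export_rows(rows, save=False, file_name="player_data.csv"):
    """Return the rows unchanged; also write them to a CSV file when save is True."""
    if save:
        # newline="" lets the csv module control line endings
        with open(file_name, "w", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(rows)
    return rows


# Made-up rows; with save left at False nothing is written to disk
sample = [["Runs", "match_id"], ["54", "ODI # 1"]]
rows = export_rows(sample)
```

With this shape the caller always gets Python objects back, and writing a file is opt-in.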