wsheehan19 / Equibase


ChangeGenerateItems #5

Open AlexanderLawson17 opened 3 years ago

AlexanderLawson17 commented 3 years ago

I don't have access to the full list of track IDs, so substitute those in below, but this should be a better way to generate all the IDs. I'll come up with some further improvements later.

```python
import datetime
from datetime import timedelta


def daterange(start_date, end_date):
    """Yield each date from start_date up to, but not including, end_date."""
    for n in range((end_date - start_date).days):
        yield start_date + timedelta(days=n)


def generate_items(tracks_list: dict = None,
                   query_string: str = None,
                   start_date: datetime.date = datetime.date(1998, 1, 1),
                   end_date: datetime.date = None) -> list:
    """
    tracks_list: the tracks to iterate over, as a dictionary whose keys are countries and whose values are lists of track IDs in that country
    query_string: the URL template to format, in case it changes or we identify other ones to review
    start_date: the first date to include
    end_date: the date to stop at (exclusive); defaults to today
    """
    # initialize the items list
    items = []
    # if tracks_list is None, supply a base one
    if tracks_list is None:
        tracks_list = {'USA': ['KEE', 'CMR', 'GP'],
                       'CAD': []}
    # if query_string is None, supply the base one
    # (CTRY is filled from the dictionary key, assuming the keys are Equibase country codes)
    if query_string is None:
        query_string = """https://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE={race}&BorP=P&TID={track}&CTRY={country}&DT={date}&DAY=D&STYLE=EQB"""
    # resolve "today" at call time; a datetime.datetime.now().date() default in the
    # signature would be evaluated only once, when the module is imported
    if end_date is None:
        end_date = datetime.date.today()
    # get all the dates - we can prob clean this up for dates that there were no races.
    # dates = pd.date_range(start_date, end_date)
    # go through each country and pull out its tracks
    for country, tracks in tracks_list.items():
        items += [query_string.format(race=race,
                                      track=track,
                                      country=country,
                                      date=day.strftime('%m/%d/%Y'))
                  for day in daterange(start_date, end_date)
                  for race in range(0, 15)
                  for track in tracks]
    return items
```
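Because the comprehension nests date × race × track, the length of the returned list is just days × 15 × total tracks, which makes a quick sanity check on the output. A minimal sketch, assuming the 15 race slots and the exclusive end date used above; `expected_item_count` is a hypothetical helper, not something in the repo:

```python
import datetime


def expected_item_count(tracks_list: dict,
                        start_date: datetime.date,
                        end_date: datetime.date,
                        races_per_day: int = 15) -> int:
    """Hypothetical helper: how many URLs generate_items should return.

    Mirrors the nested loops: one URL per (date, race, track), with
    end_date excluded just like daterange.
    """
    n_days = max((end_date - start_date).days, 0)
    n_tracks = sum(len(tracks) for tracks in tracks_list.values())
    return n_days * races_per_day * n_tracks


# One week with the default three USA tracks: 7 days x 15 races x 3 tracks
week = expected_item_count({'USA': ['KEE', 'CMR', 'GP'], 'CAD': []},
                           datetime.date(2021, 1, 1),
                           datetime.date(2021, 1, 8))
print(week)  # 315
```

With the default tracks that is 45 URLs per day, so a range starting in 1998 produces hundreds of thousands of URLs, and the count grows linearly with every track added.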
AlexanderLawson17 commented 3 years ago

The code takes about 2 seconds to run with the default arguments. I assume the runtime will grow as we add more tracks, but we can now generate the URLs on the fly instead of worrying about keeping a stored list up to date. If we write a class that pulls the track list, the whole thing should be fully robust to changes.
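To generate the URLs on the fly without holding the whole list in memory, the same loops could be written as a generator. This is a sketch under the same assumptions as the code above (15 race slots starting at 0, exclusive end date); `iter_items` and `URL_TEMPLATE` are hypothetical names, and `CTRY` is filled from the dictionary key on the assumption that the keys are Equibase country codes:

```python
import datetime
import itertools
from datetime import timedelta
from typing import Dict, Iterator, List

# Same URL as the base query string, split across lines for readability
URL_TEMPLATE = ("https://www.equibase.com/premium/eqbPDFChartPlus.cfm"
                "?RACE={race}&BorP=P&TID={track}&CTRY={country}"
                "&DT={date}&DAY=D&STYLE=EQB")


def iter_items(tracks_list: Dict[str, List[str]],
               start_date: datetime.date,
               end_date: datetime.date,
               races_per_day: int = 15) -> Iterator[str]:
    """Hypothetical lazy variant: yield one URL per (date, race, track)."""
    n_days = max((end_date - start_date).days, 0)
    days = [start_date + timedelta(days=n) for n in range(n_days)]
    # Flatten {country: [tracks]} into (country, track) pairs
    track_pairs = [(country, track)
                   for country, tracks in tracks_list.items()
                   for track in tracks]
    for day, race, (country, track) in itertools.product(
            days, range(races_per_day), track_pairs):
        yield URL_TEMPLATE.format(race=race, track=track, country=country,
                                  date=day.strftime('%m/%d/%Y'))


# Pull only the first URL; nothing else is formatted until it is requested
first = next(iter_items({'USA': ['KEE']},
                        datetime.date(2021, 4, 2), datetime.date(2021, 4, 3)))
print(first)
```

`itertools.product` reads its inputs up front, so only the date list and the track pairs are kept in memory, not the formatted URLs. A scraper could then loop over `iter_items(...)` directly and stop early if needed.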

I do think we should filter the dates down to Friday through Sunday only, but I want to validate with Randy that horses only race on those days first.
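If Randy confirms the Friday-through-Sunday schedule, the date generator could simply skip the other weekdays. A minimal sketch under that still-unconfirmed assumption; `race_days` and `RACE_WEEKDAYS` are hypothetical names, and the allowed weekdays are a parameter so they are easy to change (Python's `date.weekday()` numbers Monday as 0 and Sunday as 6):

```python
import datetime
from datetime import timedelta
from typing import Iterable, Iterator

# Friday=4, Saturday=5, Sunday=6 per date.weekday(); unconfirmed assumption
RACE_WEEKDAYS = frozenset({4, 5, 6})


def race_days(start_date: datetime.date,
              end_date: datetime.date,
              weekdays: Iterable[int] = RACE_WEEKDAYS) -> Iterator[datetime.date]:
    """Yield dates in [start_date, end_date) whose weekday is in `weekdays`."""
    allowed = frozenset(weekdays)
    for n in range((end_date - start_date).days):
        day = start_date + timedelta(days=n)
        if day.weekday() in allowed:
            yield day


# 2021-01-01 was a Friday, so only the 1st, 2nd and 3rd survive
print([d.isoformat() for d in race_days(datetime.date(2021, 1, 1),
                                        datetime.date(2021, 1, 8))])
```

Used in place of `daterange`, this would cut the number of generated URLs to three-sevenths over whole weeks.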