joenano / rpscrape

Scrape horse racing results data and racecards.

order by race time #139

Closed allotmentandy closed 9 months ago

allotmentandy commented 9 months ago

Hi, this is fantastic, but is there a way to order the racecards by time?

rmwesley99 commented 9 months ago

Hi Andy,

It might help if you describe in more detail what you are trying to do with the race card file(s). Each daily race card is downloaded as a JSON file, so you could open and edit it manually in a text or code editor. Alternatively, you could process the file programmatically (e.g. use Python to import it into a pandas DataFrame), in which case you can manipulate it however you want.

Regards, Richard

allotmentandy commented 9 months ago

Sure, I have written a blog post on it here:

https://allotmentandy.github.io/blog/2023-12-04-Scraping-Racing-Post-Horse-Prediction-System-Tips/

I am basically developing a PHP parser to read the JSON file and build a placepot predictor that scores each horse in each race.

The current order seems to be by prize money, but I would prefer to get the data in time order. Of course I can tinker with the JSON, but I thought it might be easier to set the order in the Python script.

rmwesley99 commented 9 months ago

In which case just parse the JSON file and stick it into a pandas DataFrame. Then you can do whatever you want with it.

import json
import os

import pandas as pd

# Output column names; the order must match race_keys + runner_keys below.
header = ['date','off_time','course_id','course','race_name','race_id',
          'dist_f','class','type','field_size','going','surface',
          'horse_id','horse','region','trainer','draw','lbs','or','jockey','jockey_id']

# Race-level and runner-level keys as they appear in the racecard JSON.
race_keys = ['date','off_time','course_id','course','race_name','race_id',
             'distance_f','race_class','type','field_size','going','surface']
runner_keys = ['horse_id','name','region','trainer','draw','lbs','ofr','jockey','jockey_id']


def read_data(f):
    # An empty file would make json.load() raise, so treat it as no data.
    if os.stat(f).st_size == 0:
        return {}
    with open(f) as json_file:
        return json.load(json_file)


def process_racecard(data):
    runner_list = []

    # Only GB racecards are processed.
    for country in (c for c in data if c in ['GB']):
        for course in data[country]:
            for off_time in data[country][course]:
                race = data[country][course][off_time]
                race_fields = [race[k] for k in race_keys]
                for runner in race.get('runners', []):
                    # Read runner keys in a fixed order so the values line up with header.
                    runner_list.append(race_fields + [runner.get(k) for k in runner_keys])

    df = pd.DataFrame(runner_list, columns=header)
    df = df.sort_values(['course','off_time','lbs'], ascending=[True, True, False])

    return df

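If pandas is not wanted on the consuming side (for example when a PHP parser reads the result), a flat, time-ordered list of races can be sketched with the standard library alone. This is only an illustration: the sample data is made up, `races_by_time` is a hypothetical helper, and the sort assumes zero-padded 24-hour `off_time` strings such as "13:40", which may not match the real racecard files.

```python
import json

# Made-up sample in the region -> course -> off time -> race layout
# that the snippets in this thread iterate over.
SAMPLE = json.loads("""
{
  "GB": {
    "Ayr": {
      "13:40": {"course": "Ayr", "off_time": "13:40", "race_name": "Race B", "runners": []},
      "12:30": {"course": "Ayr", "off_time": "12:30", "race_name": "Race A", "runners": []}
    },
    "Bath": {
      "13:05": {"course": "Bath", "off_time": "13:05", "race_name": "Race C", "runners": []}
    }
  }
}
""")


def races_by_time(racecards, region="GB"):
    """Return every race in one region as a flat list ordered by off time, then course."""
    races = [
        race
        for course_races in racecards.get(region, {}).values()
        for race in course_races.values()
    ]
    # Plain string comparison: chronological only for zero-padded 24-hour times (assumption).
    return sorted(races, key=lambda race: (race["off_time"], race["course"]))


ordered = races_by_time(SAMPLE)
for race in ordered:
    print(race["off_time"], race["course"], race["race_name"])

# Serialising the list gives a JSON array, which keeps this order for any consumer.
ordered_json = json.dumps(ordered)
```

Because the result is a JSON array rather than an object, the order survives whatever parser reads it.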
allotmentandy commented 9 months ago

Thank you, I will investigate adding this to the codebase.

Andy

joenano commented 9 months ago

Keys are not ordered in JSON objects, so the race times will not be in any guaranteed order. It could have been an array, which is ordered, but changing it now would break all the existing code people are using.

This will convert the races to an array ordered by race time.

#!/usr/bin/env python

from orjson import loads, dumps

path_racecard = '../racecards/2023-12-04.json'
path_ordered = '../racecards/ordered_2023-12-04.json'

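# Read the racecard; orjson.loads accepts the raw file contents.
with open(path_racecard, 'r') as f:
    racecards = loads(f.read())

# Replace each course's {time: race} object with a list of races sorted by time key.
for region, courses in racecards.items():
    for course, races in courses.items():
        ordered_races = [races[time] for time in sorted(races.keys())]

        racecards[region][course] = ordered_races

# orjson.dumps returns bytes, so decode before writing in text mode.
with open(path_ordered, 'w') as out:
    out.write(dumps(racecards).decode('utf-8'))
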
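If orjson is not installed, the same conversion can be sketched as a pure function using the standard-library `json` module. The helper name `order_races` and the sample data are made up for illustration; the region -> course -> time nesting is taken from the script above.

```python
import json


def order_races(racecards):
    """Return a copy in which each course maps to a list of races sorted by time key."""
    return {
        region: {
            # sorted() compares the time keys as strings, as in the script above.
            course: [races[time] for time in sorted(races)]
            for course, races in courses.items()
        }
        for region, courses in racecards.items()
    }


# Made-up sample: one course whose races are keyed by time, out of order.
sample = {
    "GB": {
        "Ayr": {
            "13:40": {"race_name": "Race B"},
            "12:30": {"race_name": "Race A"},
        }
    }
}

ordered = order_races(sample)

# Each course is now a list, so a consumer just iterates it in order.
for number, race in enumerate(ordered["GB"]["Ayr"], start=1):
    print(number, race["race_name"])

# Round-trip through JSON text to show that the array order is preserved.
restored = json.loads(json.dumps(ordered))
```

Note that the time keys are dropped from the output, so a consumer would rely on a field inside each race (such as `off_time`, if present) to recover the time.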
allotmentandy commented 9 months ago

Thanks for this code.

Are there any repos showing what people use it for? I am interested in how people take variables like RPR, OR and TS and weight the horses. Does anything like that exist?

rmwesley99 commented 9 months ago

That is the Holy Grail.

Generally you would download historical results data (e.g. using rpscrape.py) to build a local database. You would then use the historical data to build and back-test a [hopefully] profitable system, e.g. based on form reading, purely statistical, your own ratings system, value betting, odds arbitrage etc.

You would then forward-test and live-run the system using the race cards to find horses which met the criteria of your system.
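As a rough illustration of the back-testing step, the sketch below applies a selection rule to a few made-up historical results and reports the level-stakes outcome. Everything in it is an assumption for the example: the field names (`or`, `pos`, `sp_dec`), the rule and the numbers are hypothetical and do not come from rpscrape output.

```python
# Made-up historical results; 'sp_dec' is a hypothetical decimal starting price.
RESULTS = [
    {"horse": "A", "or": 90, "pos": 1, "sp_dec": 4.0},
    {"horse": "B", "or": 85, "pos": 3, "sp_dec": 6.0},
    {"horse": "C", "or": 70, "pos": 1, "sp_dec": 2.5},
    {"horse": "D", "or": 88, "pos": 2, "sp_dec": 3.0},
]


def rule(runner):
    """Hypothetical system: back any runner with an official rating of 85 or more."""
    return runner["or"] >= 85


def backtest(results, rule, stake=1.0):
    """Apply a selection rule to past results and total the win-only profit or loss."""
    bets = wins = 0
    profit = 0.0
    for runner in results:
        if not rule(runner):
            continue
        bets += 1
        if runner["pos"] == 1:
            wins += 1
            profit += stake * (runner["sp_dec"] - 1)  # winnings excluding the stake
        else:
            profit -= stake  # losing bet costs the stake
    return {"bets": bets, "wins": wins, "profit": profit}


summary = backtest(RESULTS, rule)
print(summary)
```

The same `rule` function could then be run over the day's race card runners to list the qualifying horses for forward-testing.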

It isn't a lot different to technical analysis systems applied to financial markets.

If you want more details on how to build a system, your best options are either horse racing blogs (e.g. Dave Renham on geegeez) or betting forums (e.g. UK Betting Forum). However, if someone has published a system on the internet, you can 100% guarantee it is not profitable, or at least no longer profitable.