meraki-analytics / cassiopeia

An all-inclusive Python framework for the Riot Games League of Legends API. Cass focuses on making the data easy and fun to work with, while providing all the tools necessary to create a website or do data analysis.
MIT License

Automate Patch File Update #143

Closed · kihashi closed this 6 years ago

kihashi commented 6 years ago

From http://cassiopeia.readthedocs.io/en/latest/contributing.html#contributions:

Currently, the patches file needs to be updated with the correct start date every time a new patch is released. There must be some way to automate this.

I have a couple ideas of how to go about this and I was hoping to get some feedback before I started working on a PR.

  1. Create a script that polls (either via a python mechanism or cron) the versions endpoint (/lol/static-data/v3/versions) every hour (or some other appropriate interval) and checks to see if there is a change. If there is, add the new patch, mark that time as the patch start time and as the previous patch's end time.
    • This approach uses static data end points, which is nice from the API limit perspective.
    • However, according to the docs, the version endpoint is for Datadragon and might not be 1:1 with game versions.
    • Additionally, while it is likely that datadragon is updated shortly after the client is, [it's not guaranteed](https://discussion.developer.riotgames.com/questions/30/how-long-does-it-take-static-data-data-dragon-to-u.html).
  2. The other possibility is doing a similar polling against the match endpoint, get the patch number, and compare it to the current list. I'm having a little trouble figuring out how to get a current match, though.
    • My first idea was to grab a feature match from the spectator API, but it looks like that has gameIds (which seem to be meaningless in the v3 API) instead of matchIds.
    • Second option would be to go through each player in challenger until we hit someone who is currently in game and use that. If we don't find someone in challenger, go to master, etc.
      • This uses way more API calls, unfortunately, especially as patches normally go live at night when fewer players are on.

Both of these approaches get you a fairly accurate start time, but not an exact one. It should be good enough for most purposes though. Does anyone have any opinions or suggestions about the methods I proposed above?
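For what it's worth, the bookkeeping is the same in both approaches once a change is detected. A minimal sketch of that step (the function name and the patch-list shape here are my own for illustration, not part of Cass):

```python
import time

def record_patch_change(patches, new_version, now=None):
    """If `new_version` differs from the most recent recorded patch, close the
    previous patch at `now` and open the new one. Returns True on a change."""
    if now is None:
        now = time.time()
    if patches and patches[-1]["name"] == new_version:
        return False  # still on the same patch, nothing to do
    if patches:
        patches[-1]["end"] = now  # previous patch ends when the new one appears
    patches.append({"name": new_version, "start": now, "end": None})
    return True
```

Whichever endpoint does the detecting (versions or match), the poller just calls this with the version it saw.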

jjmaldonis commented 6 years ago

Awesome, thanks for thinking about this. So far, the patches are mainly used for the match/matchlist endpoints, so it would be ideal if it synced up well with matches. I've been leaning towards your (2), where we use the datetimes when matches were played to figure out the correct time. Like you said, that is quite a bit more difficult. Even your (1) option is better than what we have now.

Ideally there would be a single key dedicated to making these calls, so even if it takes quite a few calls it won't be a big deal. Even the rate limits for a dev key should be good enough.

I know that the static data endpoints pull their data directly from datadragon, so they will be updated at the same time as datadragon.

I thought gameIds and matchIds were the same. If you save a current game's gameId, wait an hour, then query the match endpoint with that saved gameId, don't you get the same match back? If so, that's a pretty good way of always keeping track of very recently played games.

Another issue involved here is getting the updated patches to users. We can update it on github, but they won't have it locally. My guess is that we will need to configure Cass to treat patches like any other data, and pull it from some json file online.

Last, in-progress games are stopped if Riot rolls over a patch, so getting the patch datetime to within an hour should be considered "exact" imo.

jjmaldonis commented 6 years ago

Also, patches should be set per-region. Right now they are global, but that's not actually correct.

kihashi commented 6 years ago

I thought gameIds and matchIds were the same.

You may be right. I know back in the old API there was a /games endpoint and a /matches endpoint with different ids, and since the gameId I was getting wasn't working, I assumed that was still more or less true. But I did not wait until the game was over, so that's probably what it was.

If that's the case, then we can do something like this:
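Assuming gameIds really do work against the match endpoint, a rough sketch of the polling loop (the two fetch callables stand in for the actual spectator and match API calls; none of these names are real Cass functions):

```python
import collections
import time

def watch_for_patch_change(fetch_featured_game_id, fetch_match_version,
                           current_patch, poll_seconds=300, delay_seconds=3600):
    """Queue featured gameIds; once an entry has aged past `delay_seconds`,
    look it up as a finished match and compare its version to the current
    patch. Returns (new_patch, detection_time) when a change is seen."""
    queue = collections.deque()
    while True:
        queue.append((time.time(), fetch_featured_game_id()))
        # Drain every entry old enough for its match data to exist.
        while queue and time.time() - queue[0][0] >= delay_seconds:
            _, game_id = queue.popleft()
            version = fetch_match_version(game_id)
            patch = ".".join(version.split(".")[:2])
            if patch != current_patch:
                return patch, time.time()
        time.sleep(poll_seconds)
```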

The only thing I haven't thought of here is how to deal with downtime (for example, if the program stops running and a patch is deployed while it is down).

Also, patches should be set per-region. Right now they are global, but that's not actually correct.

Yeah. I meant to mention that in the OP. This logic will need to run for each region. We'll probably have to change the structure of the patches.json file (or else have multiple) to account for multiple regions.

Something like:

...
  {
    "season": "Season 7",
    "name": "7.16",
    "start": {
        "NA": 1502251200.0,
        "EUW": 1502251200.0,
        ...
    },
    "end": {
        "NA": 1503460800.0,
        "EUW": 1503460800.0,
        ...
    },
  },
...
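With that shape, resolving a timestamp to a patch becomes a per-region lookup. A sketch (`patch_at` is a hypothetical helper, not existing Cass code):

```python
def patch_at(patches, region, timestamp):
    """Return the name of the patch that was live in `region` at `timestamp`,
    given per-region "start"/"end" maps like the structure above.
    An "end" of None means the patch is still current."""
    for patch in patches:
        start = patch["start"].get(region)
        end = patch["end"].get(region)
        if start is not None and start <= timestamp and (end is None or timestamp < end):
            return patch["name"]
    return None
```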

Another issue involved here is getting the updated patches to users.

Yeah. You can just have it be pulled from github or some other CDN. The updated version should always be available at this address: https://raw.githubusercontent.com/meraki-analytics/cassiopeia/master/cassiopeia/patches.json.

You can have it download the file whenever the patch list is loaded, and then fall back to the local copy if for some reason the remote file is not available.
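That remote-with-local-fallback could look something like this (a sketch; `load_patches` and the local filename are illustrative):

```python
import json
import urllib.error
import urllib.request

PATCHES_URL = ("https://raw.githubusercontent.com/meraki-analytics/"
               "cassiopeia/master/cassiopeia/patches.json")

def load_patches(url=PATCHES_URL, local_path="patches.json"):
    """Prefer the up-to-date master list on GitHub; fall back to the
    locally bundled file if the network request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError):
        with open(local_path) as f:
            return json.load(f)
```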

jjmaldonis commented 6 years ago

This all sounds good. I like the new structure of the json data too.

The only thing I haven't thought of here is how to deal with downtime (for example, if the program stops running and a patch is deployed while it is running).

One possibility is to write a script that searches for a specific patch start datetime. Given a patch number, it will go through matches (probably like what you suggested by using masters + challengers players' match histories) and try to find when the patch started.

That's necessary for getting old patch data correct too.

If that's the case, then we can do something like this:

Sounds good. I imagine two functions and a main, roughly like this:

def get_featured_game():
    """Returns a summoners rift featured (current) game if one exists; else None"""

def get_match_data(match_id):
    """Returns the match's version and start datetime."""

def main():
    """Every 5 minutes get a new featured game. Put it in a queue.
    After a featured game is requested, check the queue. If a game has been in the
    queue for an hour, get the match data.
    If the match's version/patch is new, update the patch info (make sure to flush to disk).
    """

Feel free to modify of course.

jjmaldonis commented 6 years ago

Update: We moved this file to https://github.com/CommunityDragon/Data/blob/master/patches.json

Below is the code that can be used to update it. This code could be run on a server using crontab, and the push to update the data could be done automatically (maybe using a PR with additional information about the error checking output). There should be some error checking to make sure the automatically identified timestamp is correct. Sometimes one region releases much later than the others, for example, and that will throw off the mean calculation.

import arrow
import datetime
from natsort import natsorted
import json
import numpy as np
from pathlib import Path
import os

import datapipelines
import cassiopeia as cass
from cassiopeia import Region, Queue

FILEPATH = Path("$HOME/.../CommunityDragon/Data/patches.json")
FILEPATH = Path(os.path.expandvars(FILEPATH))

"""
Current thoughts on auto-updating patch info:

Check every 6 hours for updates to a Realms endpoint.
If the version is updated (and we don't have patch info for it), run the below script.
Then put the output in the patches.json file and push it.
"""

def find_patch_start_date(start: arrow.Arrow, patch_name: str, region: Region, allowable_interval=datetime.timedelta(hours=6)):
    """Walk challenger players' match histories, narrowing the [start, end]
    bracket around the patch rollover until it is within allowable_interval."""
    start = start - datetime.timedelta(days=1)
    end = arrow.now()
    challengers = cass.get_challenger_league(Queue.ranked_solo_fives, region=region)

    for entry in challengers.entries:
        summoner = entry.summoner
        mh = cass.get_match_history(summoner=summoner,
                                    region=region,
                                    begin_time=start,
                                    end_time=end,
                                    queues={Queue.ranked_solo_fives})
        for match in mh:
            try:
                match_patch_name = '.'.join(match.version.split('.')[:2])
            except datapipelines.NotFoundError:
                continue
            if match_patch_name == patch_name and match.creation < end:
                end = match.creation
                print(f"New patch end time: {end}")
            patch_major_minor = patch_name.split(".")[:2]
            patch_major_minor = (int(patch_major_minor[0]), int(patch_major_minor[1]))
            match_major_minor = match_patch_name.split(".")[:2]
            match_major_minor = (int(match_major_minor[0]), int(match_major_minor[1]))
            if match_major_minor < patch_major_minor and match.creation > start:
                start = match.creation
                print(f"New patch start time: {start}")

            if end - start < allowable_interval:
                return start, end

            if match.creation < start or match.creation > end:
                break
    print("WARNING! Did not converge.")
    return start, end

def get_unknown_patches(use_versions_endpoint=False):
    with open(FILEPATH) as f:
        patches = json.load(f)["patches"]
    missing = set()

    for region in Region:
        realms = cass.get_realms(region=region)
        latest_versions = natsorted(realms.latest_versions.values())
        latest_version = latest_versions[-1]
        latest_version = ".".join(latest_version.split(".")[:2])

        if use_versions_endpoint:
            versions_latest_version = cass.get_versions(region="NA")[0]
            versions_latest_version = ".".join(versions_latest_version.split(".")[:2])

            latest_version = natsorted([latest_version, versions_latest_version])[1]

        latest_patch = patches[-1]["name"]
        if latest_patch != latest_version:
            missing.add(latest_version)
    return sorted(missing)

def update_patch_data(region_results, patch_name):
    with open(FILEPATH) as f:
        patch_data = json.load(f)
    shifts = patch_data["shifts"]
    for region, ts in region_results.items():
        region_results[region] = ts.shift(seconds=-shifts[region])

    for region, ts in region_results.items():
        print(region, ts)
    mean = arrow_mean(region_results.values())

    # Assume the patch was released at 8 AM UTC
    previous_day = arrow.get(mean.shift(days=-1).date()) + datetime.timedelta(hours=8)
    today = arrow.get(mean.shift(days=0).date()) + datetime.timedelta(hours=8)
    next_day = arrow.get(mean.shift(days=1).date()) + datetime.timedelta(hours=8)
    days = [previous_day, today, next_day]
    diffs = [abs(mean - day) for day in days]
    correct_day = days[np.argmin(diffs)]

    season = cass.Season.season_8.id

    patch = {"name": patch_name, "start": correct_day.timestamp, "season": season}
    patch_data["patches"].append(patch)

    with open(FILEPATH, "w") as f:
        json.dump(patch_data, f, indent=2)

def arrow_mean(arrows):
    mean = np.mean([dt.timestamp for dt in arrows])
    return arrow.get(mean)

def main():
    missing = get_unknown_patches(use_versions_endpoint=True)
    if missing:
        print("Missing:")
        print(missing)
        print()
        # One pass per missing patch: collect a per-region midpoint, then write.
        for patch_name in missing:
            results = {}
            for region in Region:
                print("{}: Finding start time for patch: {}...".format(region, patch_name))
                start, end = find_patch_start_date(start=arrow.now() - datetime.timedelta(days=100),
                                                   patch_name=patch_name,
                                                   region=region)
                middle = start + (end - start) / 2
                print(start, end, end - start)
                print(region, patch_name, middle.timestamp)
                results[region.platform.value] = middle
            update_patch_data(results, patch_name)
    cass.configuration.settings.clear_sinks(cass.Patch)

if __name__ == "__main__":
    main()