meraki-analytics / cassiopeia

An all-inclusive Python framework for the Riot Games League of Legends API. Cass focuses on making the data easy and fun to work with, while providing all the tools necessary to create a website or do data analysis.
MIT License
552 stars 135 forks source link

MatchHistory needs both begin and end time #153

Closed pfmoore closed 6 years ago

pfmoore commented 6 years ago

When specifying a time range, the MatchHistory API needs both begin and end time. But the definition of patch 7.20 has None as the end time, resulting in:

Traceback (most recent call last):
  File "C:/Work/Projects/riot_data/match_collection.py", line 83, in <module>
    scanner.process_summoner()
  File "C:/Work/Projects/riot_data/match_collection.py", line 50, in process_summoner
    queues={Queue.aram})
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\cassiopeia\core\common.py", line 41, in default_region_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\cassiopeia\core\common.py", line 175, in __call__
    return pipeline.get(cls, query=query)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\datapipelines\pipelines.py", line 433, in get
    return handler.get(query, context)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\datapipelines\pipelines.py", line 185, in get
    result = self._source.get(self._source_type, deepcopy(query), context)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\datapipelines\sources.py", line 69, in wrapper
    return call(self, query, context=context)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\datapipelines\queries.py", line 323, in wrapped
    return method(self, query, context)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\cassiopeia\datastores\ghost.py", line 395, in get_match_history
    return UnloadedGhostStore.create_ghost(MatchHistory, query)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\cassiopeia\datastores\ghost.py", line 28, in create_ghost
    return type.__call__(cls, **kwargs)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\cassiopeia\core\match.py", line 892, in __init__
    raise ValueError("Both `begin_time` and `end_time` must be specified, or neither.")
ValueError: Both `begin_time` and `end_time` must be specified, or neither.

Process finished with exit code 1

A simplified script that should demonstrate the issue is:

from cassiopeia.core import Summoner, MatchHistory
from cassiopeia.data import Patch
from cassiopeia import set_default_region

set_default_region("EUW")
p = Patch.from_str("7.20")
m = MatchHistory(Summoner(name="GustavEnk"),
                 begin_time=p.start, end_time=p.end
    )

However, at the moment it's failing for me with the "datapipelines.queries.MissingKeyError: id must be in query!" error (which seems to be popping up again for me :-()

pfmoore commented 6 years ago

Suggestion: if end_time is None and begin_time is given, set end_time to (now + 1 second) or something.

jjmaldonis commented 6 years ago

Hmm, so the reason I went with throwing that error was because of this line in Riot's docs:

If beginTime is specified, but not endTime, then these parameters are ignored. If endTime is specified, but not beginTime, then beginTime defaults to the start of the account's match history. https://developer.riotgames.com/api-methods/#match-v3/GET_getMatchlist

I think that first bullet point doesn't make sense -- instead, like you said, if end_time is None and begin_time is given, then setting end_time to "when the call is made" seems more appropriate.

We occasionally do things like this -- where we modify stuff that Riot specifies -- because we don't think it makes sense. We're often a bit leery of doing that, however, because usually Riot does a good job.

Knowing what I just linked, does that change your opinion at all? If not, I'm happy to modify the code to what you suggested.

pfmoore commented 6 years ago

My main concern is that the start and end times for the latest Patch make it unusable in the MatchHistory API (see my sample script, which works for patch 7.19). Maybe it's the Patch object that should avoid returning None for the end date of the latest patch?

But honestly, I'm happy to leave it - setting start to datetime.now() - timedelta(days=7) and end to datetime.now() just got me an error 400 "Invalid Request" from the Riot API. So I'm inclined to just write off the use of time ranges as too flaky to be bothered with, and just grab the full match history and post-process to select the items I want.

jjmaldonis commented 6 years ago

There is a good chance we will change it so that end_time gets set to datetime.now() + 1 second. I think I'd like to do that, and I'll run it past Rob as well. Thanks for bringing it up.

I ran into the same issue in the match collection example, and I fixed it like this: https://github.com/meraki-analytics/cassiopeia/blob/master/examples/match_collection.py#L10 So essentially I left it up to the user to figure out unless someone brought it up :)

Also, the fix to the match history MissingKeyError is to use summoner= as the first argument. I've fixed the match history constructor locally so that kwargs are required.

If you used the Riot API developer portal to make that request, it might be throwing a 400 because you have to enter the timestamp in milliseconds........

robrua commented 6 years ago

We used to have this directly supported within the MatchAPI DataSource itself using the query validators: https://github.com/meraki-analytics/cassiopeia/blob/7842527564cbf367d3c85c2aa79cc2610e0674fe/cassiopeia/datastores/riotapi/match.py#L73-L74 I guess we lost it along the way somewhere for a bug fix or after the latest Riot API changes that impacted this endpoint.

I definitely agree we should add the defaults, at the very least at the core level, but maybe also add it back into the DataSource. Though it's largely arbitrary, I'd prefer going with datetime.now() rather than datetime.now() + 1 for the end time. There's no effective difference due to the time the request takes, and I think the semantics match a little better (i.e. "give me the match list as of now").

I think I like having None rather than datetime.now() as a sentinel value for "ongoing" for the Patch.end_time, so I'd prefer the defaulting behavior to be in the MatchHistory or in the MatchAPI. Using datetime.now() would restrict us from consistently pre-setting patch end dates if we know them in advance, and could cause confusion if the output of patch.end_time is stored and used for something later.

We might even want to add a kwarg for MatchHistory which just accepts the Patch directly, rather than needing to set the start/end to its fields.

pfmoore commented 6 years ago

If you used the Riot API developer portal to make that request, it might be throwing a 400 because you have to enter the timestamp in milliseconds........

I used Cass to do it, but it's possible I got something wrong. I was rushing to check it before I had to pick up on another piece of work.

jjmaldonis commented 6 years ago

Rob and I talked for a while about this and we decided that due to some complicated logic, we are going to postpone automatically setting end_time if begin_time is set and end_time is None. (Although if we can figure out nice solutions to the issues below, we'll happily implement this.)

Here are our notes:

The issues described in the first bullet point (and to some extent those in the 2nd) are largely nullified by forcing the user to specify an end time. We are essentially offloading the work on the user, but the benefit is explicit understanding of how the MatchHistory object will work in the above situations.

For my own notes: We need to move the MatchHistory generator from the type system into the Riot API data source, and return the generator as the "data" to construct the MatchHistory from. This correctly defines the line between the type system and the Riot API datasource for match histories, and it will allow us to cache / save the match history object in data sinks if we ever get to that.

pfmoore commented 6 years ago

Hmm, yeah good points. I hadn't really thought about how this would work in the context of long-running processes (I presume you're thinking of something like a web service). With that in mind, it does make sense to ask the user to be explicit.

jjmaldonis commented 6 years ago

This should be working now in commit d003af6c3b193c77114338507322f6614733058a. More specifically, the issue first bullet point in my comment above is now handled correctly.

I rewrote the MatchHistory data-pulling one last time, and tested as many use cases as I could think of and they all worked expected. Hopefully there aren't any bugs.

For future reference, here were the match history tests I used:

    summoner = Summoner(name="Kalturi", region="NA")

    match_history = cass.get_match_history(summoner=summoner, region=region, seasons={Season.season_7}, queues={Queue.ranked_solo_fives})
    match_history = cass.get_match_history(summoner=summoner, region=region, seasons={Season.season_7}, queues={Queue.ranked_solo_fives}, begin_time=datetime.datetime.utcnow()-datetime.timedelta(days=140), end_time=datetime.datetime.utcnow())
    match_history = cass.get_match_history(summoner=summoner, region=region, seasons={Season.season_7}, queues={Queue.ranked_solo_fives}, begin_time=datetime.datetime(2017, 2, 7), end_time=datetime.datetime(2017, 2, 14))
    match_history = cass.get_match_history(summoner=summoner, region=region, seasons={Season.season_7}, queues={Queue.ranked_solo_fives}, begin_time=datetime.datetime(2016, 1, 1), end_time=datetime.datetime(2016, 1, 11))
    match_history = cass.get_match_history(summoner=summoner, region=region, seasons={Season.season_7}, queues={Queue.ranked_solo_fives}, begin_time=datetime.datetime(2016, 1, 1), end_time=datetime.datetime.utcnow())
    match_history = cass.get_match_history(summoner=summoner, region=region, seasons={Season.season_7}, queues={Queue.ranked_solo_fives}, begin_time=datetime.datetime(2016, 12, 1), end_time=datetime.datetime(2016, 12, 30))
    match_history = cass.get_match_history(summoner=summoner, region=region, seasons={Season.season_7}, queues={Queue.ranked_solo_fives}, begin_time=datetime.datetime(2016, 12, 1))