datapipelines.queries.MissingKeyError: id must be in query!

I'm trying to modify the supplied example, match_collection.py, to get a collection of ARAM games for analysis, and I've been hitting the error datapipelines.queries.MissingKeyError: id must be in query!.

I've tried to simplify my example, and have reduced it to the following. I've taken the match_collection.py example from github, and modified it as follows:

Use match.queue is Queue.aram in the filter rather than Queue.ranked_solo.
Use my summoner name (GustavEnk) in place of the one given.
Add a line cassiopeia.apply_settings("settings.json") to the __main__ code.

My settings file is:

{
  "global": {
    "default_region": "EUW"
  },

  "pipeline": {
    "Cache": {},

    "DDragon": {},

    "RiotAPI": {
      "api_key": "RIOT_API_KEY",
      "request_error_handling": {
          "500": {
                "strategy": "exponential_backoff",
                "initial_backoff": 1.0,
                "backoff_factor": 2.0,
                "max_attempts": 4
          }
      }
    }
  },

  "logging": {
    "print_calls": true,
    "print_riot_api_key": false,
    "default": "WARNING",
    "core": "WARNING"
  }
}

When I run it, I get an error:

...
Making call: https://euw1.api.riotgames.com/lol/match/v3/matches/3208936940
INFO: Unexpected service rate limit, backing off for 76 seconds.           
Making call: https://euw1.api.riotgames.com/lol/match/v3/matches/3208936940
Making call: https://euw1.api.riotgames.com/lol/match/v3/matches/3206176603
Traceback (most recent call last):
  File ".\match_collection.py", line 56, in <module>
    collect_matches()
  File ".\match_collection.py", line 35, in collect_matches
    matches = filter_match_history(new_summoner, patch_715)
  File ".\match_collection.py", line 10, in filter_match_history
    match_history = MatchHistory(summoner=summoner)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\core\common.py", line 41, in default_region_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\core\common.py", line 175, in __call__
    return pipeline.get(cls, query=query)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\pipelines.py", line 433, in get
    return handler.get(query, context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\pipelines.py", line 193, in get
    result = self._transform(data=result, context=context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\pipelines.py", line 113, in _transform
    data = transformer.transform(target_type, data, context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\transformers.py", line 105, in transform
    return transformer.transform(target_type, value, context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\transformers.py", line 50, in wrapper
    return call(self, value, context=context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\transformers\match.py", line 56, in matchlist_data_to_core
    return MatchHistory([Match.from_match_reference(ref) for ref in value], region=value.region)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\core\common.py", line 41, in default_region_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\core\common.py", line 172, in __call__
    query = construct_query(cls, **kwargs)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\datastores\uniquekeys.py", line 104, in construct_query
    return construct_match_history_query(**kwargs)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\core\common.py", line 43, in default_region_wrapper
    return method(*args, **kwargs)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\datastores\uniquekeys.py", line 228, in construct_match_history_query
    summoner = Summoner(id=summoner, region=region)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\core\common.py", line 41, in default_region_wrapper
    return method(self, *args, **kwargs)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\cassiopeia\core\common.py", line 175, in __call__
    return pipeline.get(cls, query=query)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\pipelines.py", line 433, in get
    return handler.get(query, context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\pipelines.py", line 185, in get
    result = self._source.get(self._source_type, deepcopy(query), context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\sources.py", line 69, in wrapper
    return call(self, query, context=context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\queries.py", line 322, in wrapped
    validator(query)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\queries.py", line 206, in __call__
    return self._root.evaluate(query, context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\queries.py", line 59, in evaluate
    child.evaluate(query, context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\queries.py", line 123, in evaluate
    raise errors[0]
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\queries.py", line 115, in evaluate
    is_true = child.evaluate(query, context)
  File "C:\Users\UK03306\.virtualenvs\e93714483b4e865\lib\site-packages\datapipelines\queries.py", line 193, in evaluate
    raise MissingKeyError("{key} must be in query!".format(key=self.key))
datapipelines.queries.MissingKeyError: id must be in query!

I've tried a couple of variations, both of which fail the same way:

    match_history = MatchHistory(summoner=summoner, end_index=20)
    match_history.filter(match.queue is Queue.aram)
    return match_history

    match_history = MatchHistory(summoner=summoner, end_index=20)
    return [m for m in match_history if m.queue is Queue.aram]

Can you help?

Thank you for the detailed issue! This ended up being a bit of a nasty bug, but I believe I've got a fix for it.

A few things: Please pull the latest version from github. I also updated the match collection filter function, which should save you a ton of unnecessary calls to the /match/ endpoint. Finally, please add UnloadedGhostStore() to your pipeline (I just added this in the docs late last week):

{
  "global": {
    "default_region": "EUW"
  },

  "pipeline": {
    "Cache": {},

    "UnloadedGhostStore": {},

    "DDragon": {},

    "RiotAPI": {
...

If this fixes your issue, feel free to close it. I'll let you do it since I'm not 100% sure this will solve everything.

No problem - I'll try to do some tests this evening. I'm not surprised there are bugs in there, the code's doing some very clever things, so stuff is bound to go wrong occasionally :-)

I'm curious - what was the bug and what was triggering it? I assume that the example match scanning code works for you, or at least that match scanning is a fairly common thing people want to do with the library, so if it was a common issue, I'd have found an existing bug (or based on how fast you fix things, more likely I'd have never hit the problem in the first place! ;-))

Well you got me thinking about why what I did fixed the issue, and actually I think it wasn't necessary... It was just that UnloadedGhostStore() needed to be added. That said, it did help me fix another minor bug with MatchHistory and realize that we had a transformer that wasn't working correctly (which was what I thought the original bug was; but it turns out we don't need that transformer at all).

Here are the details of the bug:

Without the UnloadedGhostStore, the bug would occur when a user requested a summoner's match history. The data pipeline would realize that it could get a MatchHistory object from the Riot API. It would successfully do that, but the RiotAPI data source provides a MatchListData object (which just contains the data for a summoner's match history, and doesn't have a bunch of the helper methods that make Cass great (which are contained in MatchHistory)). So the RiotAPI data source returned a MatchListData, and the data pipeline knows how to convert a MatchListData into a MatchHistory (which is what the user requested).

That transformer which converted a MatchListData to a MatchHistory had a bug in it: It was only passing up the data from the Riot API, plus the region, and wasn't passing up any summoner info (which the MatchHistory object requires). And since the summoner info was missing, it was triggering the ID-not-found error you were seeing when the summoner.id attribute was attempting to be accessed.

All of this is for naught, however, because the transformer from MatchListData to MatchHistory shouldn't exist. The RiotAPI data source should provide many MatchListDatas, one for each 100 games in the matchlist provided by Riot (since they break a summoner's match history into segments now), and each of those MatchListDatas should be used to supply the information to the MatchHistory object. (The MatchHistory object is essentially a lazily loaded list, where new values in the list are only created as they are accessed.)

The UnloadedGhostStore provides an empty MatchHistory when the user asks for one. Without it, and without the transformer (which shouldn't exist), you should have gotten the error datapipelines.pipelines.NoConversionError: No source can provide "MatchHistory" because no data source knows how to provide a MatchHistory object -- the Riot API doesn't because it doesn't know how to convert from a MatchListData to a MatchHistory, and there are no other data sources that provide this (or a similar) type. This error now gets thrown correctly.

Ah, OK. Thanks for taking the time to give me such a detailed explanation - I can't say I follow all of it, but the main idea (that MatchListData doesn't have all the information a MatchHistory has) is something I understand, and see how it was hitting me.

I do think that the pipeline is very powerful (I love the ability to add a persistence layer just by changing the config) but a bit confusing - I understand from your explanation why the UnloadedGhostStore is needed in there, but it's something I guess most people would just copy around blindly without understanding. It's a shame that if you want to modify the pipeline (e.g. to add disk persistence) you need to copy in all of the default stuff (or do you? I don't think I missed a way of doing that, but I should probably dig around the code to make sure I didn't miss anything.

Anyway, thanks for all your help, and for the library itself, which in spite of me hitting odd cases makes getting LoL data so much more fun than using the Riot API by hand :-) I'll confirm the problem's sorted as soon as I have a chance to run a test.

I'll add a bit more about our vision for how the data and interface in Cass works: There are two types of data, which we name DTO and CoreData. Every DTO object has a corresponding CoreData object in Cass.

A DTO is a piece of data that is provided by a data source you may or may not have control over (for example, the Riot API).

A CoreData is a piece of data that Cass uses. It can be identical to it's corresponding DTO, but can be different. For example, the Riot API provides data that has a few oddities that make it annoying to use, so we transform some of the data when we create a CoreData object from it's corresponding DTO.

The data sinks (e.g. databases, caches, etc) could store either DTO data or CoreData -- ideally it will be up to the user. We haven't fleshed this out yet, but it's on the todo list. So that is our "vision" at least.

And then there is an interface layer that simply uses the CoreData. MatchListData is a CoreData object, and MatchHistory is its interface. MatchHistory just provides all the nice functionality that you actually want to use, and MatchListData provides data in a format that's amenable to use.

but it's something I guess most people would just copy around blindly without understanding

Agreed. We are not at all thrilled with this. Your comment triggered some discussion between Rob and I, and we have a few ideas that should clear this up and make this clear to users. For example, we will make the UnloadedGhostStore invisible to users, and instead have an option where the user can specify whether or not they want unloaded objects (ie lazy loading on objects). If they don't want unloaded objects, all the data for an object will be pulled from a data source when the object is created.

It's a shame that if you want to modify the pipeline (e.g. to add disk persistence) you need to copy in all of the default stuff (or do you? I don't think I missed a way of doing that, but I should probably dig around the code to make sure I didn't miss anything.)

You are correct that you need to copy the entire (default) pipeline if you want to modify it. It'd be great if we could do that automatically without limiting functionality, but here is an example where it's useful to not make assumptions about where different data stores should be put in the pipeline:

Say I have a database that stores data for a medium amount of time, then that data expires and it's kicked out of the database. This database should almost certainly go between a cache and any online data sources (e.g. the Riot API). Let's also say I have a database that stores data forever -- this is to ensure that I always have data even if the Riot API is down. This database should go after all other data sources, and should only provide "back up data" in case all other data sources are down. So I've got two databases (and the code cannot distinguish between them automatically), but they are used for two entirely different purposes and should be at different spots in the pipeline.

We decided that by forcing our users to understand how the data pipeline works, we can allow them more freedom; and also that some of our ease-of-use features aren't so easy once you deviate from the defaults.

All that said, we really want to provide a bunch of different database options for our users. We'd like to provide Reddis, Mongo, and SQLAlchemy supported database out-of-the-box. We have a basic disk-based-database now. And since we can give them default behavior, we know exactly where they should be inserted into the data pipeline by default.

Also, if you are creating a database for your data and you think it's something that the community would use, we'd love to integrate it into Cass! Even just a baseline implementation could save us a ton of time, both decision-making time and implementation time. Please feel free to let us know if that sounds interesting!

Finally, I'll sum up with: Thanks for the super helpful comments, they caused us to realize some ways in which Cass can be significantly improved and the code can be clarified. These changes certainly won't cause any API-breaking changes, but it's possible that some of the underlying details might change -- but hopefully not much. Thanks to everyone reading this for their patience as we (and Riot) change things on occasion.

Hmm, with the released 3.0.14 plus the UnloadedGhostStore setting, I get

Making call: https://euw1.api.riotgames.com/lol/summoner/v3/summoners/by-name/GustavEnk
Traceback (most recent call last):
  File ".\match_collection.py", line 57, in <module>
    collect_matches()
  File ".\match_collection.py", line 37, in collect_matches
    unpulled_match_ids.update([match.id for match in matches])
  File ".\match_collection.py", line 37, in <listcomp>
    unpulled_match_ids.update([match.id for match in matches])
  File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\merakicommons\container.py", line 348, in __iter__
    yield next(self)
  File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\merakicommons\container.py", line 358, in __next__
    value = next(self._generator)
  File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\cassiopeia\core\match.py", line 919, in generate_matchlists
    query = self.__get_query__()
  File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\cassiopeia\core\match.py", line 949, in __get_query__
    if self.begin_time is not None:
  File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\cassiopeia\core\match.py", line 989, in begin_time
    return datetime.datetime.fromtimestamp(self._data[MatchListData].begin_time / 1000)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'

With cass from github, I get:

Traceback (most recent call last):
  File ".\match_collection.py", line 4, in <module>
    from cassiopeia.core import Summoner, MatchHistory, Match
  File "C:\Users\UK03306\.virtualenvs\6e204dbce97a3db\lib\site-packages\cassiopeia\__init__.py", line 7, in <module>
    from .core import Champion, Champions, Rune, Runes, Mastery, Masteries, Item, Items, SummonerSpell, SummonerSpells, ProfileIcon, ProfileIcons, Versions, Maps, Summoner, Account, ChampionMastery, ChampionMasteries, Match, FeaturedMatches, ShardStatus, ChallengerLeague, MasterLeague, Map, Realms, LanguageStrings, Locales, LeagueEntries, Leagues
ImportError: cannot import name 'Leagues'

Maybe I should hold off trying again until things stabilise a bit...? ;-)

The error in the release was the "minor bug" I mentioned a few posts back. We will make a new release soon.

The error from github was because I had forgotten to push a file. It should be fixed now.

lol, you're insanely responsive. Many thanks for all your hard work and willingness to answer questions! I'll give it another go in a while.

Looks like it's sorted now. Although I was surprised that specifying end_index=20 in the MatchHistory constructor no longer works. It fails comparing int vs None - match.py at line 886:

assert end_index is None or end_index > begin_index

I don't really need to set an end_index, though (I can just slice the match history list, as it's lazy) so no problem.

Thanks for the help.

meraki-analytics / cassiopeia

datapipelines.queries.MissingKeyError: id must be in query! #150