Closed pfmoore closed 6 years ago
Thank you for the detailed issue! This ended up being a bit of a nasty bug, but I believe I've got a fix for it.
A few things: Please pull the latest version from github. I also updated the match collection filter function, which should save you a ton of unnecessary calls to the /match/ endpoint. Finally, please add UnloadedGhostStore()
to your pipeline (I just added this in the docs late last week):
{
"global": {
"default_region": "EUW"
},
"pipeline": {
"Cache": {},
"UnloadedGhostStore": {},
"DDragon": {},
"RiotAPI": {
...
If this fixes your issue, feel free to close it. I'll let you do it since I'm not 100% sure this will solve everything.
No problem - I'll try to do some tests this evening. I'm not surprised there are bugs in there, the code's doing some very clever things, so stuff is bound to go wrong occasionally :-)
I'm curious - what was the bug and what was triggering it? I assume that the example match scanning code works for you, or at least that match scanning is a fairly common thing people want to do with the library, so if it was a common issue, I'd have found an existing bug (or based on how fast you fix things, more likely I'd have never hit the problem in the first place! ;-))
Well you got me thinking about why what I did fixed the issue, and actually I think it wasn't necessary... It was just that UnloadedGhostStore()
needed to be added. That said, it did help me fix another minor bug with MatchHistory and realize that we had a transformer that wasn't working correctly (which was what I thought the original bug was; but it turns out we don't need that transformer at all).
Here are the details of the bug:
Without the UnloadedGhostStore
, the bug would occur when a user requested a summoner's match history. The data pipeline would realize that it could get a MatchHistory
object from the Riot API. It would successfully do that, but the RiotAPI data source provides a MatchListData
object (which just contains the data for a summoner's match history, and doesn't have a bunch of the helper methods that make Cass great (which are contained in MatchHistory
)). So the RiotAPI data source returned a MatchListData
, and the data pipeline knows how to convert a MatchListData
into a MatchHistory
(which is what the user requested).
That transformer
which converted a MatchListData
to a MatchHistory
had a bug in it: It was only passing up the data from the Riot API, plus the region, and wasn't passing up any summoner info (which the MatchHistory
object requires). And since the summoner info was missing, it was triggering the ID-not-found error you were seeing when the summoner.id
attribute was attempting to be accessed.
All of this is for naught, however, because the transformer from MatchListData
to MatchHistory
shouldn't exist. The RiotAPI data source should provide many MatchListData
s, one for each 100 games in the matchlist provided by Riot (since they break a summoner's match history into segments now), and each of those MatchListData
s should be used to supply the information to the MatchHistory
object. (The MatchHistory
object is essentially a lazily loaded list, where new values in the list are only created as they are accessed.)
The UnloadedGhostStore
provides an empty MatchHistory
when the user asks for one. Without it, and without the transformer (which shouldn't exist), you should have gotten the error datapipelines.pipelines.NoConversionError: No source can provide "MatchHistory"
because no data source knows how to provide a MatchHistory
object -- the Riot API doesn't because it doesn't know how to convert from a MatchListData
to a MatchHistory
, and there are no other data sources that provide this (or a similar) type. This error now gets thrown correctly.
Ah, OK. Thanks for taking the time to give me such a detailed explanation - I can't say I follow all of it, but the main idea (that MatchListData
doesn't have all the information a MatchHistory
has) is something I understand, and see how it was hitting me.
I do think that the pipeline is very powerful (I love the ability to add a persistence layer just by changing the config) but a bit confusing - I understand from your explanation why the UnloadedGhostStore
is needed in there, but it's something I guess most people would just copy around blindly without understanding. It's a shame that if you want to modify the pipeline (e.g. to add disk persistence) you need to copy in all of the default stuff (or do you? I don't think I missed a way of doing that, but I should probably dig around the code to make sure I didn't miss anything.
Anyway, thanks for all your help, and for the library itself, which in spite of me hitting odd cases makes getting LoL data so much more fun than using the Riot API by hand :-) I'll confirm the problem's sorted as soon as I have a chance to run a test.
I'll add a bit more about our vision for how the data and interface in Cass works: There are two types of data, which we name DTO
and CoreData
. Every DTO
object has a corresponding CoreData
object in Cass.
A DTO
is a piece of data that is provided by a data source you may or may not have control over (for example, the Riot API).
A CoreData
is a piece of data that Cass uses. It can be identical to it's corresponding DTO
, but can be different. For example, the Riot API provides data that has a few oddities that make it annoying to use, so we transform some of the data when we create a CoreData
object from it's corresponding DTO
.
The data sinks (e.g. databases, caches, etc) could store either DTO
data or CoreData
-- ideally it will be up to the user. We haven't fleshed this out yet, but it's on the todo list. So that is our "vision" at least.
And then there is an interface layer that simply uses the CoreData
. MatchListData
is a CoreData
object, and MatchHistory
is its interface. MatchHistory
just provides all the nice functionality that you actually want to use, and MatchListData
provides data in a format that's amenable to use.
but it's something I guess most people would just copy around blindly without understanding
Agreed. We are not at all thrilled with this. Your comment triggered some discussion between Rob and I, and we have a few ideas that should clear this up and make this clear to users. For example, we will make the UnloadedGhostStore
invisible to users, and instead have an option where the user can specify whether or not they want unloaded objects (ie lazy loading on objects). If they don't want unloaded objects, all the data for an object will be pulled from a data source when the object is created.
It's a shame that if you want to modify the pipeline (e.g. to add disk persistence) you need to copy in all of the default stuff (or do you? I don't think I missed a way of doing that, but I should probably dig around the code to make sure I didn't miss anything.)
You are correct that you need to copy the entire (default) pipeline if you want to modify it. It'd be great if we could do that automatically without limiting functionality, but here is an example where it's useful to not make assumptions about where different data stores should be put in the pipeline:
Say I have a database that stores data for a medium amount of time, then that data expires and it's kicked out of the database. This database should almost certainly go between a cache and any online data sources (e.g. the Riot API). Let's also say I have a database that stores data forever -- this is to ensure that I always have data even if the Riot API is down. This database should go after all other data sources, and should only provide "back up data" in case all other data sources are down. So I've got two databases (and the code cannot distinguish between them automatically), but they are used for two entirely different purposes and should be at different spots in the pipeline.
We decided that by forcing our users to understand how the data pipeline works, we can allow them more freedom; and also that some of our ease-of-use features aren't so easy once you deviate from the defaults.
All that said, we really want to provide a bunch of different database options for our users. We'd like to provide Reddis, Mongo, and SQLAlchemy supported database out-of-the-box. We have a basic disk-based-database now. And since we can give them default behavior, we know exactly where they should be inserted into the data pipeline by default.
Also, if you are creating a database for your data and you think it's something that the community would use, we'd love to integrate it into Cass! Even just a baseline implementation could save us a ton of time, both decision-making time and implementation time. Please feel free to let us know if that sounds interesting!
Finally, I'll sum up with: Thanks for the super helpful comments, they caused us to realize some ways in which Cass can be significantly improved and the code can be clarified. These changes certainly won't cause any API-breaking changes, but it's possible that some of the underlying details might change -- but hopefully not much. Thanks to everyone reading this for their patience as we (and Riot) change things on occasion.
Hmm, with the released 3.0.14 plus the UnloadedGhostStore
setting, I get
Making call: https://euw1.api.riotgames.com/lol/summoner/v3/summoners/by-name/GustavEnk
Traceback (most recent call last):
File ".\match_collection.py", line 57, in <module>
collect_matches()
File ".\match_collection.py", line 37, in collect_matches
unpulled_match_ids.update([match.id for match in matches])
File ".\match_collection.py", line 37, in <listcomp>
unpulled_match_ids.update([match.id for match in matches])
File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\merakicommons\container.py", line 348, in __iter__
yield next(self)
File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\merakicommons\container.py", line 358, in __next__
value = next(self._generator)
File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\cassiopeia\core\match.py", line 919, in generate_matchlists
query = self.__get_query__()
File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\cassiopeia\core\match.py", line 949, in __get_query__
if self.begin_time is not None:
File "C:\Users\UK03306\.virtualenvs\aa5ff9ff43a513c\lib\site-packages\cassiopeia\core\match.py", line 989, in begin_time
return datetime.datetime.fromtimestamp(self._data[MatchListData].begin_time / 1000)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
With cass from github, I get:
Traceback (most recent call last):
File ".\match_collection.py", line 4, in <module>
from cassiopeia.core import Summoner, MatchHistory, Match
File "C:\Users\UK03306\.virtualenvs\6e204dbce97a3db\lib\site-packages\cassiopeia\__init__.py", line 7, in <module>
from .core import Champion, Champions, Rune, Runes, Mastery, Masteries, Item, Items, SummonerSpell, SummonerSpells, ProfileIcon, ProfileIcons, Versions, Maps, Summoner, Account, ChampionMastery, ChampionMasteries, Match, FeaturedMatches, ShardStatus, ChallengerLeague, MasterLeague, Map, Realms, LanguageStrings, Locales, LeagueEntries, Leagues
ImportError: cannot import name 'Leagues'
Maybe I should hold off trying again until things stabilise a bit...? ;-)
The error in the release was the "minor bug" I mentioned a few posts back. We will make a new release soon.
The error from github was because I had forgotten to push a file. It should be fixed now.
lol, you're insanely responsive. Many thanks for all your hard work and willingness to answer questions! I'll give it another go in a while.
Looks like it's sorted now. Although I was surprised that specifying end_index=20
in the MatchHistory constructor no longer works. It fails comparing int vs None - match.py at line 886:
assert end_index is None or end_index > begin_index
I don't really need to set an end_index, though (I can just slice the match history list, as it's lazy) so no problem.
Thanks for the help.
I'm trying to modify the supplied example,
match_collection.py
, to get a collection of ARAM games for analysis, and I've been hitting the errordatapipelines.queries.MissingKeyError: id must be in query!
.I've tried to simplify my example, and have reduced it to the following. I've taken the
match_collection.py
example from github, and modified it as follows:match.queue is Queue.aram
in the filter rather thanQueue.ranked_solo
.GustavEnk
) in place of the one given.cassiopeia.apply_settings("settings.json")
to the__main__
code.My settings file is:
When I run it, I get an error:
I've tried a couple of variations, both of which fail the same way:
Can you help?