Cassiopeia crashes when Riot API returns incomplete data

This appears to be odd data coming back from the Riot API. But the exception raised by cassiopeia isn't really very helpful in terms of knowing what to do...

>>> import cassiopeia
>>> from cassiopeia.core import Match
>>> cassiopeia.set_default_region("EUW")
>>> m = Match(id=3282250163)
>>> m.participants[5].summoner.id
Making call: https://ddragon.leagueoflegends.com/realms/euw.json
Making call: https://euw1.api.riotgames.com/lol/match/v3/matches/3282250163
Making call: https://euw1.api.riotgames.com/lol/summoner/v3/summoners/by-account/0
Traceback (most recent call last):
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\merakicommons\ghost.py", line 41, in wrapper
    return method(*args, **kwargs)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\cassiopeia\core\summoner.py", line 144, in id
    return self._data[SummonerData].id
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\cassiopeia\core\summoner.py", line 43, in id
    return self._dto["id"]
KeyError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\merakicommons\ghost.py", line 87, in __get__
    return self.fget(obj)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\merakicommons\ghost.py", line 43, in wrapper
    raise GhostLoadingRequiredError(str(error))
merakicommons.ghost.GhostLoadingRequiredError: 'id'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\merakicommons\ghost.py", line 90, in __get__
    obj.__load__(load_group)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\cassiopeia\core\common.py", line 192, in __load__
    data = configuration.settings.pipeline.get(type=self._load_types[load_group], query=query)
  File "C:\Users\UK03306\.virtualenvs\riot_data\lib\site-packages\datapipelines\pipelines.py", line 437, in get
    raise NotFoundError("No source returned a query result!")
datapipelines.common.NotFoundError: No source returned a query result!
>>>

The error is reproducible for me.

If I manually get that match ID via the Riot API, I see

        ...
        {
            "player": {
                "currentPlatformId": "EUW1",
                "summonerName": "Ezreal",
                "matchHistoryUri": "/v1/stats/player_history//0",
                "platformId": "",
                "currentAccountId": 0,
                "profileIcon": 0,
                "accountId": 0
            },
            "participantId": 6
        },
        ...

The Participant object doesn't expose any of the attributes in that JSON, so I don't see a way to avoid triggering an error trying to query https://euw1.api.riotgames.com/lol/summoner/v3/summoners/by-account/0.

I'd be happy with a simple Participant.is_valid() method I could use to check for a case like this. Or even just access to the underlying data/JSON from the API, so I could check myself. Didn't the previous version of Cass expose the underlying JSON? Is there a way of getting at that in the new API?

Hopefully by looking at the call that was made along with the error it didn't take too long to understand what the problem was, but I agree that knowing what to do about it isn't super clear. I can't think of a way to tell users how to fix this problem, however.

The issue is that the account ID is zero, and Riot doesn't return a summoner for that account ID (account ID 0 == bots). You can use if m.participants[5].summoner.account.id == 0 to check for validity. (I'm actually not sure why Riot even returns an account ID here.)

We have similar methods for some situations: summoner.current_match.exists(), summoner.rune_pages[5].exists(), and summoner.mastery_pages[5].exists(). The word .exist doesn't feel quite right, and I think I prefer Participant.is_bot() rather than Participant.is_valid() for the reasons I mentioned in the next paragraph.

I'm personally not super thrilled about checking for valid data ranges before making a request, because it introduces a ton of additional expectations for things that Cass will and will not handle, and predicting what valid data ranges are really isn't something that Cass should do. I'd rather have the data pipeline throw an error (like it did here) when that happens. As an aside, it's certainly possible for a user to have a summoner with account ID 0 in some other database, so checking for that isn't a good idea in Cass itself.

Maybe the correct thing to do here is to remove that data from the json returned by Riot. Then users simply wouldn't be able to access that summoner data in this case, and it would throw an attribute error. The is_bot would still be available, however.

Is there a way of getting at that in the new API?

It's possible, but we don't have a convenience method for it. You can use something analogous to participant._data[ParticipantData]._dto to access the data right now. This will probably change in the future, at which point we will also add a convenience method.

Thanks for the explanation. It actually wasn't at all easy for me to work out what was going on - and the real annoyance was that I couldn't use Cass to investigate as everything I tried crashed the same way. I had to go to the Riot pages and run the APIs manually - for that, the "Making call" line did give me the match ID (saved me needing to add prints of my own) and the fact that it was somehow trying to access summoner 0. In practice I find that the data pipeline, plus the use of ghost items, is a really nice feature but when something goes wrong, it's incredibly difficult to understand what happened. I nearly always have to go back to the raw API to make sense of the situation.

I thought it wasn't possible to have bots in ARAM games (my code filters to just pick up ARAM) so although I wondered if that was the problem, I discounted it.

[Actually, I just checked again - my match's queue ID is 33 - Queue.bot_intermediate_fives so for some reason

    match_history = MatchHistory(summoner=summoner,
                                 begin_time=patch.start, end_time=patch.end)
    match_history.filter(lambda match: match.queue is Queue.aram)

isn't working for me :-(

Should that be match_history = match_history.filter(...)? If so, the example crawler is wrong, at https://github.com/meraki-analytics/cassiopeia/blob/master/examples/match_collection.py#L10 ]

The account ID check is fine for me, although is_bot() sounds like a nice convenience API. In actual fact, as my application is just scraping for a big batch of (valid) matches, I'll likely just put a try: ... except: around the block and blacklist any matches that raise errors. (BTW, it might be nice if Cass had a base exception for things like this, although I guess if you're getting arbitrary exceptions from deep in the low level code, it's hard to catch them all - I can certainly work without it in my app).

Thanks again for all your help. In spite (or maybe because) of the fact that I keep hitting weird data errors, I'm having great fun working with Cass :smile:

Evidently we are a bit disillusioned at how easy things are to figure out! I guess a year of bugfixing will do that... Although we definitely know that Cass is very complex. That's not good though that the error messages aren't useful at all. What's happening here, is that the /by-account/0 url is throwing a 404: Not Found, and the data pipeline swallows that errors and continues to look for the data in another data source. If the data is never found, that 404 error never gets regurgitated, and is therefore invisible. Would it help have helped to not entirely swallow that error?

That line in the match collection example should absolutely be match_history = match_history.filter(...). Sorry! It's also worrying that the filter gave you a bot game... I will try to look into it if I can reproduce it.

The MatchHistory constructor now properly handles all the possible inputs (hopefully) correctly, so in the match collection example I modified the queue to be in the constructor rather than in a post-construction filter. If it's in the constructor, the queue ID is sent to Riot when the request is made, so only ARAM games will be returned by Riot. Hopefully this will short-circuit whatever the bug in .filter(...) was.

BTW, it might be nice if Cass had a base exception for things like this

We might be able to do something about this. Right now there are a few completely independent systems that Cass is built on top of (the data pipeline, ghost objects, etc) that can operate completely without Cass. These packages throw their own errors (similar to how requests throws urllib errors?), but we might be able to do something to help.

I've also added a quick Participant.is_bot() method if you ever want to use it.

Your issues have made it very clear how desperately we need tests... There are too many possibilities for things to go wrong and I'm having a lot of trouble keeping track of them all in my head. That makes this situation frustrating I'm sure, so thanks for being patient too. This is also why I am so responsive. Glad you're having fun at least! Personally, when I use Cass (and it works), everything feels like magic and it's awesome. BTW, we have a championgg api that plugs into Cass (with it's own set of wonderful bugs), and I've found it to be very useful for doing data analysis stuff. I use it regularly.

It's also worrying that the filter gave you a bot game.

I didn't explain well. The bot game was because I had copied the example and wasn't assigning the result of the filter back to the match_history variable. So it was just that I was getting non-ARAM matches that I hadn't thought I was.

The error message thing is very much a trade-off. The pipeline functionality is absolutely brilliant - but as you say, when it goes wrong, it tends to spiral off into a little world of its own. It would certainly be nice to get clearer errors, but I'm OK with considering the current situation as the price of getting essentially free caching capabilities, error retrying, etc. I just hope that I'm slowly getting sufficiently familiar with the library to help myself a bit more often, rather than pestering you :-)

One of the reasons I'm reluctant to just catch exceptions and move on is that doing so will simply mask any issues. And hitting issues has absolutely been the best way for me to learn more about the library - hopefully the things I've been finding are useful to you too.

Please don't feel you owe me fast responses - there's bound to be issues in a library as complex as this, and especially after a major rewrite like you just did. I'm starting to get to the point where I understand the code a bit better now, so hopefully I'll be able to start offering fixes instead of just problems.

meraki-analytics / cassiopeia

Cassiopeia crashes when Riot API returns incomplete data #151