tfausak / octane

:rocket: Parse Rocket League replays.
https://www.stackage.org/nightly/package/octane

Map all the loadouts #8

Closed tfausak closed 8 years ago

tfausak commented 8 years ago

In 78560751656704831caaa634e19a7826462434e7, I started working on converting the IDs in loadouts (like 21) to descriptive strings (like "Backfire"). This is pretty tedious, but fortunately it goes pretty quickly. I have a couple of tricks to speed things up:

  1. Quickly create replays by setting up 1v1 exhibition matches with no bots. Set the maximum number of goals to 1 and disable goal reset, also known as the respawn time. When the game starts, just drive forward without boosting. The ball will go straight into the goal.
  2. Use a script to parse the replay and dump out all the IDs. Here's what I use:

    import glob
    import json
    import os
    import subprocess
    import sys

    # Build Octane before running it.
    subprocess.run(['stack', 'build', '--pedantic'], check=True)

    # Pick the most recently modified replay from the Rocket League demo folder.
    paths = glob.glob(os.path.join(
        os.path.expanduser('~'),
        'Documents',
        'My Games',
        'Rocket League',
        'TAGame',
        'Demos',
        '*.replay'))
    paths.sort(key=os.path.getmtime, reverse=True)
    path = paths[0]
    print(path)

    # Parse the replay and print the first loadout found in the first frame.
    output = subprocess.check_output(['stack', 'exec', 'octane', path])
    replay = json.loads(output.decode('utf-8'))
    for replication in replay['frames'][0]['replications']:
        for name, value in replication['properties'].items():
            if name == 'TAGame.PRI_TA:ClientLoadout':
                contents = value['contents']
                print('Body:         {}'.format(contents[1]))
                print('Decal:        {}'.format(contents[2]))
                print('Wheels:       {}'.format(contents[3]))
                print('Rocket trail: {}'.format(contents[4]))
                print('Antenna:      {}'.format(contents[5]))
                print('Topper:       {}'.format(contents[6]))
                sys.exit(0)
  3. Go through and add all the pieces of the loadout like in d110572277e6a262677ee075d818f8b3725747dc.
  4. Repeat. Make sure that you don't pick a thing that's already been done.

tfausak commented 8 years ago

I finished everything except the antennas. There are a ton of them! I still need to do basically all the countries and video games. Otherwise this is pretty much done.

tfausak commented 8 years ago

I'm going to go ahead and close this. I'll get some antennas as I go, but for the time being I'm perfectly ok with Octane returning "Unknown antenna 123" for them.
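
The fallback described here can be sketched in a few lines. This is an illustration in Python, not Octane's actual Haskell code: the `describe_item` function and the `LOADOUT_NAMES` table layout are assumptions, and only the two body IDs (21 for Backfire, 23 for Octane) come from this thread.

```python
# Hypothetical lookup tables keyed by loadout slot. Only the two body IDs
# below are taken from this thread; a real table would be much longer.
LOADOUT_NAMES = {
    'body': {21: 'Backfire', 23: 'Octane'},
    'antenna': {},  # left empty here to stand in for unmapped antennas
}


def describe_item(slot, item_id):
    """Return the item's name, or an "Unknown <slot> <id>" placeholder."""
    names = LOADOUT_NAMES.get(slot, {})
    return names.get(item_id, 'Unknown {} {}'.format(slot, item_id))


print(describe_item('body', 21))
print(describe_item('antenna', 123))
```

Because the placeholder keeps the raw ID in the string, an unmapped item can still be identified and added to the table later.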

danielsamuels commented 8 years ago

While I appreciate the amount of effort it takes to map these values, if I'm honest, I'm not at all a fan of this change. I'd much rather the parser returned the raw data, rather than trying to be clever and map the values itself. I currently maintain a lookup table on my end and convert the integers to strings, so it's an extremely trivial operation as a consumer. There's also a couple of benefits to this:

Is it at all possible to return the raw data, even if that is alongside your mapped output? This will increase the overall file size of the JSON output, but it's the only way I'd be able to use this and future versions of the parser.

tfausak commented 8 years ago

I could provide both. That's probably a good sanity check anyway.

I agree that converting the integers into strings is conceptually simple, but it's a lot of data for consumers to discover themselves. It takes a while to map all the IDs, so why make everyone repeat that effort?

Are you storing Octane's output directly somewhere? I'd imagine it would be transient.

danielsamuels commented 8 years ago

It takes a while to map all the IDs, so why make everyone repeat that effort?

I don't see this being any different from knowing that Team 0 is Blue and 1 is Orange, or knowing which rotation value is Yaw / Pitch / Roll. There's a whole load of stuff which requires specific knowledge of how things map, and even if you provide the mapped value, I'd definitely like to see the raw data too.

Are you storing Octane's output directly somewhere? I'd imagine it would be transient.

Only in memory during the parsing process.

Another mapping that I'm seeing being done is the platform, which means I'm now going to need to check against 3 values when I'm trying to determine which platform someone is using - some replays have an int, some have OnlinePlatform_Steam, and now some will be just 'Steam'. Again, having the raw data here will reduce the amount of complexity as a consumer. There is an argument to say that I should just reprocess every replay under the new output, but that's very (monetarily and computationally) expensive.
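
A rough sketch of the consumer-side check this implies, assuming a hypothetical `normalize_platform` helper. The thread does not say which integers the older replays use, so the integer table is supplied by the caller and the value in the usage lines is made up.

```python
PREFIX = 'OnlinePlatform_'


def normalize_platform(value, int_names=None):
    """Collapse the three platform encodings into one short name.

    `int_names` is a caller-supplied mapping from the older integer codes to
    names; this sketch does not assume what those integers are.
    """
    if isinstance(value, int):
        # Unknown integers are returned as strings rather than guessed.
        return (int_names or {}).get(value, str(value))
    if value.startswith(PREFIX):
        # 'OnlinePlatform_Steam' becomes 'Steam'.
        return value[len(PREFIX):]
    # Already the short form, such as 'Steam'.
    return value


print(normalize_platform('OnlinePlatform_Steam'))
print(normalize_platform('Steam'))
print(normalize_platform(7, {7: 'Steam'}))  # 7 is a made-up code
```

Every format change in the parser's output adds another branch like these for each consumer, which is the complexity being described.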

tfausak commented 8 years ago

Hmm, I see. For what it's worth, I left the team numbers alone because "blue" and "orange" don't always make sense. The colors in seasonal matches can be different.

You can see everything that I map in getPropertyValue. How do you feel about camera settings, for instance? I give those descriptive keys like "FOV" instead of returning a flat array of values. I guess you're generally talking about values instead of keys, in which case it should be simple to add new values that correspond to the raw value for whichever key. To wit:

{
 "Body": "Octane", // as before
 "BodyId": 23 // new
}
// or
{
 "Body": {
  "Id": 23,
  "Value": "Octane"
 }
}
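
For comparison, the two shapes above could be produced along these lines. This is a Python sketch only: `flat_entry`, `nested_entry`, and `BODY_NAMES` are hypothetical names, and Octane itself is written in Haskell.

```python
import json

BODY_NAMES = {21: 'Backfire', 23: 'Octane'}  # IDs taken from this thread


def flat_entry(key, item_id, names):
    """First shape: the mapped name plus a sibling '<key>Id' field."""
    return {key: names.get(item_id), key + 'Id': item_id}


def nested_entry(key, item_id, names):
    """Second shape: one object holding both the raw ID and the name."""
    return {key: {'Id': item_id, 'Value': names.get(item_id)}}


print(json.dumps(flat_entry('Body', 23, BODY_NAMES)))
print(json.dumps(nested_entry('Body', 23, BODY_NAMES)))
```

The flat shape keeps existing consumers of `"Body"` working unchanged, while the nested shape keeps each ID next to its name at the cost of changing the value's type.
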
rustyfausak commented 8 years ago

really there is no difference between "23" and "octane" besides file size. and since the file is being compressed and the properties are not replicated when they don't change, we're talking about a really marginal amount of data here. i think it makes total sense to send "octane".

danielsamuels commented 8 years ago

How do you feel about camera settings, for instance?

I can't remember if they were already coming through as a dictionary or not. But as with everything, I'd definitely prefer the raw data.

since the file is being compressed and the properties are not replicated when they don't change, we're talking about a really marginal amount of data here.

When you're looking at a single replay, sure. But when you're looking at 50,000+ items it soon adds up.

i think it makes total sense to send "octane".

Like I mentioned before, some of the values are incorrect, so I have no way of knowing whether the information I'm getting is correct or not. Having the raw data and doing the association is much 'safer'.

Perhaps there could be an option in the application which determines whether it returns mapped data or not? That way people like me who understand the replay values and maintain lookup tables can get the raw data from the replays and carry on as normal, and people who would prefer the data to be pre-processed can get the mapped output.

jjbott commented 8 years ago

As far as "It takes a while to map all the IDs", it took me about 10 minutes to generate this dump of mostly everything from the RL files, and most of that was converting raw data to JSON.

danielsamuels commented 8 years ago

That's awesome, nice work! Do you mind if I make use of that data?

tfausak commented 8 years ago

Backfire, for example, is mapped incorrectly in your dictionary

I can't help but notice that Octane correctly mapped Backfire's ID (21, not 11) to its name. RLR shows replays with Backfire as "Unknown". For example, I just made this replay with Backfire.

In my mind, that's a good argument for the parser doing the mapping. It's easy to get it wrong.

And @jjbott, that list is awesome! I don't have the skill to generate a map like that, so I've been mapping things one at a time by creating short replay files and dumping the loadout information.

danielsamuels commented 8 years ago

In my mind, that's a good argument for the parser doing the mapping. It's easy to get it wrong.

The difference is that I can go to the entry in my database and update it, and that propagates across 56,800 replays instantly, without any additional work required.
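
A small sketch of why that works, assuming replays are stored with raw IDs and names are resolved at read time. The replay records and the `body_name` helper are hypothetical; the wrong entry (11 instead of 21 for Backfire) is the one discussed in this thread.

```python
# Stored replays keep only the raw body ID (hypothetical records).
replays = [{'id': 1, 'body': 21}, {'id': 2, 'body': 21}, {'id': 3, 'body': 23}]

# Lookup table maintained separately from the replays. The 11 -> 'Backfire'
# entry is the incorrect mapping mentioned in this thread.
body_names = {11: 'Backfire', 23: 'Octane'}


def body_name(replay, names):
    """Resolve the name at read time instead of storing it with the replay."""
    return names.get(replay['body'], 'Unknown')


before = [body_name(r, body_names) for r in replays]

# One correction to the table fixes every replay that references ID 21.
del body_names[11]
body_names[21] = 'Backfire'
after = [body_name(r, body_names) for r in replays]

print(before)
print(after)
```

If the parser had written the name into each stored record instead, the same correction would mean reprocessing every affected replay.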

jjbott commented 8 years ago

Go ahead and use that data, it's not mine! :) It's nearly in that exact format in the upk files if you've figured out how to poke around in them

danielsamuels commented 8 years ago

RLR shows replays with Backfire as "Unknown".

I've just noticed that version 0.7.0 seems to be doing the mapping from IDs already. That's why the output is broken on the website: it's expecting integer values and getting strings.

tfausak commented 8 years ago

As expected. Read the release notes.

danielsamuels commented 8 years ago

Yeah, that's my fault for not reading them.