Detecting common strategies

goto-bus-stop commented 7 years ago

While we can't figure out exactly what happens in a game (see #1), it should be possible to auto-detect some common strategies based on when buildings or units are created/queued and which techs are researched.

e.g., some potential heuristics, assuming a Dark Age start:

Drush: 3ish militia built before minute 12
Fast Castle: Clicking Castle without doing Loom, within 2 minutes after Feudal is complete
Douche: New Town Center built in Dark Age
Scrush: Multiple stables + lots of scouts early in Feudal Age
more?

Could be:

foreach ($rec->players() as $player) {
    // $player->guessStrategy() === 'drush';
}
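A minimal Python sketch of how the first two heuristics above might look. The function name, the input shapes, and the exact thresholds are assumptions for illustration, not part of recanalyst; times are in seconds.

```python
def guess_opening(militia_times, research_clicks, feudal_done=None):
    """Guess an opening from one player's early actions.

    militia_times: times at which militia were queued (hypothetical input).
    research_clicks: {tech_name: click_time}, e.g. {"loom": 95.0}.
    feudal_done: time at which Feudal Age finished, if known.
    """
    # Drush: roughly three militia before minute 12.
    early_militia = [t for t in militia_times if t < 12 * 60]
    if len(early_militia) >= 3:
        return "drush"

    # Fast Castle: Castle Age clicked within 2 minutes of reaching Feudal,
    # with no Loom researched before that click.
    castle = research_clicks.get("castle_age")
    loom = research_clicks.get("loom")
    if castle is not None and feudal_done is not None:
        no_loom_yet = loom is None or loom > castle
        if no_loom_yet and 0 <= castle - feudal_done <= 120:
            return "fast_castle"

    return None
```

For example, `guess_opening([400, 430, 460], {})` would be labelled a drush, while a Castle click 50 seconds after Feudal with no Loom would be labelled a fast castle.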

happyleavesaoc commented 6 years ago

I think "openings", as a subset of all strategies, can be reliably guessed.

Trush: Tower(s) in Feudal with closer proximity to enemy TC than player's TC
MAA: MAA upgrade + militia trained
Castle drop: see Trush, but with castle
Forwarding: Ranges further than x% of map from player's TC
Sling: Tribute > x in Feudal
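The two distance-based checks in this list (Trush/Castle drop and Forwarding) could be sketched roughly like this in Python. The function names, the coordinate tuples, and the default `fraction` are placeholders; the thread leaves "x%" open.

```python
import math


def closer_to_enemy(building_xy, own_tc_xy, enemy_tc_xy):
    """True if a building sits nearer the enemy TC than the player's own TC."""
    return math.dist(building_xy, enemy_tc_xy) < math.dist(building_xy, own_tc_xy)


def is_forwarded(building_xy, own_tc_xy, map_size, fraction=0.4):
    """True if a building is further than `fraction` of the map from the own TC.

    `fraction` stands in for the unspecified "x%" and would need tuning.
    """
    return math.dist(building_xy, own_tc_xy) > fraction * map_size
```

A tower at (80, 80) with the player's TC at (20, 20) and the enemy TC at (100, 100) would count as a Trush candidate under this check.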

Alternately, with a list of training & techs (<= feudal):

Train a classifier on a reference set; or
Apply clustering algorithm, review and label

I'm taking a shot at the basic heuristics in my analyzer. Will see if it's practical or not.

happyleavesaoc commented 6 years ago

Turns out detecting a drush is not easy. Everyone builds a barracks. And queued/trained units don't have a player_id. So we know someone drushed, but not who. However, the more alternative strategies we can codify, the more players we can rule out. The other strategies are not as hard, since there are more distinguishing factors.

goto-bus-stop commented 6 years ago

I think move commands do have the player ID. Maybe a drush could be detected by checking if militia are built at all, and then if a player sends 3 or 4 units far away from their TC? It'd not be quite as reliable of course :(

happyleavesaoc commented 6 years ago

Yep, that would help.

I settled on a system that has multiple evidence checks per strategy, each with a resulting weight. Stronger evidence lends a larger weight, etc. The weights accumulate into a final probability score per strategy.

The problem is, I'm not an expert player, so I don't know the right constants, like, what timeframes are appropriate, etc.
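To make the weighted-evidence idea concrete, here is a small Python sketch. It is not the code from my analyzer: the normalisation (matched weight divided by total weight), the fact names, and the weights are all assumptions chosen for illustration.

```python
def score_strategy(checks, facts):
    """Return a score in [0, 1]: matched evidence weight over total weight.

    checks: list of (predicate, weight); each predicate takes the facts dict.
    """
    total = sum(weight for _, weight in checks)
    matched = sum(weight for check, weight in checks if check(facts))
    return matched / total if total else 0.0


# Hypothetical evidence checks for a drush; the constants are placeholders.
DRUSH_CHECKS = [
    (lambda f: f.get("militia_before_feudal", 0) >= 3, 3.0),
    (lambda f: f.get("units_sent_far_from_tc", 0) >= 3, 2.0),
    (lambda f: f.get("feudal_click_time", 0) > 11 * 60, 1.0),
]
```

With this shape, tuning the constants only means editing the weights and thresholds in the check list, which is where an expert player's input would go.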

bowswung commented 5 years ago

I might be missing something, but from what I understand of the action formats can't you match the buildingid from the build action with the buildingid in the train action and know which player made certain units?

goto-bus-stop commented 5 years ago

The build action doesn't contain the ID of the building object unfortunately, only of its type.

bowswung commented 5 years ago

Ah that was what I was missing, thanks! So what I am referring to is what is described here https://github.com/stefan-kolb/aoc-mgx-format as the Object Id. What I don't understand, then, is how the game engine knows which object units are being trained from. There must be something in the rec format that logs that.

goto-bus-stop commented 5 years ago

The engine runs the entire game and assigns IDs at runtime. The engine is deterministic enough that the IDs will always be the same when playing the game back. But it's not possible to do that without running an accurate simulation.

bowswung commented 5 years ago

Or wait, the building_id here is not the same as the one here. So the Object ID gets assigned by the game engine while it is running I guess and then can be referred to later?

If the Object IDs are incremented predictably then isn't it possible to simulate that and figure out which building belongs to which player?

bowswung commented 5 years ago

Haha yeah sorry, didn't see your comment before I posted.

goto-bus-stop commented 5 years ago

> If the Object IDs are incremented predictably then isn't it possible to simulate that and figure out which building belongs to which player?

The IDs are shared between buildings and units. There is a problem when eg 10 units are queued in a production building, but then the building is destroyed before some of the units are created.

goto-bus-stop commented 5 years ago

Also the order of IDs would be hard to predict with units being queued, and eg. another building being started before all of the units were created.

bowswung commented 5 years ago

So the ids are never assigned because the units don't get created? Is that the same with building object ids if they are destroyed before they are finished building? In other words, when do the ids get assigned?

goto-bus-stop commented 5 years ago

Iirc it's assigned for buildings when a foundation is placed, but units don't exist yet when they're still in the queue, so for them it's assigned when they're complete.
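A toy Python sketch of that rule, assuming a single incrementing counter shared by buildings and units (which is my recollection, not something verified here). The event tuples and labels are made up for the example.

```python
import itertools


def assign_ids(events, first_id=1):
    """Hand out IDs in time order from one shared counter.

    events: (time, kind, label) tuples, where kind is "foundation" for a
    building whose foundation was placed, or "unit_done" for a unit that
    finished training. Queued units get nothing until they complete.
    """
    counter = itertools.count(first_id)
    return {label: next(counter) for _, kind, label in sorted(events)}
```

In this model, delaying a single unit's completion past a later foundation changes the ID of that building, which is why unit timing matters so much for any prediction.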

bowswung commented 5 years ago

Yeah ok that's what I would expect and at least makes things somewhat easier. So in principle if you can simulate training times for units it should be possible to get the object ids for specific buildings tied to a player, up until the point where production buildings start being destroyed?

goto-bus-stop commented 5 years ago

If you can figure out when buildings are destroyed, for which you need to simulate pathfinding, researches, unit production, battles, etc, because it also isn't stored 😅

goto-bus-stop commented 5 years ago

Basically, only explicit player actions are stored, and everything else is computed at runtime.

bowswung commented 5 years ago

Yeah I realise that, but up to early feudal that isn't really an issue - it is incredibly rare for someone to have their stable destroyed while building their second scout.

bowswung commented 5 years ago

What I mean is, you could use this to figure out, in almost all cases, what the first military buildings were, how many units were trained, etc.

goto-bus-stop commented 5 years ago

Hmm, that makes sense, it could be an okay heuristic for the early game maybe. A remaining issue I can think of, is that queueing actions may not take resource limitations into account. I don't recall how it works, but it could be that shift-creating villagers sends a queue action with a count of 5 even when you only have resources for 2. The same applies when the queue is already full but that is not relevant in early game.

bowswung commented 5 years ago

Hmm the resource thing might be a killer if that is the case. When you said that the object id for a building was assigned when the foundation is placed, do you mean when the command is made? Or when the villager first hits?

goto-bus-stop commented 5 years ago

The foundation is placed as soon as the command is sent.

bowswung commented 5 years ago

Perfect, so in that case it should be possible just to track the order of those, and you can work out whose building is whose based just on order throughout the game.

bowswung commented 5 years ago

I've taken a quick look at one of the test games in the repo and this seems to work out in principle. The object ids for the buildings seem to be assigned in the order they were commanded. If that is the case, then it should be fairly straightforward to assign building object ids to a player in a predictable way in the vast majority of cases, and therefore tie train commands to a player as well.

The only time where this wouldn't be possible is if both players build the same type of building, only one of them uses it for units (and not techs) in the entire game, and they only start using it after the other player has commanded for it to be built. In the later game I guess this might happen a fair amount, if both players are building multiple barracks at similar times, for example, though the higher a level the game is the less likely that is to happen. Destruction would also affect things as well, but I suspect in most cases it will still be possible to infer whose buildings are whose from other things (the types of units being trained could distinguish as well, if, say, the building builds an elite skirm when one player has researched it and the other player hasn't). Late Imp it might fall apart.

Are object ids recycled over a game?

IamFlea commented 5 years ago

Hey folks, AFAIK, it is quite impossible to assign an Object ID (oid) to a player from an mgx file. The assignment of player to oid might be f*cked by an enemy monk. And you can't even predict it in the early game! The player could get housed for just a second, because house construction could be interrupted by a wolf, or slowed by bugged pathfinding. Thus, the OID of his new villager will not correspond to your prediction. And if the player builds a barracks with the new villager, then you will get a bad result. :)

Anyway, I've been playing with the idea of logging strategies while coding Bartender. It reads data directly from the game's memory. You can see stats like idle time, currently carried resources, owner, or HP of each object. On the other hand, you must run the game in the background and it might crash. Also, it is written in Python, so it might be slow if there are a lot of objects to analyze or if you have a ten-year-old computer.

bowswung commented 5 years ago

But that doesn't matter I think. Because you know which order buildings are built, you know that whichever player commanded a barracks to be built first, that barracks will have a lower oid. Let's say player 1 and player 2 both build barracks, and player 1 builds first. You then get something like this later:

Train 2235 83 (or whatever militia is)

You don't know whether building oid 2235 is player 1 or player 2. But if you look later in the recording, you realise that when player 2 did Men at Arms upgrade, they did it at building 2235. So that must be player 2's barracks. Or you see another line later like Train 2230 71 (or whatever spear is). As soon as that happens, you know that 2230 is player 1's barracks, and 2235 is player 2's, because player 1 commanded the barracks first, so it gets a lower oid. It doesn't matter whether the building was actually completed first, or if the villager even hit it once. I'm pretty sure about this now, because the delete command for deleting a foundation is the same as the one for a completed building, and it uses the oid of the building, so it must be assigned as soon as the foundation is placed, like goto-bus-stop said.

The same logic applies throughout the game - whichever id is earlier will be the one that was placed earlier, and we know which player placed first from the build commands.
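The ordering argument above can be sketched in Python for the simplest case, where every building of a given type has shown up in some later command. The function name and inputs are hypothetical, and the sketch ignores deleted foundations and buildings that are never used.

```python
def assign_building_owners(build_order, observed_oids):
    """Map building oids to players by matching placement order to oid order.

    build_order: players in the order they issued build commands for one
        building type, e.g. [1, 2].
    observed_oids: oids seen in later train/research commands for that type.
    Returns {} while some placed building has not been observed yet, since
    the mapping is still ambiguous at that point.
    """
    oids = sorted(set(observed_oids))
    if len(oids) != len(build_order):
        return {}
    return dict(zip(oids, build_order))
```

In the barracks example, `assign_building_owners([1, 2], [2235, 2230])` gives `{2230: 1, 2235: 2}`, while seeing only 2235 leaves the answer open until another command (like the Men at Arms research) pins it down.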

For actually finding out oids of units, it is probably pretty easy to determine military from villagers, as a start. Any oid unit that builds something is a villager. Any oid unit that is tasked to a resource is a villager. And the primary action command includes the player id, so just from those you can assign unit oids to players. For military, any unit or group of units that have patrol, formations, etc applied to them must be military, and any time a unit is moved or does a primary action (attack) we can get the player id for it from that. Some units might get missed (for example, vills that are sent straight to resources with rally points) but I'd guess you could identify like 80 or 90% of a player's units and classify them as military or villager.
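As a rough Python sketch of that classification pass: the action names and the `(player, action, unit_ids)` tuple shape are invented for the example, and `player` is allowed to be `None` for commands that may not carry a player id.

```python
VILLAGER_ACTIONS = {"build", "gather"}
MILITARY_ACTIONS = {"patrol", "formation"}


def classify_units(commands):
    """Infer owner and role for each unit oid from a command stream.

    commands: iterable of (player, action, unit_ids).
    Returns (owners, roles): {oid: player} and {oid: role string}.
    """
    owners, roles = {}, {}
    for player, action, unit_ids in commands:
        for oid in unit_ids:
            if player is not None:
                owners[oid] = player
            if action in VILLAGER_ACTIONS:
                roles[oid] = "villager"
            elif action in MILITARY_ACTIONS:
                roles[oid] = "military"
            else:
                # Moves and attacks tell us the owner but not the role.
                roles.setdefault(oid, "unknown")
    return owners, roles
```

Units that are only ever moved stay "unknown" here, which matches the 80 to 90% coverage guess rather than claiming everything can be classified.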

But which military? Well, that will be more of a guessing game. But we know what units are ordered. So if 10 scouts are ordered, and we find 10 military units being commanded to move around by player 1, we are sure they are scouts and that they belong to player 1. In the later game it will be trickier, but we could probably know fairly closely what units have actually been built, and where they are, just not necessarily which is which (like there have been 20 crossbows and 10 pikes built by a player, and they have all been sent to the enemy's base, we just don't know which of the 30 are crossbows and which are pikes). The main issue here is that the train command seems to be sent regardless of how many resources the player has. So it would be necessary to do a bit of guessing and reverse engineering by looking later in the game at how many military oids the player actually used for commanding things around. If the unit attacks ground it is a mangonel or treb, and if it packs/unpacks it is a treb (there must be a command for that somewhere, though it isn't in the lists I've seen).

Monks aren't too much of an issue, because firstly, they are rarely that influential, and secondly, most of the time you'll be able to identify units multiple times. Most army is moved or whatever by the player multiple times. So if it gets converted, and is later moved by the other player then you will know that - you can actually track some of the conversions that way (I am assuming that the oid remains constant through a conversion?).
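A small Python sketch of spotting conversions this way, under the same assumption that the oid stays constant through a conversion. The command tuple shape is hypothetical.

```python
def find_ownership_changes(commands):
    """Report units that are later commanded by a different player.

    commands: (time, player, unit_ids) tuples sorted by time.
    Returns a list of (time, oid, old_player, new_player).
    """
    owner = {}
    changes = []
    for time, player, unit_ids in commands:
        for oid in unit_ids:
            previous = owner.get(oid)
            if previous is not None and previous != player:
                changes.append((time, oid, previous, player))
            owner[oid] = player
    return changes
```

The reported time is only when the new owner first commands the unit, so it is an upper bound on when the conversion actually happened.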

All of this opens up loads of stuff. If you can reliably detect villagers, then you know if they are sent forward. You can see them going to new gold piles etc etc. You can see trushes, castle drops. The only thing you don't know is if they die. Similarly, you can probably even pick up, say, specific trebuchets attacking a specific castle.

This way of doing it relies heavily on players actually commanding their units to move around, but I'm most interested in it for analysing expert games, where I suspect over 90% of all military units created will be manually moved at least once over the course of the game. And in expert games it is probably pretty easy to guess when units die. For example, 5 vills go forward, they build two towers and then... they are tasked to run home and then... they never appear in the mgz again. They are almost certainly dead. The same with military units, because expert players don't just leave units around doing nothing.
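The "never appears again" guess could look something like this in Python. The `idle_limit` of 300 seconds is an arbitrary placeholder, and the whole thing is guesswork by design: it only flags units that have gone quiet.

```python
def presumed_dead(commands, game_end, idle_limit=300):
    """Guess which units died: those not commanded for `idle_limit` seconds.

    commands: (time, unit_ids) tuples for every command that names units.
    game_end: time of the last action in the recording.
    Returns a sorted list of oids presumed dead.
    """
    last_seen = {}
    for time, unit_ids in commands:
        for oid in unit_ids:
            last_seen[oid] = max(time, last_seen.get(oid, 0))
    return sorted(oid for oid, t in last_seen.items() if game_end - t > idle_limit)
```

This would misfire on units an expert deliberately parks somewhere (a garrisoned villager, a scout on a hill), so the limit probably needs to differ for villagers and military.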

It still won't be perfect by any means, but it is more than nothing!

Thanks for the pointer to Bartender! My use case for this is that ideally I want to run a rec analyser on a headless server, which as far as I know pretty much rules out running the game itself without some serious modding, but I may be wrong about that?

happyleavesaoc commented 5 years ago

Here's my repo with my basic strategy detection, btw: https://github.com/happyleavesaoc/aoc-strats

Somewhat off topic, but the holy grail of rec analysis is a VM in the cloud with AoC installed and some sort of CPU acceleration so that the game plays out faster than normal, plus a memory-examining program to pull stats, unit oid mappings, etc. Then put an API on top of all that so you can POST a rec and get back a full raw data dump, which you could push through your own analytical pipeline. [edit: I know at least one guy that was trying this at one point. maybe he'll show up here]

bowswung commented 5 years ago

Thanks @happyleavesaoc ! And yeah that setup is exactly the kind of thing I mean, I was wondering if it was possible to run the game and pipe graphics output to a simulated display.

Do you think my ideas for obtaining more info from a rec parser have any mileage btw? I think there would be value in something like that, in the absence of the holy grail, as I don't think it would be an overwhelming task to carry out from a programming perspective. I'm considering putting together a first stab at it unless some killer issue becomes apparent, and you guys know more about this stuff than I do!

goto-bus-stop commented 5 years ago

@bowswung The housing issue @IamFlea mentions is a problem I think, because it invalidates the assumptions about unit creation times, and it can happen early in the game. Later in the game, it's also impossible to know unit creation times because it is affected by research, and researches can be aborted by a building being destroyed (which is impossible to know). Maybe you addressed this in your long post, I don't have time to read and process it right now but will try to on my bus ride home tonight :)

I did try to run AoC headless with Wine and xvfb here: https://github.com/goto-bus-stop/run-aoe-rms I basically abandoned that idea in favour of a different RMS parsing approach (reimplementing the parser), but I didn't find headless AoC super difficult. You could probably dockerize some Wine+xvfb based thing and do all kinds of analysis.

bowswung commented 5 years ago

Haha I did address that in my long post - quick summary would be that we can assign oids to a player whenever they move, build or primary act with them if I understand the commands correctly. (I'm abandoning the idea of guessing in advance, which I guess @IamFlea might have been responding to).

Awesome thanks for the other repo link. A memory inspector has been implemented somewhere right? From what I've been able to find out, that is what the Voobly spectator api is using.

goto-bus-stop commented 5 years ago

Bartender does that kind of memory inspection, my understanding is that it's a lot like the spectator overlay but for HD edition

bowswung commented 5 years ago

Yeah I think it is! Sadly I'm only really interested in UP/Voobly games as I want this for analysing top players, tournament recs etc, but maybe @IamFlea's awesome work could be translated to UP in principle?

IamFlea commented 5 years ago

@bowswung Oh, I see the algorithm now and it seems logical! This kind of approximation really didn't come to my mind! :+1: And it is always possible to forward graphics output. There must be some screen forwarding like ssh -X for Windows.

@happyleavesaoc Holy grail offtopic: I tried to increase the game speed to a really large number (60 game seconds per real second) and it leads to desyncs in the recs. Indeed, CPU acceleration is really needed, or else you would get ~20 recs per day.

@goto-bus-stop Yes, it was inspired by spectator overlay

@bowswung It is possible to make voobly version, however I would disable it for MP since they detect memory hooks.

happyleavesaoc commented 5 years ago

@bowswung if you check your Voobly PMs, I sent you a mod that will help out in some ways.

@goto-bus-stop that headless repo is super interesting. I think that's the way to go in the long term for this kind of work.

bowswung commented 5 years ago

@happyleavesaoc Awesome, thanks! At the very least it will be amazingly helpful for calibrating a rec parser simulation (rather than watching through and checking everything by hand!) I really appreciate it!

And I agree - the headless repo is definitely a good start - combined with a mod it would basically allow you to do anything, the only missing piece would be the CPU acceleration. If that was solved then there would be no need for an independent rec analyser (aside from some version detecting code). Is that at all realistic do you (or anyone else) think? I would be wanting, in principle, to be running analysis on thousands of games, which a rec parser could do fine.

@IamFlea I don't think we would need it to actually run on Voobly - UP and WK have an offline installation I think, so it could pretty much do anything you wanted!

Macuyiko commented 5 years ago

@happyleavesaoc I think I am "the guy" you were referring to. @bowswung has been in touch with me to discuss these ideas. For information, what I ended up doing with your Voobly mod can be read over at http://blog.macuyiko.com/post/2018/predicting-voobly-age-of-empires-2-matches.html

Basically, the problems boil down to headless mode and making the game simulate as fast as possible, taking into account different versions and so on.

IamFlea commented 5 years ago

@Macuyiko Thank you for such a great article! Though I haven't seen R for a long time. Have you published this at a conference? How did you speed up the game simulation?

happyleavesaoc commented 5 years ago

@Macuyiko actually no, but now it seems like there's actually quite a few people who have attempted this! Your article is great. Two quick notes ... 1) userpatch version is encoded in the rec, 2) score is just an equation using other dimensions (which are available individually) as inputs - it's interesting that ES crafted the best indicator! I actually compute the score for you in the mod, but maybe it would be better to send the components to better tune the prediction model.

Is anyone interested in starting an ongoing discussion around rec parsing/replay/metrics/storage/etc? This github issue is probably not an appropriate place. I think we could accomplish some interesting things if we worked together.

Macuyiko commented 5 years ago

@IamFlea Not published no -- I would have to expand the analysis somewhat more for this to be publishable (and find the right conference). Game speed was just ramped up as high as the game would allow. Being able to push it to go faster would be even more helpful.

There is no reason I did it in R specifically other than this language is what I'm most familiar with for rapid prototyping. You could do exactly the same with Python.

If I would do this over again, I'd definitely try to go for a dockerized setup using a virtual display and wine. I had to manually step in many times to resume the replays when a crash occurred or Voobly went offline.

@happyleavesaoc Ah, gotcha :). 1) I didn't know about that one, this would've been somewhat more helpful. Does this also apply to Voobly recordings, and do they also retain the mods used? I also had to set these (e.g. no walls mod), but if I would do it again I'd focus on just vanilla Userpatch games.

2) This I did know about. It would indeed be interesting to be able to use the individual inputs. I wouldn't say it's the best indicator (other more specific features such as villagers created are also picked up), but it is a very good approximation.

It would be fun to get a more mature project started. My original drive behind this is that I watch quite a lot of casters and wanted to have a good win probability estimate on top of the overlays used now.

bowswung commented 5 years ago

@happyleavesaoc Yeah I think that is a good idea. Here is a Discord server we could use: https://discord.gg/3enF3d4 but I'm not sure if Discord is the best place - maybe Slack or something would be better?

bowswung commented 5 years ago

I thought I'd post this here so it is a bit more public, and it is directly related to this issue.

I've been working on implementing my thoughts above, and it does seem to work out. I've managed to extract this: https://github.com/bowswung/voobly-scraper/blob/develop/test/simHistory from the rec game in this repo (test/recs/versions/1.4.mgz).

There are two types of events, (R)eal and (S)imulated. For the real events I've been limiting things to pretty well supported assumptions (i.e. they might not be precise, but they should be entirely accurate). The death simulations are just guesswork, and quite inaccurate at present.

I haven't actually watched this rec yet, so I have no idea how close my account is to reality! I may have made some mistakes here and there. But this is by no means the limit of what we could work out - adding in constraints like training time and building time would clarify things loads and wouldn't be too difficult. Referencing an updating map state would also help.

At that point, you could also start calculating resources, and you can get a pretty good idea of how many each player has on each resource as well (I haven't done that yet, but it should be fairly easy from what I already have).

bowswung commented 5 years ago

I should add that I am not using map position to infer the type or owner of any units as it isn't really necessary, and I don't like baking in assumptions that mean it is impossible to detect odd strategies. At some point the specific type of a military unit could probably be refined based on stuff like that, but train times would probably help more without needing to go beyond solid inferences.

coffenbacher commented 5 years ago

Awesome job @bowswung !! What do the pipes indicate, e.g. Archer|Skirmisher? Is that an or arising from uncertainty in the simplified simulation, or is it killed by one from a group of them?

bowswung commented 5 years ago

Thanks! Yeah that is an or. Each individual unit can end up with a different mix of possible types, due to the limitation of the inferences we can make, which is why you end up with odd things like Zuppi Moved 4 Archers, 6 Archer|Skirmishers, 6 Archer|Skirmisher|Spearmans, 2 Archer|Spearmans. We know there is an army of 18 units, just not exactly what all of them are. I'm pretty sure that just simulating train times and queues would resolve a lot of these.
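To illustrate how those labels come about, here is a Python sketch of the idea rather than the actual scraper code: each unit keeps a set of possible types, and a group is summarised by counting identical sets. The function name and input shape are assumptions.

```python
from collections import Counter


def summarize_group(candidates):
    """Summarise a group of units whose exact types are uncertain.

    candidates: {oid: set of possible unit type names}.
    Returns text like "2 Archers, 1 Archer|Skirmisher".
    """
    counts = Counter("|".join(sorted(types)) for types in candidates.values())
    return ", ".join(
        f"{n} {label}s" if n != 1 else f"1 {label}"
        for label, n in sorted(counts.items())
    )
```

Narrowing a unit down then just means intersecting its set with whatever new evidence allows, for example removing Skirmisher once train times show that no Skirmisher could have finished yet.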

goto-bus-stop / recanalyst

Detecting common strategies #17