microRTS: how frequently is the evaluation function actually being used?

schrum2 commented 7 years ago

It occurs to me that I'm not entirely certain how/when some of the policies you have identified actually use an evaluation function. If the evolved evaluation function is being used seldom, then it won't have a strong impact on behavior.

Alternatively, it could be the case that combining a strong evolved evaluation function with the agents you have identified would be useful, but that evolving the evaluation function in the MLPSMCTS and Portfolio agents is ineffective ... an agent that uses the evaluation function more might evolve a better one, even if the agent itself is weaker. We could transfer the evaluation function to a different agent later.

Regardless, we need to find out exactly how often the evolved evaluation function is being used. You should carefully read the papers that describe these agents, and report on how often they use the evaluation functions in this GitHub issue. Additionally, you should look at the code and indicate when each of the agents we are focusing on actually uses the evaluation function.

Another concrete and useful step you can take is to create an "other" score that tracks how many times the evaluation function is actually used in a trial. This can be tracked with a variable inside of the NN eval function class that resets to 0 when a new network is inserted, and increments whenever an evaluation score is asked for. It would be interesting to see if different agents are using the eval function a drastically different number of times ... however, you should probably divide this number by the number of cycles in the match, since longer matches will naturally use the eval function more. The result will not be a percentage though, since hopefully the agent is using the eval function multiple times per cycle. I guess the result will track the average number of times per cycle that the eval function is used.
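
A minimal sketch of the counter described above, in Java. It does not use the actual microRTS or MM-NEAT classes: `StateEvaluator`, `CountingEvaluator`, `setDelegate`, and `evaluationsPerCycle` are hypothetical names chosen for illustration, and the sketch assumes the agent queries the evaluation function from a single thread.

```java
// Hypothetical stand-in for whatever interface the agents call to score a game state.
interface StateEvaluator<S> {
    double evaluate(S state);
}

// Wraps an evaluator (e.g. the evolved network) and counts how often it is queried.
class CountingEvaluator<S> implements StateEvaluator<S> {
    private StateEvaluator<S> delegate;
    private long evaluationCount = 0;

    CountingEvaluator(StateEvaluator<S> delegate) {
        this.delegate = delegate;
    }

    // Inserting a new network resets the count to 0.
    void setDelegate(StateEvaluator<S> newDelegate) {
        this.delegate = newDelegate;
        this.evaluationCount = 0;
    }

    // Every request for an evaluation score increments the count.
    @Override
    public double evaluate(S state) {
        evaluationCount++;
        return delegate.evaluate(state);
    }

    long getEvaluationCount() {
        return evaluationCount;
    }

    // Average number of evaluations per game cycle.
    // This is not a percentage: it can exceed 1 when the agent
    // evaluates many states per cycle.
    double evaluationsPerCycle(int cycles) {
        if (cycles <= 0) {
            throw new IllegalArgumentException("cycles must be positive");
        }
        return (double) evaluationCount / cycles;
    }
}
```

At the end of a trial, `evaluationsPerCycle(matchLength)` would be reported as the "other" score, where `matchLength` is the number of cycles the match lasted.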

Start with this, and I'll add more to this issue as I think of more ways of diagnosing this.

alicequint commented 7 years ago

I haven't read the papers yet, but based on running the code with print statements:

MLPSMCTS evaluates 100-200 game states on cycle 0 and then 1000 (seems to be gradually increasing) game states every 10 cycles after that

Portfolio evaluates 100-200 game states on cycle 0 and then ~700 game states every 10 cycles after that

PuppetSearchAB starts similarly but then evaluates 1300+ game states every cycle from then on

PuppetSearchMCTS starts with ~100 and then evaluates 500-600 every cycle from then on

UCT starts with 200-250 and then does 700-900 every 10 cycles after that

schrum2 commented 7 years ago

Looked at the code. I feel that some issues had to be fixed. See my upcoming commit.

schrum2 / MM-NEATv2

microRTS: how frequently is the evaluation function actually being used? #383