pandaant / poker-cfrm

An NLTH Poker Agent using Counterfactual Regret Minimization

MIXED_XXXX abstraction head-to-head win averages are in an unexpected order #4

Closed: shantanukarve closed this 3 years ago

shantanukarve commented 4 years ago

I ran some tests of 2-player no-limit hold'em with 1|2 small blind|big blind, 200|200 stacks and maxRaises of 3 4 4 4. For the cluster abstraction runs in all tests I kept nb-samples at "0,2,500,500", the buckets at 169,5,10,500, the error bounds at .01,.01,.01,.01 and nb-hist-samples-per-round at 0,1,200,200. For all tests I held the action abstraction at polrelative with raises of 0.4,0.8,1.2,2,5,9999. For CFR learning I used 12 threads and run times of 8 hours, and sometimes 16 or 24 hours.
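To make the action abstraction more concrete, here is a minimal Python sketch of the raise sizes the pot-relative multipliers above might produce at the small blind's first decision. The semantics are an assumption, not taken from poker-cfrm: a multiplier f is read as "call, then raise f times the pot after the call", capped at the chips left behind, so 9999 acts as an all-in. The function `raise_sizes` is hypothetical, and the sketch ignores minimum-raise rules and rounding to whole chips.

```python
# Hypothetical sketch of pot-relative raise sizing (not poker-cfrm's code).
# ASSUMPTION: multiplier f means "call, then raise f times the pot after the
# call", capped at the chips left behind. Minimum-raise rules and rounding to
# whole chips are ignored.

MULTIPLIERS = [0.4, 0.8, 1.2, 2, 5, 9999]  # values from the issue


def raise_sizes(pot: float, to_call: float, behind: float) -> list[float]:
    """Extra chips put in for each multiplier, with duplicate all-ins removed."""
    sizes: list[float] = []
    pot_after_call = pot + to_call
    for f in MULTIPLIERS:
        amount = min(to_call + f * pot_after_call, behind)  # 9999 -> all-in
        if amount not in sizes:
            sizes.append(amount)
    return sizes


if __name__ == "__main__":
    # Small blind's first decision: blinds 1|2 put 3 chips in the pot, the
    # small blind owes 1 more and has 199 of its 200 chips left behind.
    print(raise_sizes(pot=3, to_call=1, behind=199))
```

Under this reading, deeper into a hand the larger multipliers quickly collapse onto the same all-in amount, which is why the sketch removes duplicates.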

I ran the head-to-heads, specifically NSSS against each of NOOO, NEES and NEEO. I expected NSSS to perform the worst, meaning it would lose money (a negative average win), and NEEO to be the best. Instead, NSSS comes out best! Here's a table of results. As you can see, I ran CFR's learning phase for the most sophisticated strategy, NEEO, for longer and longer times (8 hours, then 16, then 24), but that didn't change things. Any ideas on what to experiment with so the results align with expectations, meaning NEEO, NEES and NOOO all beat NSSS?

Update: thinking harder, I wonder whether the clustering abstraction is too coarse, in which case I need to make it finer by increasing nb-samples and nb-hist-samples. Any ideas on the combinatorics involved, to see what's appropriate?

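| Abstraction | cfrm runtime (s / 1000) | Abstraction | cfrm runtime (s / 1000) | Avg win | Var | Num games | Seed | Median win |
|---|---|---|---|---|---|---|---|---|
| NSSS | 28 | NOOO | 28 | 2.94 | 7167 | 500000 | 7534 | 0 |
| NSSS | 28 | NOOO | 28 | 2.76 | 7119 | 100000 | 3575 | 0 |
| NSSS | 28 | NOOO | 28 | 2.95 | 7163 | 100000 | 8379 | 1.5 |
| NSSS | 28 | NEES | 28 | 3.4 | 7564 | 100000 | 8379 | 1.5 |
| NSSS | 28 | NEES | 28 | 3.03 | 3575 | 100000 | 3575 | 1.5 |
| NSSS | 28 | NEEO | 28 | 5.07 | 7475 | 100000 | 8379 | 1.5 |
| NSSS | 28 | NEEO | 57 | 4.53 | 7118 | 100000 | 7534 | 1.5 |
| NSSS | 28 | NEEO | 86 | 5.05 | 7141 | 100000 | 7534 | 1.5 |
| NSSS | 28 | NEEO | 86 | 4.84 | 7138 | 100000 | 8370 | 1.5 |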
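Before changing the abstraction, it may be worth checking whether these averages could be sampling noise. The Python sketch below computes a standard error, a z-score and an approximate 95% interval for each row of the table. It assumes that "Avg win" is NSSS's mean result per game, that "Var" is the per-game variance in the same units, and that games are independent; the table does not state any of this, and `summarize` is a hypothetical helper.

```python
# Standard-error check for the head-to-head results.
# ASSUMPTIONS: avg_win is NSSS's mean per-game result, var is the per-game
# variance in the same units, and games are independent.
import math

# (opponent, avg_win, var, num_games), copied from the results table
ROWS = [
    ("NOOO", 2.94, 7167, 500000),
    ("NOOO", 2.76, 7119, 100000),
    ("NOOO", 2.95, 7163, 100000),
    ("NEES", 3.40, 7564, 100000),
    ("NEES", 3.03, 3575, 100000),
    ("NEEO", 5.07, 7475, 100000),
    ("NEEO", 4.53, 7118, 100000),
    ("NEEO", 5.05, 7141, 100000),
    ("NEEO", 4.84, 7138, 100000),
]


def summarize(avg_win: float, var: float, num_games: int) -> tuple[float, float, float]:
    """Return (standard error, z-score, half-width of an approximate 95% CI)."""
    se = math.sqrt(var / num_games)
    return se, avg_win / se, 1.96 * se


if __name__ == "__main__":
    for opponent, avg, var, n in ROWS:
        se, z, half = summarize(avg, var, n)
        print(f"NSSS vs {opponent}: {avg:.2f} +/- {half:.2f} (z = {z:.1f})")
```

Under those assumptions every row sits roughly ten or more standard errors above zero, so the unexpected ordering is unlikely to be a sampling artifact.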
pandaant commented 3 years ago

Sorry for the super late response.

Even though it's probably too late, I just want to say that it's been a long time since I worked on this project and I'm not too familiar with the specifics anymore. From what I can see, the point in your update is one reason: accurate calculations for the flop are expensive, so the flop sample counts are kept very low in the example configuration file. The cluster abstraction also uses only two buckets on the flop, which is a very low number.
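To put the point about the flop in numbers, here is a back-of-the-envelope Python sketch. It counts the raw card situations one player can see in each round and divides them by the bucket counts from the issue's configuration, then counts the showdown comparisons an exact hand-strength histogram would need for a single flop hand. It assumes the clustering is built on such histograms over turn and river runouts, which has not been verified against poker-cfrm's code; nb-samples and nb-hist-samples-per-round presumably sample this space rather than enumerate it. Both function names are hypothetical.

```python
# Back-of-the-envelope combinatorics for the card abstraction discussed above.
# Counts are raw (no suit isomorphism); isomorphism shrinks each round by at
# most a factor of 24 (the number of suit permutations).
from math import comb

ROUNDS = ["preflop", "flop", "turn", "river"]
BUCKETS = [169, 5, 10, 500]  # bucket counts from the issue's configuration


def raw_card_states() -> list[int]:
    """Number of (hole cards, board) combinations one player can see per round."""
    preflop = comb(52, 2)          # 1,326 starting hands
    flop = preflop * comb(50, 3)   # 3 board cards from the remaining 50
    turn = flop * 47               # turn card from the remaining 47
    river = turn * 46              # river card from the remaining 46
    return [preflop, flop, turn, river]


def exhaustive_flop_histogram_cost() -> int:
    """Showdown comparisons for ONE flop hand's exact hand-strength histogram.

    ASSUMPTION: for every turn+river runout the hand is compared against every
    possible opponent holding, and the resulting strengths form the histogram.
    """
    runouts = comb(47, 2)    # 5 cards known, turn+river from the other 47
    opponents = comb(45, 2)  # 7 cards known, opponent hole cards from 45
    return runouts * opponents


if __name__ == "__main__":
    for name, states, buckets in zip(ROUNDS, raw_card_states(), BUCKETS):
        print(f"{name:8s} {states:>15,d} states, {states / buckets:>15,.0f} per bucket")
    per_hand = exhaustive_flop_histogram_cost()
    total = per_hand * raw_card_states()[1]
    print(f"exact flop histograms: {per_hand:,d} comparisons per hand, {total:.2e} in total")
```

With 5 flop buckets, each bucket averages over five million raw flop situations (still over 200,000 after the best-case 24x reduction from suit isomorphism), while exact flop histograms would need over a million comparisons per hand. That trade-off is why the flop settings tend to be kept low.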