lifrordi / DeepStack-Leduc

Example implementation of the DeepStack algorithm for no-limit Leduc poker
https://www.deepstack.ai/

Idea to improve training #32

Open happypepper opened 6 years ago


Right now DeepStack uses a masked Huber loss, where each bucket is given weight 0 if it is impossible and 1 if it is possible. What if we changed the mask so that each weight can be any value between 0 and 1, set by how likely that bucket is?
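A minimal Python sketch of the difference between a 0/1 mask and a probability-weighted mask. It assumes the loss is a plain sum of per-bucket Huber terms with delta = 1, each multiplied by its weight. The function names are hypothetical, and this is not a port of the repo's Torch code, which may normalize the loss differently. Because "importance to updating" is really about gradients, the sketch returns the per-bucket gradient alongside the loss.

```python
def huber(error, delta=1.0):
    """Standard Huber loss for one residual: quadratic near 0, linear in the tails."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def huber_grad(error, delta=1.0):
    """Derivative of huber() w.r.t. the residual: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, error))


def weighted_huber(predicted, target, weights, delta=1.0):
    """Sum of per-bucket Huber losses, each scaled by that bucket's weight.

    weights of 0/1 reproduce a binary mask (impossible buckets ignored);
    weights equal to range probabilities give the proposed soft mask.
    Returns (loss, grads), where grads[i] = d loss / d predicted[i].
    """
    loss = 0.0
    grads = []
    for p, t, w in zip(predicted, target, weights):
        e = p - t
        loss += w * huber(e, delta)
        grads.append(w * huber_grad(e, delta))
    return loss, grads


if __name__ == "__main__":
    pred, targ = [1.5, 0.5, 3.0], [1.0, 0.0, 0.0]
    # Binary mask: both possible buckets are pushed equally hard,
    # the impossible third bucket contributes nothing.
    print(weighted_huber(pred, targ, [1.0, 1.0, 0.0]))  # (0.25, [0.5, 0.5, 0.0])
    # Soft mask from (hypothetical) range probabilities: the likelier
    # bucket now gets 1.5x the gradient of the less likely one.
    print(weighted_huber(pred, targ, [0.6, 0.4, 0.0]))
```

One side effect worth noting: range probabilities sum to at most 1, so the soft-masked loss is much smaller in magnitude than the 0/1 version, and the learning rate or loss normalization would likely need retuning.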

So if there are 2 buckets A and B that both have an error of 0.5, but bucket A has range probability 0.01 and bucket B has range probability 0.0001, the loss would give 100x more importance to moving bucket A's CFV toward its target.
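Worked through with those numbers, assuming the loss is a weighted sum of per-bucket Huber terms with delta = 1 (so an error of 0.5 sits in the quadratic region). Here \hat{v}_i is the predicted CFV of bucket i, v_i its target, and w_i its range probability; this notation is introduced for illustration only:

```latex
\begin{aligned}
\mathcal{L} &= \sum_i w_i\, h_\delta(\hat{v}_i - v_i), \qquad
\frac{\partial \mathcal{L}}{\partial \hat{v}_i} = w_i\, \operatorname{clip}(\hat{v}_i - v_i,\, -\delta,\, \delta) \\
\frac{\partial \mathcal{L} / \partial \hat{v}_A}{\partial \mathcal{L} / \partial \hat{v}_B}
  &= \frac{0.01 \cdot 0.5}{0.0001 \cdot 0.5} = 100
\end{aligned}
```

With the current 0/1 mask both weights are 1, so the same ratio is 1 and the two buckets are updated equally hard.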