lifrordi / DeepStack-Leduc

Example implementation of the DeepStack algorithm for no-limit Leduc poker
https://www.deepstack.ai/

Alternative NN architecture gave curious (questionable?) results #17

Open snarb opened 6 years ago

snarb commented 6 years ago

Hello, Martin. I want to share an observation and ask a question. I have built an alternative C++ implementation based on your ideas, with some optimizations. Among other changes, I use a different NN architecture, and in one case I get an interesting result. Like you, I trained on one million data points generated with CFR run for 1,000 iterations, but I ended up with an exploitability of roughly 0.4. That is less than the ~1 exploitability of CFR after 1,000 iterations (i.e., of a single training sample). Do you think this is possible, or is it a bug in my exploitability calculation (which is possible)? In theory, can a large number of samples reduce the exploitability variance of an individual sample? I have tested this version against this repository's version, and it appears to win on average.
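On the variance question: averaging many independently noisy strategy samples can in fact give lower exploitability than a typical single sample, because exploitability is a maximum of linear functions of the strategy and hence convex, so by Jensen's inequality the average strategy is never worse than the average exploitability. A toy illustration in rock-paper-scissors (not Leduc; every name and number here is invented for the sketch, not taken from the repo):

```python
import random

# Each "sample" strategy is the uniform RPS equilibrium plus noise.
# A single noisy sample has positive exploitability, but the average
# of many samples is much closer to equilibrium.

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

def exploitability(strategy):
    # Best-response value against a fixed column strategy;
    # at the RPS equilibrium this is 0.
    return max(sum(PAYOFF[a][b] * strategy[b] for b in range(3))
               for a in range(3))

def noisy_uniform(rng, eps=0.2):
    # Uniform strategy perturbed by noise, then renormalized.
    raw = [1 / 3 + rng.uniform(-eps, eps) for _ in range(3)]
    total = sum(raw)
    return [p / total for p in raw]

rng = random.Random(0)
samples = [noisy_uniform(rng) for _ in range(1000)]
avg = [sum(s[i] for s in samples) / len(samples) for i in range(3)]

single = sum(exploitability(s) for s in samples) / len(samples)
print(f"mean single-sample exploitability: {single:.4f}")
print(f"exploitability of the average:     {exploitability(avg):.4f}")
```

This is only an analogy: a trained network is not a simple average of its training targets, but the same convexity argument suggests a smoothed fit over many CFR samples can land below the per-sample exploitability.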


I also have a question: why do you estimate counterfactual values with the NN at the start of the next round, once for every possible board, rather than at the end of the current round? Is this for accuracy, or because we would need to generate many more training cases for it? Estimating at the end of the current round should give a big speed improvement. Did you test this setup? Thanks.

bobbiesbob commented 6 years ago

In the paper, they mention using an auxiliary network for the preflop that estimates values at the end of the current round. So they tested that setup for the preflop at least. I'm also interested in whether they tested it for postflop rounds. My guess is no, since play is already fast enough postflop even when enumerating all possible turn cards.
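The trade-off being discussed can be sketched concretely: evaluating at the start of the next round means one network call per possible board, averaged over boards, while an end-of-round auxiliary network needs a single call. A minimal sketch with stub "networks" (all function names and toy values are invented for illustration; this is not the repo's API):

```python
# Stub networks that count how often they are called; the toy values are
# chosen so the two options agree exactly.
calls = {"next_round": 0, "aux": 0}
boards = [1, 2, 3, 4]  # toy stand-in for the possible next board cards

def next_round_net(board, ranges):
    # Pretend next-round value net: CFVs depend on the dealt board.
    calls["next_round"] += 1
    return [board * r for r in ranges]

def aux_net(ranges):
    # Pretend auxiliary net trained to predict the expectation over
    # boards directly (the paper does this for the preflop round).
    calls["aux"] += 1
    avg_board = sum(boards) / len(boards)
    return [avg_board * r for r in ranges]

def cfv_start_of_next_round(ranges):
    # Option A: enumerate every board and average per-board net calls.
    vals = [next_round_net(b, ranges) for b in boards]
    return [sum(col) / len(vals) for col in zip(*vals)]

def cfv_end_of_current_round(ranges):
    # Option B: a single auxiliary-network call.
    return aux_net(ranges)

ranges = [0.5, 0.5]
a = cfv_start_of_next_round(ranges)
b = cfv_end_of_current_round(ranges)
print(a, b, calls)  # identical values; 4 calls vs 1
```

With only a handful of possible boards (as in Leduc, or postflop in hold'em with a fixed flop), the enumeration in option A is cheap, which matches the guess above about why an auxiliary net may only pay off preflop.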

JaysenStark commented 6 years ago

@snarb May I ask how much computational resource you used, and how long it took to generate the training data? Did you use Leduc poker or Texas hold'em?