lifrordi / DeepStack-Leduc

Example implementation of the DeepStack algorithm for no-limit Leduc poker
https://www.deepstack.ai/
878 stars · 211 forks

A question about masked_huber_loss #34

Closed zhoujz10 closed 6 years ago

zhoujz10 commented 6 years ago

Hi @lifrordi, line 57 in masked_huber_loss.lua says that the mask is 1 for impossible features, but I think the mask should be 0 for impossible features, according to the code in bucketer.lua.

The line `local loss_multiplier = (batch_size * feature_size) / (batch_size * feature_size - self.mask_sum:sum())` only makes sense when mask_sum is the count of impossible features (which are most of the features for any one sample), right? But when I run the code, I find that self.mask_sum is the count of valid buckets.

In my setting, card_count = 6, board_card_count = 1, and bucket_count = 36; for one specific case, valid_bucket_count = 5. My question is: should self.mask_sum be 5 or 31? Thank you very much!
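To make the scaling concrete, here is a minimal Python sketch (not the repo's Lua code) of the multiplier computed in masked_huber_loss.lua. It only rescales the loss correctly if mask_sum counts the *impossible* buckets; the numbers below use the hypothetical case from this thread (36 buckets, 5 valid, so 31 impossible):

```python
import numpy as np

def loss_multiplier(mask):
    """mask is 1 for impossible buckets (per the comment in
    masked_huber_loss.lua). The multiplier rescales a loss that was
    averaged over every bucket so it reflects only possible buckets."""
    batch_size, feature_size = mask.shape
    total = batch_size * feature_size
    return total / (total - mask.sum())

# One sample, 36 buckets, 31 of them impossible:
mask = np.ones((1, 36))
mask[0, :5] = 0  # the 5 valid buckets
print(loss_multiplier(mask))  # 36 / 5 = 7.2
```

If mask_sum were instead the count of valid buckets (5 here), the multiplier would come out as 36/31 ≈ 1.16 and barely correct for the masking at all, which is what the question above is getting at.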

DWingHKL commented 6 years ago

By the way, what code did you use to generate the 4M training samples for the turn network? I also had a problem with the loss, but I fixed it, and now my loss matches what the paper reports.

zhoujz10 commented 6 years ago

@DWingHKL Hi, I use a cluster with 2000 cores and several GPUs to do that. I solved the game from the point where the turn card is dealt to the end of the game, the same as the authors did.

Can you share how you fixed the loss problem? Thank you so much!

zhoujz10 commented 6 years ago

@DWingHKL For one specific turn board, there are usually around 100 valid buckets, right? (The total bucket count is 1000.) Should the loss be averaged over all 1000 buckets, so that it comes out around 0.02, or am I missing something important?
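For scale, a quick back-of-the-envelope in Python (hypothetical numbers from this thread: 1000 total buckets, roughly 100 valid per board). Averaging a per-bucket loss over all 1000 buckets dilutes it by a factor of about 10, which is exactly the dilution the mask-based multiplier is meant to undo:

```python
bucket_count = 1000   # total buckets in the turn network (from this thread)
valid_count = 100     # roughly how many buckets are valid per board
per_valid_loss = 0.2  # hypothetical average Huber loss on valid buckets

# Averaging over every bucket (invalid ones contribute 0) shrinks the loss:
diluted = per_valid_loss * valid_count / bucket_count  # ≈ 0.02

# The mask-based multiplier restores the per-valid-bucket scale:
multiplier = bucket_count / valid_count
restored = diluted * multiplier  # ≈ 0.2
```

So a reported loss near 0.02 could simply mean the average was taken over all 1000 buckets rather than only the valid ones.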

DWingHKL commented 6 years ago

When you generated the turn samples, did you use a river network? I think it's most likely there is a bug in your code.

DWingHKL commented 6 years ago

By the way, what is the loss of your river network?

zhoujz10 commented 6 years ago

@DWingHKL Hi, I didn't use a river network; instead, I solved the turn situation directly to the end of the game (using lookahead and re-solving).

Maybe there is a bug, but I can't find one. Can you share with me your experience of fixing the loss problem? Maybe I'm facing the same problem.

Thank you!

DWingHKL commented 6 years ago

I fixed a bug in my lookahead process. Do you mean that you use CFR-D for the turn situation?


zhoujz10 commented 6 years ago

@DWingHKL No, I didn't use CFR-D. It seems that CFR-D is used to generate opponent ranges, and that is only necessary for continual re-solving. When generating training samples, I just use CFR, the same as the code in the authors' repo.

DWingHKL commented 6 years ago

For one turn situation that needs, say, 1000 iterations: in each iteration, do you solve the turn and then run the river for 1000 iterations?

DWingHKL commented 6 years ago

What is your email?

zhoujz10 commented 6 years ago

@DWingHKL Yes, exactly. I checked my results by solving the same turn situations with the code in tree_cfr.lua, and the results are the same. My email is zhoujz10#163.com.