opendilab / LightZero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal
Apache License 2.0
1.15k stars 120 forks source link

Stochastic MuZero MLP Issues Related to Chance Space #283

Closed ShivamKumar2002 closed 1 month ago

ShivamKumar2002 commented 1 month ago

When I try to train a model using stochastic muzero MLP with chance encoder, I am getting error related to indexing at this line: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L172

This is because _dynamics receives chance as input, so the size considered while encoding should be self.chance_space_size, as in the conv model.

When I fixed that, I faced another issue of matrix size mismatch, which is caused by wrong dimension in this line: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L220

To fix this, I changed dimension passed to dynamics_network initialisation to self.chance_space_size here: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L116

After this, I faced another issue of matrix size mismatch, which is caused by using dynamics_network in _afterstate_dynamics function. In conv model, it uses afterstate_dynamics_network. Link: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L277

So I also changed this line to use afterstate_dynamics_network.

After making these changes, model is training and loss is going down and I didn't face any other error.

I think there should be either a new variable for chance_encoding_dim or chance encoding should be fixed to either one-hot or not one-hot in _dynamics function.

Please confirm whether this is correct way to fix or did I miss anything?

I will be glad to submit a PR with these changes if required.

puyuan1996 commented 1 month ago

Hello, the stochastic_muzero_model_mlp.py has not undergone full validation and may indeed have the issues you mentioned. We would greatly appreciate it if you could provide examples of the model's application in specific environments, along with the relevant fix code. You can submit a PR to facilitate our further analysis of the current issue. Thank you very much for your support!