Stochastic MuZero MLP Issues Related to Chance Space

When I try to train a model using the Stochastic MuZero MLP model with the chance encoder, I get an indexing error at this line: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L172

This happens because _dynamics receives a chance outcome as input, not an action, so the size used for the encoding should be self.chance_space_size, as in the conv model.
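To illustrate the failure mode, here is a minimal, framework-free sketch. It is not the LightZero code: the one_hot helper and the two sizes are hypothetical, and it only assumes that the chance space is larger than the action space, so a valid chance index can fall outside an encoding sized by the action space.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    if not 0 <= index < size:
        # Stand-in for the out-of-range failure of an encoding that is too small.
        raise IndexError(f"index {index} out of range for encoding size {size}")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


# Hypothetical sizes, chosen only so that chance_space_size > action_space_size.
action_space_size = 2
chance_space_size = 4

chance = 3  # a valid chance outcome, but not a valid action index

try:
    one_hot(chance, action_space_size)  # encoding sized by the wrong space
    wrong_size_failed = False
except IndexError:
    wrong_size_failed = True

# Sizing the encoding by the chance space accepts every chance outcome.
chance_encoding = one_hot(chance, chance_space_size)
print(wrong_size_failed, chance_encoding)
```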

After fixing that, I hit a matrix size mismatch, which is caused by the wrong dimension in this line: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L220

To fix this, I changed the dimension passed to the dynamics_network initialisation to self.chance_space_size here: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L116

After this, I hit another matrix size mismatch, this time because the _afterstate_dynamics function uses dynamics_network, whereas the conv model uses afterstate_dynamics_network. Link: https://github.com/opendilab/LightZero/blob/f1511fb5cdda4d31e61c5831c877d36215b49b22/lzero/model/stochastic_muzero_model_mlp.py#L277

So I also changed this line to use afterstate_dynamics_network.
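The shape bookkeeping behind these two mismatches can be sketched as follows. This is a simplified stand-in rather than the actual model: ShapeCheckedLayer is a hypothetical class that only checks input width, the sizes are made up, and it assumes each network consumes the latent state concatenated with a one-hot encoding.

```python
class ShapeCheckedLayer:
    """Hypothetical stand-in for a linear layer; it only validates input width."""

    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

    def __call__(self, x):
        if len(x) != self.in_features:
            raise ValueError(
                f"expected input of width {self.in_features}, got {len(x)}"
            )
        return [0.0] * self.out_features


# Hypothetical sizes for illustration only.
latent_dim = 8
action_space_size = 2
chance_space_size = 4

# The afterstate network consumes (latent state, action encoding);
# the dynamics network consumes (afterstate, chance encoding).
afterstate_dynamics_network = ShapeCheckedLayer(latent_dim + action_space_size, latent_dim)
dynamics_network = ShapeCheckedLayer(latent_dim + chance_space_size, latent_dim)

latent_state = [0.0] * latent_dim
action_encoding = [1.0, 0.0]
chance_encoding = [0.0, 0.0, 0.0, 1.0]

afterstate = afterstate_dynamics_network(latent_state + action_encoding)
next_latent_state = dynamics_network(afterstate + chance_encoding)

# Feeding an action-encoded input to dynamics_network reproduces the mismatch.
try:
    dynamics_network(latent_state + action_encoding)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

With the two networks sized separately, each step only accepts the encoding it was built for, which is why calling dynamics_network from the afterstate step fails once its input width follows the chance space.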

After making these changes, the model trains, the loss goes down, and I have not hit any other errors.

I think there should either be a new variable such as chance_encoding_dim, or the chance encoding in the _dynamics function should be fixed to one of the two options (one-hot or not one-hot).
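A rough sketch of the first option, with an explicit encoding width shared by the encoder and the network constructor. Both function names and the one_hot flag are hypothetical, and the scalar (non one-hot) encoding shown here is only an assumed placeholder, not what LightZero does.

```python
def chance_encoding_dim(chance_space_size, one_hot):
    """Width of the encoded chance vector for the chosen encoding mode."""
    return chance_space_size if one_hot else 1


def encode_chance(chance, chance_space_size, one_hot):
    """Encode a chance index as a one-hot vector or as a single scaled scalar."""
    if not 0 <= chance < chance_space_size:
        raise IndexError(f"chance {chance} out of range for size {chance_space_size}")
    if one_hot:
        vec = [0.0] * chance_space_size
        vec[chance] = 1.0
        return vec
    # Assumed placeholder for the non one-hot case: one normalized scalar.
    return [chance / chance_space_size]


# Hypothetical sizes for illustration only.
latent_dim = 8
chance_space_size = 4

for use_one_hot in (True, False):
    width = chance_encoding_dim(chance_space_size, use_one_hot)
    encoded = encode_chance(3, chance_space_size, use_one_hot)
    # The dynamics network input width would be derived from the same variable.
    dynamics_input_dim = latent_dim + width
    assert len(encoded) == width
```

Deriving the network input width and the encoding from the same variable would keep the two from drifting apart when the encoding mode changes.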

Could you confirm whether this is the correct way to fix it, or did I miss anything?

I will be glad to submit a PR with these changes if required.
