opendilab / LightZero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal
Apache License 2.0

Inquiry about Commented Out Loss Calculations in UniZero Implementation #257

Closed Tiikara closed 2 weeks ago

Tiikara commented 1 month ago

While examining the UniZero source code in the master branch, I noticed that the latent_recon_loss and perceptual_loss computations are currently commented out. I'm curious whether this was an intentional decision or an oversight.

https://github.com/opendilab/LightZero/blob/c004eac9b44b44c01e992f2f153225f3da2a4cd8/lzero/model/unizero_world_models/world_model.py#L881-L901

If this was deliberate, could you please explain the reasoning behind it? It seems unusual for the network to train on top of randomly initialized tokenizer weights without these loss components. I'd appreciate any insights into this design choice and its implications for the network's learning process.
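
For reference, here is a rough sketch of what I understand these two losses to typically compute for an image tokenizer. The names (decoder, feature_net, etc.) are my own placeholders, not the actual code at the link above:

```python
import torch
import torch.nn.functional as F

def latent_recon_loss_sketch(decoder, latent, original_obs):
    # Decode the latent back to observation space and penalize pixel-level error.
    recon_obs = decoder(latent)
    return F.mse_loss(recon_obs, original_obs)

def perceptual_loss_sketch(feature_net, recon_obs, original_obs):
    # Compare reconstructed vs. original observations in the feature space of a
    # fixed (e.g., pretrained, LPIPS-style) network instead of raw pixel space.
    with torch.no_grad():
        target_feats = feature_net(original_obs)
    recon_feats = feature_net(recon_obs)
    return F.mse_loss(recon_feats, target_feats)
```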

Thank you for your time and for maintaining this valuable resource.

Tiikara commented 1 month ago

I understand that the representation network is trained by predicting the next latent state and computing the error on that prediction. Is my understanding correct?
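
In other words, my mental model is roughly the following self-supervised consistency objective (a sketch with made-up names, not LightZero's actual code):

```python
import torch
import torch.nn.functional as F

def next_latent_prediction_loss(encoder, dynamics, obs_t, action_t, obs_tp1):
    z_t = encoder(obs_t)              # current latent state
    z_pred = dynamics(z_t, action_t)  # predicted next latent state
    with torch.no_grad():
        z_target = encoder(obs_tp1)   # target latent (often with stop-gradient)
    # A negative cosine similarity (SimSiam/EfficientZero-style) or an MSE term
    # can both serve as the consistency objective in this kind of setup.
    return -F.cosine_similarity(z_pred, z_target, dim=-1).mean()
```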

I'm still curious about the use of latent_recon_loss and perceptual_loss. Does the inclusion of these loss terms negatively impact the network's predictive capabilities? My assumption is that one potential drawback could be that instead of learning to play the game, the network might focus on reconstructing the image and memorizing features that aren't directly relevant to gameplay. Is this a fair assessment?

I would appreciate your insights on these aspects of the network's training process.

puyuan1996 commented 1 month ago

Hello, thank you very much for your question. Indeed, in the current UniZero version we do not use latent_recon_loss and perceptual_loss, and this decision was made after careful consideration. Our experiments and intuitive analysis showed that such reconstruction losses were not necessary for the final performance, as reflected in Figure 6(e) of the paper.

We believe that the learned representations only need to contain the information most relevant for decision-making, meaning they should predict the policy, value, next_latent_state, and reward well, without needing to reconstruct the image (of course, this would require further discussion for a generative task). Our encoder/representation network is trained jointly using equation (3) from the paper. Best wishes!
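
Schematically, the joint objective combines only decision-relevant terms, along these lines (the weights and names here are purely illustrative; please see equation (3) in the paper for the exact formulation):

```python
def joint_world_model_loss(losses, w_policy=1.0, w_value=0.25, w_reward=1.0, w_latent=2.0):
    # `losses` is assumed to be a dict of already-computed scalar loss tensors.
    # The encoder receives gradients only through terms that matter for acting;
    # there is no image-reconstruction term in this objective.
    return (w_policy * losses["policy"]
            + w_value * losses["value"]
            + w_reward * losses["reward"]
            + w_latent * losses["next_latent"])
```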