opendilab / LightZero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal
Apache License 2.0
1.13k stars · 119 forks

Lacking inference script #62

Closed · samkoesnadi closed this issue 1 year ago

samkoesnadi commented 1 year ago

In the codebase there are training and evaluation scripts, which is great. However, an inference script is missing: one that loads existing weights, runs them in the environment, and shows how the agent performs visually. Rendering the environment and watching the trained agent act would be a good addition. Is there already a plan for this?

puyuan1996 commented 1 year ago

Hello,

  • Thank you for your interest in our project. Model inference and evaluation actually share most of their machinery. Solutions for rendering the game in real time during Atari agent training, loading trained model weights for inference, and saving the game replay as an MP4 video have already been provided in this issue; please refer to that discussion for details. A generic sketch of the render-and-record pattern is shown below.
  • Should you have additional requirements, feel free to share them; we will try to accommodate them in future updates.
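
The following is only an illustrative sketch of the general pattern (it is not LightZero's own evaluation entry point; see the issue linked above for that). It assumes Gymnasium with the Atari extras and moviepy installed, and `select_action` is a hypothetical placeholder you would replace with your trained agent's policy/MCTS action selection.

```python
# Generic render-and-record sketch (illustrative, not LightZero code):
# create an env that renders RGB frames, wrap it with RecordVideo so the
# episode is saved as an MP4, and step it with your own action selection.
import gymnasium as gym
from gymnasium.wrappers import RecordVideo


def select_action(observation, action_space):
    # Hypothetical placeholder: load your trained checkpoint (e.g. with
    # torch.load) and run the model / MCTS here instead of sampling randomly.
    return action_space.sample()


# Any Gymnasium env id works; "ALE/Pong-v5" assumes gymnasium[atari] is installed.
env = gym.make("ALE/Pong-v5", render_mode="rgb_array")
env = RecordVideo(env, video_folder="./videos", name_prefix="pong_eval")

obs, info = env.reset(seed=0)
done = False
while not done:
    action = select_action(obs, env.action_space)
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.close()  # finalizes and writes the .mp4 under ./videos
```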

Best regards.

samkoesnadi commented 1 year ago

Thanks for the prompt reply, puyuan, I appreciate it. I am building my own real-life robot and plan to use LightZero as its brain. Any thoughts on that?

I will take a closer look at model inference. I will also try to integrate my own environment into LightZero as I design the robot. Any guidance you have on integrating a custom environment would be very welcome. Thank you in advance, puyuan!

puyuan1996 commented 1 year ago

Apologies for our delayed response. I appreciate your consideration of LightZero as the "brain" of your real-life robot. Given the excellent performance of the MCTS series of algorithms across various domains, we are confident that LightZero can play a vital role in your robot control system.

Here are some important considerations for applying the MCTS-family algorithms supported by LightZero to a real-life robot environment:

  • General Pipeline

    • Because RL algorithms tend to have low sample efficiency, I recommend first training a model in a simulated robot environment. Alternatively, collect a batch of offline data from the real environment and train the model with an offline algorithm such as Unplugged MuZero, then fine-tune it in the real environment following the sim2real approach.
  • Environment Definition

    • Define a reasonable state space. Make sure the state representation contains all the information the agent needs to choose optimal actions while staying as compact as possible.
    • Define a reasonable action space. For a continuous action space, consider discretizing it based on the semantics of the actions (see the sketch after this list); this can significantly reduce the search burden and improve efficiency. However, if the action space remains high-dimensional after discretization, we recommend modeling the policy with a Gaussian distribution, which has proven more efficient (please refer to the LightZero benchmark). Note that the hyperparameters and other implementation details of the continuous-action algorithms still need further tuning to reach optimal performance.
  • Algorithm Selection

    • The Sampled MuZero algorithm has already shown good performance in simulated robot environments. In LightZero we have implemented an improved variant, the Sampled EfficientZero algorithm. You may use it as a baseline and adapt it to the actual conditions of your robot environment.
  • Robustness and Generalization

    • Robustness and generalization issues in real-life robots are also factors you need to consider.
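
As a concrete illustration of the per-dimension discretization mentioned in the action-space bullet above, here is a minimal sketch (not LightZero code; the bounds and bin count are made-up examples). Each action dimension gets a few evenly spaced values, and the agent picks a single flat discrete index that is decoded back into a continuous command:

```python
# Minimal action-discretization sketch (illustrative only).
import numpy as np


def make_discretizer(low, high, bins):
    """Discretize a Box(low, high) action space into bins**dim flat actions."""
    low = np.asarray(low, dtype=np.float32)
    high = np.asarray(high, dtype=np.float32)
    dim = low.shape[0]
    # Candidate values per dimension, e.g. bins=3 over [-1, 1] -> [-1, 0, 1].
    grid = [np.linspace(low[i], high[i], bins) for i in range(dim)]

    def index_to_action(index):
        # Decode a flat discrete index into one value per action dimension.
        action = np.empty(dim, dtype=np.float32)
        for i in range(dim):
            index, k = divmod(index, bins)
            action[i] = grid[i][k]
        return action

    return bins ** dim, index_to_action


num_actions, to_continuous = make_discretizer(low=[-1.0, -1.0], high=[1.0, 1.0], bins=5)
print(num_actions)        # 25 discrete actions for the tree search to branch over
print(to_continuous(12))  # the continuous command (e.g. torques) to send to the robot
```

Note that the number of discrete actions grows as bins**dim, which is exactly why the Gaussian policy parameterization is recommended once the discretized space becomes large.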

We are currently writing documentation on how to integrate custom environments into LightZero. Until it is available, I suggest first referring to OpenAI's guide on creating custom Gym environments and then using lightzero_env_wrapper to convert the gym env into the env format required by LightZero. For how to use your custom environment within the main MuZero pipeline, please refer to this entry file and this config file.
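
To make the first step concrete, here is a minimal custom Gym-style environment skeleton (a sketch with made-up names and toy dynamics, written against the Gymnasium API; the classic gym API differs slightly in the reset/step signatures). Once an environment like this works standalone, it can be converted into the env format LightZero expects as described above:

```python
# Toy custom environment sketch: the agent nudges a 1-D position toward a goal.
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class MyRobotEnv(gym.Env):
    """Illustrative skeleton of a custom environment."""

    def __init__(self):
        super().__init__()
        # Observation: [current position, goal position]; action: move left/stay/right.
        self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Discrete(3)
        self._pos = 0.0
        self._goal = 5.0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = 0.0
        return self._obs(), {}

    def step(self, action):
        self._pos += (float(action) - 1.0) * 0.5       # 0 -> -0.5, 1 -> 0.0, 2 -> +0.5
        reward = -abs(self._goal - self._pos)          # closer to the goal is better
        terminated = abs(self._goal - self._pos) < 0.1
        return self._obs(), reward, terminated, False, {}

    def _obs(self):
        return np.array([self._pos, self._goal], dtype=np.float32)
```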

I hope these points are helpful. We'd love to provide support for LightZero in your project, so if you need further clarification or assistance, please feel free to contact us.

Best of luck with your project. We're excited to see your robot in action!

Best regards.

samkoesnadi commented 1 year ago

Thank you for the thorough and detailed guidance. I will proceed with it and let you know how things progress :)