opendilab / LightZero

[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
https://huggingface.co/spaces/OpenDILabCommunity/ZeroPal
Apache License 2.0

LightZero on HPC and other questions #207

Closed selfsim closed 6 months ago

selfsim commented 6 months ago

Hello, I am considering LightZero as the workhorse for my MBRL research. I have a few questions; feel free to just link to the relevant file(s) and I can read through them myself.

Thank you for this contribution and I look forward to hearing from you soon.

edit: read some of paper, questions answered.

puyuan1996 commented 6 months ago

Hello, we currently support multi-GPU training on a single node through PyTorch's Distributed Data Parallel (DDP) technology. Please refer to the discussion on this issue: https://github.com/opendilab/LightZero/issues/196. As for multi-node training, we plan to consider incorporating this functionality in future version updates.

At present, LightZero inherits its experiment monitoring and performance analysis support from DI-engine. For details, please see this document (https://github.com/opendilab/DI-engine-docs/blob/main/source/04_best_practice/training_generated_folders_zh.rst, Chinese version). For your convenience, we provide an English summary below and will later integrate it fully into our codebase documentation. Thank you for your suggestion. Best regards.

puyuan1996 commented 6 months ago

Experimental monitoring and logging system in LightZero

LightZero generates log and checkpoint folders during the training process. The file tree generated is as follows:

cartpole_muzero
├── ckpt
│   ├── ckpt_best.pth.tar
│   ├── iteration_0.pth.tar
│   └── iteration_10000.pth.tar
├── log
│   ├── buffer
│   │   └── buffer_logger.txt
│   ├── collector
│   │   └── collector_logger.txt
│   ├── evaluator
│   │   └── evaluator_logger.txt
│   ├── learner
│   │   └── learner_logger.txt
│   └── serial
│       └── events.out.tfevents.1626453528.CN0014009700M.local
├── formatted_total_config.py
└── total_config.py
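As a rough illustration of how this layout can be inspected programmatically, here is a minimal sketch that lists the checkpoint and log files of one experiment folder. The function name `summarize_experiment` is hypothetical and not part of LightZero; the sketch only assumes the `ckpt` and `log` subfolders shown in the tree above.

```python
from pathlib import Path


def summarize_experiment(exp_dir):
    """Return sorted relative paths of checkpoint and log files under exp_dir.

    Assumes the layout shown in the tree above: <exp_dir>/ckpt/*.pth.tar
    and arbitrary files nested under <exp_dir>/log/.
    """
    root = Path(exp_dir)
    ckpts = sorted(
        p.relative_to(root).as_posix() for p in (root / "ckpt").glob("*.pth.tar")
    )
    logs = sorted(
        p.relative_to(root).as_posix()
        for p in (root / "log").rglob("*")
        if p.is_file()
    )
    return {"ckpt": ckpts, "log": logs}
```

For example, `summarize_experiment("cartpole_muzero")` would return a dictionary with one list of checkpoint paths and one list of log file paths.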

log/collector

The collector folder contains a file named collector_logger.txt, which records the interaction between the collector and the environment, including special information generated during that interaction, such as:

log/evaluator

In the evaluator folder, there is a file named evaluator_logger.txt, which contains information about the evaluator's interaction with the environment.

log/learner

The learner folder contains a file named learner_logger.txt, which records information about the learner. The following information is generated during MuZero training:

Policy neural network architecture:

[04-08 13:12:59] INFO     [RANK0]: DI-engine DRL Policy    base_learner.py:338
                          MuZeroModelMLP(
                            (representation_network): RepresentationNetworkMLP(
                              (fc_representation): Sequential(
                                (0): Linear(in_features=4, out_features=128, bias=True)
                                (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
                                (2): ReLU(inplace=True)
                                (3): Linear(in_features=128, out_features=128, bias=True)
                              )

Learner information:
    Grid table:
    | Name  | cur_lr_avg | total_loss_avg |
    |-------|------------|----------------|
    | Value | 0.001000   | 0.098996       |
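If you want these averages as numbers rather than text, a grid table like the one above can be parsed with a few lines of Python. This is only a sketch: `parse_grid_table` is a hypothetical helper, and it assumes the exact pipe-delimited layout shown here (a `Name` header row, a dashed separator row, and a `Value` row). The real learner_logger.txt may format its tables differently.

```python
def parse_grid_table(lines):
    """Parse a pipe-delimited 'Name'/'Value' table into {column_name: float}.

    Assumes the layout shown above; lines that do not start with '|' and
    dashed separator rows are ignored.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= {"-"} for cell in cells):
            continue  # skip the |-----|-----| separator row
        rows.append(cells)
    if len(rows) < 2:
        return {}
    header, values = rows[0], rows[1]
    # The first column holds the row labels ("Name" / "Value"), so drop it.
    return {name: float(value) for name, value in zip(header[1:], values[1:])}


sample = [
    "| Name  | cur_lr_avg | total_loss_avg |",
    "|-------|------------|----------------|",
    "| Value | 0.001000   | 0.098996       |",
]
metrics = parse_grid_table(sample)
```

With the sample above, `metrics` maps `cur_lr_avg` to 0.001 and `total_loss_avg` to 0.098996.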

log/serial

Information from the buffer, collector, evaluator, and learner is saved into a single file whose name begins with events.out.tfevents, for use with TensorBoard.

LightZero writes all TensorBoard data to one event file in the serial folder, rather than to a separate folder per component. With a large number of experiments, say n, it would be hard to tell apart 4*n separate TensorBoard files (one each for the buffer, collector, evaluator, and learner in every experiment). Therefore, in LightZero, all TensorBoard data for an experiment lives in its serial folder.
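When several experiments sit side by side, the one-file-per-experiment layout makes the event files easy to collect. The sketch below is an illustration under assumptions: `find_event_files` is a hypothetical helper, and it assumes every experiment folder follows the `log/serial` layout shown in the file tree above.

```python
from pathlib import Path


def find_event_files(experiments_root):
    """Map each experiment folder name to the event files in its log/serial folder.

    Folders without a log/serial subfolder are skipped.
    """
    found = {}
    for exp in sorted(Path(experiments_root).iterdir()):
        serial = exp / "log" / "serial"
        if serial.is_dir():
            found[exp.name] = sorted(
                p.name for p in serial.glob("events.out.tfevents.*")
            )
    return found
```

To view the curves themselves, pointing TensorBoard at the parent folder (`tensorboard --logdir <experiments_root>`) should show each experiment as a separate run, since TensorBoard searches subfolders for event files.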

ckpt

The ckpt folder contains model parameter checkpoints, such as the ckpt_best.pth.tar and iteration_<n>.pth.tar files shown in the file tree above.
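A common need is to find the most recent iteration checkpoint, for example to resume or inspect a run. The following is a minimal sketch, assuming the `iteration_<n>.pth.tar` naming visible in the file tree above; `latest_iteration_checkpoint` is a hypothetical helper, not a LightZero API. It compares iteration numbers numerically, because plain string sorting would place iteration_9000 after iteration_10000.

```python
import re
from pathlib import Path

# Matches names like "iteration_10000.pth.tar" and captures the number.
ITER_PATTERN = re.compile(r"^iteration_(\d+)\.pth\.tar$")


def latest_iteration_checkpoint(ckpt_dir):
    """Return (iteration, path) for the highest-numbered iteration checkpoint.

    Returns None if the folder contains no iteration_<n>.pth.tar file.
    Other files, such as ckpt_best.pth.tar, are ignored.
    """
    best = None
    for path in Path(ckpt_dir).iterdir():
        match = ITER_PATTERN.match(path.name)
        if match is None:
            continue
        step = int(match.group(1))
        if best is None or step > best[0]:
            best = (step, path)
    return best
```

The returned path can then be passed to whatever loading routine your training script uses.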

selfsim commented 6 months ago

Thanks for the information.