zilin-huang / PE-RLHF

The official code for paper "Trustworthy Human-AI Collaboration: Reinforcement Learning with Human Feedback and Physics Knowledge for Safe Autonomous Driving"
https://zilin-huang.github.io/PE-RLHF-website/
MIT License

Visualization of the trained PPO model #1

Closed · dedellzyw closed this 3 weeks ago

dedellzyw commented 1 month ago

Excellent work and paper! One question: evaluations of the model are conducted during the training of PPO or SAC. Is there any corresponding visualization of the trained PPO model? Thank you.

zilin-huang commented 1 month ago

Hi, the RL and safe RL evaluation results will be saved in the run_baselines/[ppo/sac/sac_lag/ppo_lag] folder. You can open them with TensorBoard.

For example: tensorboard --logdir=. --port=8080
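
If you prefer to pull the curves out programmatically instead of using the TensorBoard UI, here is a minimal sketch using TensorBoard's EventAccumulator. The run directory and the scalar tag names are assumptions and will depend on your own run (RLlib/Tune typically logs tags such as "ray/tune/episode_reward_mean"):

import glob
import os

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Path to one run directory; "run_baselines/ppo" is an assumption, adjust to your run.
run_dir = "run_baselines/ppo"

# Find an event file somewhere under the run directory.
event_files = glob.glob(os.path.join(run_dir, "**", "events.out.tfevents.*"), recursive=True)

# Load the directory that contains the first event file and parse it.
acc = EventAccumulator(os.path.dirname(event_files[0]))
acc.Reload()

# List the available scalar tags, then print the (step, value) pairs of one of them.
print(acc.Tags()["scalars"])
tag = acc.Tags()["scalars"][0]
for event in acc.Scalars(tag):
    print(event.step, event.value)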

dedellzyw commented 1 month ago

> Hi, the RL and safe RL evaluation results will be saved in the run_baselines/[ppo/sac/sac_lag/ppo_lag] folder. You can open them with TensorBoard.
>
> For example: tensorboard --logdir=. --port=8080

Thank you for your response. What I would like to ask about is the visualization of the trained model's behavior in the simulator.

zilin-huang commented 1 month ago

Hi, if you want to visualize each episode, you can try setting "use_render=True" in config.py. Alternatively, if you want to load the model from a checkpoint and then visualize the evaluation process, you can run the following demo code. Note: you will need to change the path.

import os
import ray
from ray.rllib.agents.ppo import PPOTrainer
from pe_rlhf.utils.human_in_the_loop_env import HumanInTheLoopEnv

def visualize_trained_model(exp_path, ckpt_idx):
    ray.init(ignore_reinit_error=True)

    # Construct the checkpoint path
    ckpt = os.path.join(exp_path, f"checkpoint_{ckpt_idx}", f"checkpoint-{ckpt_idx}")

    # Initialize the PPO trainer with the environment
    trainer = PPOTrainer(env=HumanInTheLoopEnv)

    # Restore the trained model
    trainer.restore(ckpt)

    # Configure the environment for rendering
    env_config = {
        "manual_control": True,
        "use_render": True,
        "controller": "keyboard",
        "window_size": (1600, 1100),
    }
    env = HumanInTheLoopEnv(env_config)

    # Reset the environment
    obs = env.reset()

    done = False
    while not done:
        # Compute actions using the trained model
        action = trainer.compute_action(obs)

        # Step the environment
        obs, reward, done, info = env.step(action)

        # Render the environment
        env.render()

    # Close the environment
    env.close()

if __name__ == '__main__':
    # Example usage
    exp_path = '/home/sky-lab/codes/PE-RLHF/pe_rlhf/run_baselines/PPO/PPO_HumanInTheLoopEnv_ce692_00004_4_seed=400_2024-06-11_17-49-05'
    ckpt_idx = 209  # Specify the checkpoint index
    visualize_trained_model(exp_path, ckpt_idx)
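
As a small optional extension (a sketch, not part of the repo code), you can loop over several episodes and print the return of each one while watching the rendering. It reuses the trainer and env objects created above and assumes the same old-gym 4-tuple step API:

def visualize_multiple_episodes(trainer, env, num_episodes=5):
    # Roll out several episodes with the restored policy and report each return.
    for ep in range(num_episodes):
        obs = env.reset()
        done = False
        ep_return = 0.0
        while not done:
            action = trainer.compute_action(obs)  # action from the restored PPO policy
            obs, reward, done, info = env.step(action)
            ep_return += reward
            env.render()
        print(f"Episode {ep}: return = {ep_return:.2f}")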

dedellzyw commented 1 month ago

Thank you for your answer!

zilin-huang commented 1 month ago

Haha, feel free to cite our paper if you find our project helpful in your task: https://arxiv.org/abs/2409.00858

dedellzyw commented 1 month ago

> Haha, feel free to cite our paper if you find our project helpful in your task: https://arxiv.org/abs/2409.00858

Of course, your paper is excellent and has been very helpful to me. I've recently been trying to apply reinforcement learning in environments built on world models for self-driving trajectory prediction. Have you tried anything in this direction?

zilin-huang commented 1 month ago

Hi, I have no experience in the area of world models. However, I have recently been doing some work on VLMs with RL in the CARLA simulator. If that is helpful to you, feel free to reach out and we can absolutely work on something together~

dedellzyw commented 1 month ago

Sure, I'll send my contact information to your email.