Hi, the RL and safe RL evaluation results are saved in the run_baselines/[ppo/sac/sac_lag/ppo_lag] folder. You can view them with TensorBoard.
For example: tensorboard --logdir=. --port=8080
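If you launch TensorBoard from the repository root, the commands might look like the following (a sketch only; adjust the path to your checkout and to the algorithm folder you actually trained, e.g. PPO):
cd pe_rlhf/run_baselines/PPO
tensorboard --logdir=. --port=8080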
Thank you for your response. What I would like to ask about is visualizing the trained model's behavior in the simulator.
Hi, if you want to visualize each training episode, you can try setting "use_render=True" in config.py (see the sketch after the demo code below). Otherwise, if you want to load the model from a checkpoint and then visualize the evaluation process, you can run the following demo code. Note: you need to change the path.
import os

import ray
from ray.rllib.agents.ppo import PPOTrainer

from pe_rlhf.utils.human_in_the_loop_env import HumanInTheLoopEnv


def visualize_trained_model(exp_path, ckpt_idx):
    ray.init(ignore_reinit_error=True)

    # Construct the checkpoint path
    ckpt = os.path.join(exp_path, f"checkpoint_{ckpt_idx}", f"checkpoint-{ckpt_idx}")

    # Initialize the PPO trainer with the environment
    trainer = PPOTrainer(env=HumanInTheLoopEnv)

    # Restore the trained model
    trainer.restore(ckpt)

    # Configure the environment for rendering
    env_config = {
        "manual_control": True,
        "use_render": True,
        "controller": "keyboard",
        "window_size": (1600, 1100),
    }
    env = HumanInTheLoopEnv(env_config)

    # Reset the environment
    obs = env.reset()
    done = False
    while not done:
        # Compute actions using the trained model
        action = trainer.compute_action(obs)
        # Step the environment
        obs, reward, done, info = env.step(action)
        # Render the environment
        env.render()

    # Close the environment
    env.close()


if __name__ == '__main__':
    # Example usage
    exp_path = '/home/sky-lab/codes/PE-RLHF/pe_rlhf/run_baselines/PPO/PPO_HumanInTheLoopEnv_ce692_00004_4_seed=400_2024-06-11_17-49-05'
    ckpt_idx = 209  # Specify the checkpoint index
    visualize_trained_model(exp_path, ckpt_idx)
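For the first option above (rendering each episode during training), the change is on the environment config side. A minimal sketch is below, assuming config.py exposes the baseline's env_config as a plain dict; the actual layout of config.py in your checkout may differ:
# Hypothetical excerpt of config.py: enable the native render window for every rollout.
env_config = dict(
    use_render=True,           # same flag as in the demo code above
    window_size=(1600, 1100),  # optional: match the window size used for visualization
)
Note that rendering every episode noticeably slows down training, so this is mainly useful for short debugging runs; for full training runs, restoring from a checkpoint as in the demo code is the more practical route.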
Thank you for your answer!
Haha, feel free to cite our paper if you find our project helpful in your task: https://arxiv.org/abs/2409.00858
Of course, your paper is excellent and has been very helpful to me. I've recently been trying to apply reinforcement learning in environments built on world models for self-driving trajectory prediction. Have you tried anything in this direction?
Hi, I have no experience in the area of world models. That said, I have recently been doing some work on VLMs with RL in the CARLA simulator. If that is relevant to you, feel free to reach out and we can absolutely work something out together~
Sure, I'll send my contact information to your email.
Excellent work and paper! One question: during the training of PPO or SAC, evaluations of the model are conducted. Is there any corresponding way to visualize the trained PPO model? Thank you.