linyiLYi / snake-ai

An AI agent that beats the classic game "Snake".
Apache License 2.0
1.61k stars 358 forks source link

Testing works, but training fails with an error #2

Open aijunzhao opened 1 year ago

aijunzhao commented 1 year ago

(SnakeAI) E:\snake-ai-master\main>python train_cnn.py
Using cuda device
Wrapping the env in a VecTransposeImage.
Process SpawnProcess-5:
Traceback (most recent call last):
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 30, in _worker
    observation, reward, done, info = env.step(data)
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\monitor.py", line 95, in step
    observation, reward, done, info = self.env.step(action)
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\gym\core.py", line 289, in step
    return self.env.step(action)
  File "E:\snake-ai-master\main\snake_game_custom_wrapper_cnn.py", line 47, in step
    self.done, info = self.game.step(action)  # info = {"snake_size": int, "snake_head_pos": np.array, "prev_snake_head_pos": np.array, "food_pos": np.array, "food_obtained": bool}
  File "E:\snake-ai-master\main\snake_game.py", line 96, in step
    self.sound_game_over.play()
AttributeError: 'SnakeGame' object has no attribute 'sound_game_over'
Traceback (most recent call last):
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train_cnn.py", line 95, in <module>
    main()
  File "train_cnn.py", line 82, in main
    model.learn(
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 525, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, self.n_steps, use_masking)
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\sb3_contrib\ppo_mask\ppo_mask.py", line 305, in collect_rollouts
    new_obs, rewards, dones, infos = env.step(actions)
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 163, in step
    return self.step_wait()
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\vec_transpose.py", line 95, in step_wait
    observations, rewards, dones, infos = self.venv.step_wait()
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 121, in step_wait
    results = [remote.recv() for remote in self.remotes]
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 121, in <listcomp>
    results = [remote.recv() for remote in self.remotes]
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\KEN2020.conda\envs\SnakeAI\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError

Han-duoduo commented 1 year ago

+1, hoping for a fix

Chapoii commented 1 year ago

[screenshot] Add a check for silent_mode before playing the sound; there is no need to play audio during training.

BeiYining commented 1 year ago

The exact spot is in snake_game.py, around line 95.
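The guard suggested above can be sketched as follows. The class and attribute names mirror the repo's snake_game.py, but this is a minimal illustrative stand-in, not the project's actual code:

```python
# Minimal sketch of the silent_mode guard for snake_game.py (around line 95).
# In the real game, pygame mixer sounds are only loaded when silent_mode is
# False, so any .play() call must be guarded the same way.
class SnakeGame:
    def __init__(self, silent_mode=True):
        self.silent_mode = silent_mode
        if not silent_mode:
            # In the real code: self.sound_game_over = pygame.mixer.Sound(...)
            self.sound_game_over = None

    def on_game_over(self):
        # Guard the sound call so headless training never touches the
        # missing attribute.
        if not self.silent_mode:
            self.sound_game_over.play()

game = SnakeGame(silent_mode=True)
game.on_game_over()  # no AttributeError during silent training
```

The same pattern applies to every sound effect in the game loop, which is why training (silent_mode=True) crashed while testing (silent_mode=False) worked.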

shironghe commented 1 year ago

sound_game_over doesn't affect model training; you can comment out self.sound_game_over.play() and add pass instead, then re-enable it when you run test. [screenshot]

aijunzhao commented 1 year ago

sound_game_over doesn't affect model training; you can comment out self.sound_game_over.play() and add pass instead, then re-enable it when you run test.

That works now, thanks!

1816705 commented 1 year ago

[screenshot] Why am I stuck here?

shironghe commented 1 year ago

MLP training prints no progress to the command line. On Windows, press Ctrl+Shift+Esc to check CPU/GPU usage; on Ubuntu, use htop for CPU and watch -n 1 nvidia-smi for GPU. If the program's utilization is high, it is training, not stuck.


zjhcwjb commented 1 year ago


How can I visualize the training process?

shironghe commented 1 year ago

After training, look under logs: a new folder is generated there, and the files inside can be opened with TensorBoard for visualization.
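Concretely, assuming the script's default LOG_DIR = "logs", the viewer can be pointed at that directory like this:

```shell
# The training script writes SB3 event files under "logs",
# one subfolder per run (e.g. MaskablePPO_1).
mkdir -p logs
ls -d logs
# Then launch TensorBoard against it and open http://localhost:6006:
# tensorboard --logdir logs
```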


zjhcwjb commented 1 year ago


Thanks for the reply. But what should I do if I want to watch the game screen for each episode the AI plays?

shironghe commented 1 year ago

You can borrow from the test_mlp code and use the env.render() function to render the game screen.


zjhcwjb commented 1 year ago

I've added the rendering code. GPU usage is high while it runs, so it should be training, but the game screen still doesn't show up.

shironghe commented 1 year ago

If you can, share your code.


zjhcwjb commented 1 year ago


import os
import sys
import random
import time

from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import SubprocVecEnv
from stable_baselines3.common.callbacks import CheckpointCallback
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

from snake_game_custom_wrapper_mlp import SnakeEnv

NUM_ENV = 32
LOG_DIR = "logs"
os.makedirs(LOG_DIR, exist_ok=True)

# Linear scheduler

def linear_schedule(initial_value, final_value=0.0):
    if isinstance(initial_value, str):
        initial_value = float(initial_value)
        final_value = float(final_value)
        assert (initial_value > 0.0)

    def scheduler(progress):
        return final_value + progress * (initial_value - final_value)

    return scheduler

def make_env(seed=0):
    def _init():
        env = SnakeEnv(seed=seed)
        env = ActionMasker(env, SnakeEnv.get_action_mask)
        env = Monitor(env)
        env.seed(seed)
        return env
    return _init

def main():

    # Generate a list of random seeds for each environment.
    seed_set = set()
    while len(seed_set) < NUM_ENV:
        seed_set.add(random.randint(0, int(1e9)))  # randint needs ints, not 1e9

    # Create the Snake environment.
    env = SubprocVecEnv([make_env(seed=s) for s in seed_set])

    lr_schedule = linear_schedule(2.5e-4, 2.5e-6)
    clip_range_schedule = linear_schedule(0.15, 0.025)

    # Instantiate a PPO agent
    model = MaskablePPO(
        "MlpPolicy",
        env,
        device="cuda",
        verbose=1,
        n_steps=2048,
        batch_size=512,
        n_epochs=4,
        gamma=0.94,
        learning_rate=lr_schedule,
        clip_range=clip_range_schedule,
        tensorboard_log=LOG_DIR
    )

    # Set the save directory
    save_dir = "trained_models_mlp"
    os.makedirs(save_dir, exist_ok=True)

    checkpoint_interval = 15625  # checkpoint_interval * num_envs = total_steps_per_checkpoint
    checkpoint_callback = CheckpointCallback(save_freq=checkpoint_interval, save_path=save_dir, name_prefix="ppo_snake")

    # Write the training logs from stdout to a file
    original_stdout = sys.stdout
    log_file_path = os.path.join(save_dir, "training_log.txt")
    with open(log_file_path, 'w') as log_file:
        sys.stdout = log_file

        model.learn(
            total_timesteps=int(100000000),
            callback=[checkpoint_callback]
        )
        env.close()

    # Restore stdout
    sys.stdout = original_stdout

    # Save the final model
    model.save(os.path.join(save_dir, "ppo_snake_final.zip"))

    demo_env = make_env()()

    with open(log_file_path, 'w') as log_file:
        sys.stdout = log_file

        for i in range(100):
            model.learn(
                total_timesteps=int(1000000),
                callback=[checkpoint_callback]
            )

            obs = demo_env.reset()
            demo_env.render()
            time.sleep(0.5)
            done = False
            while not done:
                action, _ = model.predict(obs)
                obs, _, done, _ = demo_env.step(action)
                demo_env.render()
                time.sleep(0.5)

if __name__ == "__main__":
    main()

Yep, it's just train_mlp with the rendering part added. There are no error messages at all, but the game screen never appears.
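A side note on the schedule helper in the script above: Stable-Baselines3 calls a schedule with progress_remaining, which goes from 1.0 at the start of training to 0.0 at the end, so the returned value decays linearly from initial_value down to final_value. A quick self-contained check (duplicating the function from the script):

```python
# Self-contained copy of the script's linear_schedule, showing how SB3's
# progress_remaining (1.0 -> 0.0 over training) maps onto the learning rate.
def linear_schedule(initial_value, final_value=0.0):
    if isinstance(initial_value, str):
        initial_value = float(initial_value)
        final_value = float(final_value)
        assert initial_value > 0.0

    def scheduler(progress):
        # progress is SB3's progress_remaining: 1.0 at start, 0.0 at end.
        return final_value + progress * (initial_value - final_value)

    return scheduler

lr = linear_schedule(2.5e-4, 2.5e-6)
print(lr(1.0))   # start of training: highest LR (about 2.5e-4)
print(lr(0.0))   # end of training: lowest LR (2.5e-6)
print(lr(0.5))   # halfway: the midpoint of the two
```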

shironghe commented 1 year ago

I really can't read this format. Could you send me a copy with the formatting preserved?


zjhcwjb commented 1 year ago
Hello, does this look okay? I've only just started with reinforcement learning, so my questions are probably pretty naive. Sorry for the trouble.
shironghe commented 1 year ago

Sorry man, I was reading your code in my email client earlier, and it stripped the formatting; it looks fine on GitHub. I took another careful look at the train_mlp code: the entire training process runs inside MaskablePPO, and I couldn't find any parameter in its API that calls env.render, so I don't think the training screen can be displayed. But you can picture what it would look like: the snake game repeated n times, constantly eating food for rewards and dying for penalties.
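One possible workaround (not from this repo) is an SB3-style callback that renders periodically during learn(). The sketch below uses a tiny stand-in for stable_baselines3's BaseCallback so it runs without SB3 installed; the shape of _on_step mirrors the real API. Note also that with SubprocVecEnv the environments live in separate worker processes, so a pygame window created there may never show up on screen; rendering during training is only straightforward with a single local env or a DummyVecEnv.

```python
# Sketch: render the training env every k steps via a callback.
# BaseCallback here is a minimal stand-in for SB3's
# stable_baselines3.common.callbacks.BaseCallback.
class BaseCallback:
    def __init__(self):
        self.n_calls = 0

    def on_step(self):
        self.n_calls += 1
        return self._on_step()


class RenderCallback(BaseCallback):
    def __init__(self, env, render_every=4):
        super().__init__()
        self.env = env
        self.render_every = render_every

    def _on_step(self):
        if self.n_calls % self.render_every == 0:
            self.env.render()
        return True  # returning False would abort training in SB3


class DummyEnv:
    """Hypothetical env that just counts render calls."""
    def __init__(self):
        self.rendered = 0

    def render(self):
        self.rendered += 1


env = DummyEnv()
cb = RenderCallback(env, render_every=2)
for _ in range(8):
    cb.on_step()
print(env.rendered)  # → 4 (rendered on steps 2, 4, 6, 8)
```

With real SB3 you would subclass the actual BaseCallback and pass the callback into model.learn(callback=[...]); the per-step hook is what gives you a place to call render, since MaskablePPO itself never does.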

zjhcwjb commented 1 year ago
Yeah, I've only just started with RL, but I'd still like to see it on screen to confirm it's really training. train_cnn does call env.render, yet nothing shows up there either, and asking ChatGPT didn't tell me how to fix it.