opendilab / PPOxFamily

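PPO x Family DRL Tutorial Course (an introductory open course on decision intelligence: 8 lessons that sort out the algorithm theory, clarify the code logic, and get you hands-on with decision AI applications)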
https://opendilab.github.io/PPOxFamily/
Apache License 2.0
1.83k stars 166 forks

Chapter2 Application Demo #4

Open PaParaZz1 opened 1 year ago

PaParaZz1 commented 1 year ago

In this issue, we will post all application demo materials related to Lecture 2 of the course.

Training code link

EasonQYS commented 1 year ago

Looking forward to the code.

jianzuo commented 1 year ago

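Is there a detailed walkthrough for the MultiDiscrete action space? I checked the annotated code documentation, and it seems to cover only ordinary discrete actions. Thanks!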

PaParaZz1 commented 1 year ago

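Is there a detailed walkthrough for the MultiDiscrete action space? I checked the annotated code documentation, and it seems to cover only ordinary discrete actions. Thanks!

This is essentially the MultiHead functionality implemented in DI-engine. You can start with the source code here; we will update the annotated code documentation in the course repo within this week.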

jianzuo commented 1 year ago

Got it, thanks!

jianzuo commented 1 year ago

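Hello, where can I find the updated code annotations on multihead that you mentioned in your reply? I have recently been trying to use PPO to output multi-dimensional actions and still haven't figured it out. Thanks!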

jianzuo commented 1 year ago

I tried multihead following the explanation, but it raised an error:

import torch
import torch.nn as nn
import torch.nn.functional as F
class DiscretePolicyNetMultiHead(nn.Module):
    def __init__(self, obs_dim, hidden_dim, action_dim) -> None:
        super(DiscretePolicyNet, self).__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
        )
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, dim) for dim in action_dim])

    def forward(self, x: torch.Tensor)->torch.Tensor:
        x = self.encoder(x)
        logit = [self.head(x) for head in self.heads]
        return logits

def sample_act(logit: torch.Tensor) -> torch.Tensor:
    probs = torch.softmax(logit, dim=-1)
    dists = [torch.distributions.Categorical(probs=prob) for prob in probs]
    return [dist.sample() for dist in dists]

def test_action_multihead():
    B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
    state = torch.rand(B, obs_shape)
    policy_net = DiscretePolicyNet(obs_shape, hidden_shape, action_shape)
    logit = policy_net(state)
    assert logit.shape == (B, action_shape)
    action = sample_act(logit)
    assert action.shape == (B,)
    return action

test_action_multihead()
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_27/530012604.py in <module>
----> 1 test_action_multihead()

/tmp/ipykernel_27/2493506364.py in test_action_multihead()
      2     B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
      3     state = torch.rand(B, obs_shape)
----> 4     policy_net = DiscretePolicyNet(obs_shape, hidden_shape, action_shape)
      5     logit = policy_net(state)
      6     assert logit.shape == (B, action_shape)

/tmp/ipykernel_27/2688308212.py in __init__(self, obs_dim, hidden_dim, action_dim)
      6             nn.ReLU(),
      7         )
----> 8         self.head = nn.Linear(hidden_dim, action_dim)
      9 
     10     def forward(self, x: torch.Tensor)->torch.Tensor:

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/linear.py in __init__(self, in_features, out_features, bias, device, dtype)
     94         self.in_features = in_features
     95         self.out_features = out_features
---> 96         self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
     97         if bias:
     98             self.bias = Parameter(torch.empty(out_features, **factory_kwargs))

TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of SymInts size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
PaParaZz1 commented 1 year ago

You can now refer to this example: https://github.com/opendilab/PPOxFamily/blob/main/chapter2_action/discrete_tutorial_zh.py#L58
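For readers following along, here is a minimal sketch of how the snippet posted above could be repaired. It is not the code from the linked tutorial file. The traceback appears to come from an earlier single-head `DiscretePolicyNet` class still defined in the notebook (its failing line, `self.head = nn.Linear(hidden_dim, action_dim)`, is not in the posted class), because the test instantiates `DiscretePolicyNet` instead of `DiscretePolicyNetMultiHead`. The sketch also fixes the `super()` call, the `self.head`/`logits` name mix-ups in `forward`, and the shape checks, which now treat the output as a list with one tensor per head.

```python
from typing import List

import torch
import torch.nn as nn


class DiscretePolicyNetMultiHead(nn.Module):
    """Shared encoder followed by one linear head per sub-action."""

    def __init__(self, obs_dim: int, hidden_dim: int, action_dims: List[int]) -> None:
        # super() must refer to this class; the no-argument form avoids name mismatches.
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
        )
        # nn.Linear expects an int for out_features, so build one head per sub-action.
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, dim) for dim in action_dims])

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        x = self.encoder(x)
        # Call each head in turn; the i-th logit has shape (B, action_dims[i]).
        return [head(x) for head in self.heads]


def sample_act(logits: List[torch.Tensor]) -> List[torch.Tensor]:
    # One Categorical distribution per head; each sample has shape (B,).
    dists = [torch.distributions.Categorical(logits=logit) for logit in logits]
    return [dist.sample() for dist in dists]


def test_action_multihead() -> List[torch.Tensor]:
    B, obs_shape, hidden_shape, action_shape = 4, 10, 32, [6, 3]
    state = torch.rand(B, obs_shape)
    # Instantiate the multi-head class, not the older single-head one.
    policy_net = DiscretePolicyNetMultiHead(obs_shape, hidden_shape, action_shape)
    logits = policy_net(state)
    for logit, dim in zip(logits, action_shape):
        assert logit.shape == (B, dim)
    actions = sample_act(logits)
    for action, dim in zip(actions, action_shape):
        assert action.shape == (B,)
        assert int(action.min()) >= 0 and int(action.max()) < dim
    return actions


test_action_multihead()
```

Returning a list rather than a single tensor is a design choice forced by the heads having different sizes (6 and 3 here); the logits cannot be stacked into one `(B, N)` tensor without padding.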

jianzuo commented 1 year ago

Thanks! I'll rewrite it based on your example.

lz-8713 commented 11 months ago

Could you share the PPO code for the MultiDiscrete and Discrete action spaces, as well as the complete code for controlling traffic signals?

zhixiongzh commented 10 months ago

Hello, I ran docker pull for the latest opendilab/ding:nightly-mujoco image and then ran pip install git+https://github.com/zjowowen/gym-pybullet-drones@master inside it to try the drones example, but it failed:

root@BF4-C-008T7:/workspaces/PPOxFamily# pip install git+https://github.com/zjowowen/gym-pybullet-drones@master
Collecting git+https://github.com/zjowowen/gym-pybullet-drones@master
  Cloning https://github.com/zjowowen/gym-pybullet-drones (to revision master) to /tmp/pip-req-build-wy0jagd4
  Running command git clone --filter=blob:none --quiet https://github.com/zjowowen/gym-pybullet-drones /tmp/pip-req-build-wy0jagd4
  Resolved https://github.com/zjowowen/gym-pybullet-drones to commit b35eed32c251cc69c2d7b0de74dd9a66ca1357b1
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      Collecting poetry-core@ git+https://github.com/python-poetry/poetry-core.git@master
        Cloning https://github.com/python-poetry/poetry-core.git (to revision master) to /tmp/pip-install-s945w_8c/poetry-core_d952979d432a40669870b5448a5371f8
        Running command git clone --filter=blob:none --quiet https://github.com/python-poetry/poetry-core.git /tmp/pip-install-s945w_8c/poetry-core_d952979d432a40669870b5448a5371f8
        WARNING: Did not find branch or tag 'master', assuming revision or ref.
        Running command git checkout -q master
        error: pathspec 'master' did not match any file(s) known to git.
        error: subprocess-exited-with-error

        × git checkout -q master did not run successfully.
        │ exit code: 1
        ╰─> See above for output.

        note: This error originates from a subprocess, and is likely not a problem with pip.
      error: subprocess-exited-with-error

      × git checkout -q master did not run successfully.
      │ exit code: 1
      ╰─> See above for output.

      note: This error originates from a subprocess, and is likely not a problem with pip.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Installing poetry-core manually didn't help either. It looks like the branch name master needs to be changed to main? @PaParaZz1, do you have any suggestions?

zhixiongzh commented 10 months ago

Solved. You need to clone the whole drones repo with git clone https://github.com/zjowowen/gym-pybullet-drones.git, then change master to main in the line requires = ["poetry-core @ git+https://github.com/python-poetry/poetry-core.git@master"], and then run pip install -e . manually inside that repo to install it.
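The edit described above can also be scripted. The following is a rough sketch, assuming the `requires` line lives in the cloned repo's `pyproject.toml` (the comment does not name the file); `patch_poetry_core_ref` is a hypothetical helper name used only for illustration.

```python
import re


def patch_poetry_core_ref(pyproject_text: str, old_ref: str = "master", new_ref: str = "main") -> str:
    """Point the poetry-core git requirement at new_ref instead of old_ref."""
    # Match only the ref that follows "poetry-core.git@", leaving other text untouched.
    pattern = r"(poetry-core\.git@)" + re.escape(old_ref) + r"\b"
    return re.sub(pattern, lambda m: m.group(1) + new_ref, pyproject_text)


original = 'requires = ["poetry-core @ git+https://github.com/python-poetry/poetry-core.git@master"]'
patched = patch_poetry_core_ref(original)
print(patched)
```

To apply it to a real checkout, read the file's text, pass it through the helper, write it back, and then run pip install -e . in the clone as described. The next comment reports that the fork has since been updated, so the patch may no longer be needed.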

zjowowen commented 10 months ago

Hi,

This repo [https://github.com/zjowowen/gym-pybullet-drones.git] has been updated to match the original repo [https://github.com/utiasDSL/gym-pybullet-drones].

Thanks for reminding us!

zhixiongzh commented 10 months ago

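@zjowowen After getting the code to run, I still cannot reproduce the drones_fly_demo. After training for 5e6 steps with the default parameters, the return does not look very good. I then loaded the best saved model and recorded a video, and found that the drone flies over the top of the gate instead of passing through underneath it. Are there any other settings needed to reach the results shown in your demo?

[image: return]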

rokey0001 commented 8 months ago

Hello, I keep running into the following problem when running the demo. Has anyone else seen the same issue?

Traceback (most recent call last):
File "", line 1, in
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "E:\download\anaconda\envs\DILAB\lib\site-packages\ding\utils\compression_helper.py", line 24, in setstate
self.data = cloudpickle.loads(data)
TypeError: _generator_ctor() takes from 0 to 1 positional arguments but 2 were given

[10-20 22:34:24] WARNING subprocess reset set seed failed, ignore and continue... subprocess_env_manager.py:263 subprocess exception traceback:
Traceback (most recent call last):
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 312, in _recv_bytes
nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] The pipe has been ended.

Traceback (most recent call last):
File "E:\download\anaconda\envs\DILAB\lib\site-packages\ding\envs\env_manager\subprocess_env_manager.py", line 259, in reset
ret = self._pipe_parents.recv()
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 250, in recv
buf = self._recv_bytes()
File "E:\download\anaconda\envs\DILAB\lib\multiprocessing\connection.py", line 321, in _recv_bytes
raise EOFError
EOFError

wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
[10-20 22:34:26] ERROR Env 2 reset has exceeded max retries(5) subprocess_env_manager.py:317
[10-20 22:34:26] ERROR Env 1 reset has exceeded max retries(5) subprocess_env_manager.py:317
[10-20 22:34:26] ERROR Env 3 reset has exceeded max retries(5) subprocess_env_manager.py:317
wandb: View run dutiful-pond-1 at: https://wandb.ai/anony-mouse-788424711663011732/bipedalwalker_demo/runs/uomu1uw0?apiKey=dc8282c6be97b578e2fa87aac8b882089ab2adaf
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: .\wandb\run-20231020_223406-uomu1uw0\logs

huang312 commented 6 months ago

Does the repo include complete code and a training procedure for MultiDiscrete PPO?

Billchan9711 commented 5 months ago

Requesting the MultiDiscrete + PPO code for traffic light control.

PaParaZz1 commented 5 months ago

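Does the repo include complete code and a training procedure for MultiDiscrete PPO?

You can refer to the relevant examples in DI-smartcross. Because the cityflow environment is fairly complex, we did not integrate it directly into the course repo, so please head over to DI-smartcross to take a look. Link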
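As a rough illustration of how a MultiDiscrete action can enter the PPO objective (this is a generic sketch, not code from DI-smartcross or DI-engine, which may combine the heads differently): if the heads are treated as independent categorical distributions, the joint log-probability is the sum of the per-head log-probabilities, and the usual clipped surrogate is applied to the resulting ratio. The helper names, the toy data, and `clip_ratio=0.2` are illustrative assumptions.

```python
from typing import List

import torch


def joint_log_prob(logits: List[torch.Tensor], actions: List[torch.Tensor]) -> torch.Tensor:
    """Sum per-head Categorical log-probs into one joint log-prob of shape (B,)."""
    log_probs = [
        torch.distributions.Categorical(logits=logit).log_prob(action)
        for logit, action in zip(logits, actions)
    ]
    return torch.stack(log_probs, dim=0).sum(dim=0)


def ppo_clip_loss(
    new_logp: torch.Tensor, old_logp: torch.Tensor, advantage: torch.Tensor, clip_ratio: float = 0.2
) -> torch.Tensor:
    """Clipped surrogate policy loss on the joint probability ratio."""
    ratio = torch.exp(new_logp - old_logp)
    surr1 = ratio * advantage
    surr2 = torch.clamp(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio) * advantage
    return -torch.min(surr1, surr2).mean()


# Toy batch: 4 samples, two sub-actions with 6 and 3 choices.
torch.manual_seed(0)
B, action_shape = 4, [6, 3]
old_logits = [torch.randn(B, dim) for dim in action_shape]
new_logits = [logit + 0.1 * torch.randn_like(logit) for logit in old_logits]
actions = [torch.distributions.Categorical(logits=logit).sample() for logit in old_logits]
advantage = torch.randn(B)

# Old log-probs are treated as constants collected at rollout time.
old_logp = joint_log_prob(old_logits, actions).detach()
new_logp = joint_log_prob(new_logits, actions)
loss = ppo_clip_loss(new_logp, old_logp, advantage)
```

When the new and old policies coincide, the ratio is 1 everywhere and the loss reduces to the negative mean advantage, which is a quick sanity check for an implementation like this.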

JBGZ-XXB commented 3 months ago

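Hello, is the environment code available for the drone attitude control case (continuous action space)? I'd like to use it as a reference for how reinforcement learning is layered on top of a PID controller.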

PaParaZz1 commented 3 months ago

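Hello, is the environment code available for the drone attitude control case (continuous action space)? I'd like to use it as a reference for how reinforcement learning is layered on top of a PID controller.

All of the code can be found in the code examples in this repo; the drone attitude control code is at this link.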