vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
http://docs.cleanrl.dev
Other
5.02k stars 575 forks

PPO Complex Obs/Action Space #353

Open ttumiel opened 1 year ago

ttumiel commented 1 year ago

Problem Description

Would it be useful to add a complex (nested/dictionary) action and obs space variant of the PPO algo? I did this for MineRL and wondered whether it would be worth contributing to the main library. I'd happily make a PR.

Current Behavior

Currently, PPO supports either continuous or discrete actions, but not a mix of both, and only a single array observation.

Expected Behavior

PPO can support arbitrary complex action and observation spaces.
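One way to read this for action spaces is sketched below, assuming the components of a dictionary action are sampled independently, so that the joint log-probability used in the PPO ratio is the sum of the per-key log-probabilities. The key names (`jump`, `camera`), the `policy_out` layout, and the helper functions are hypothetical and use only the standard library; this is not code from CleanRL or MineRL.

```python
import math


def categorical_log_prob(logits, index):
    """Log-probability of `index` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def gaussian_log_prob(mean, std, value):
    """Log-density of `value` under a 1-D Normal(mean, std)."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def joint_log_prob(policy_out, action):
    """Sum per-key log-probs, assuming the action components are independent."""
    total = 0.0
    for key, head in policy_out.items():
        if head["type"] == "discrete":
            total += categorical_log_prob(head["logits"], action[key])
        elif head["type"] == "continuous":
            total += gaussian_log_prob(head["mean"], head["std"], action[key])
        else:
            raise ValueError(f"unknown head type: {head['type']}")
    return total


# Hypothetical policy outputs for a dict action space with two keys.
old_policy = {
    "jump": {"type": "discrete", "logits": [0.0, 0.0]},
    "camera": {"type": "continuous", "mean": 0.0, "std": 1.0},
}
new_policy = {
    "jump": {"type": "discrete", "logits": [0.0, 1.0]},
    "camera": {"type": "continuous", "mean": 0.5, "std": 1.0},
}
action = {"jump": 1, "camera": 0.5}

# PPO probability ratio for the whole composite action.
ratio = math.exp(joint_log_prob(new_policy, action) - joint_log_prob(old_policy, action))
```

With this factorization the clipped PPO objective itself does not need to change; only the log-probability (and, by the same independence assumption, the entropy) is accumulated over the keys of the action dictionary.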

vwxyzjn commented 1 year ago

Yes that would be great. I suggest implementing it based on #338. #338 uses EnvPool's async API, which is equivalent to the regular vec env when async_batch_size = num_envs.
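The equivalence claimed above can be illustrated with a toy model. The sketch below is not EnvPool; `ToyAsyncPool` is a hypothetical stand-in that only mimics the idea of an async pool returning whichever `batch_size` environments finish first. It assumes every received environment is sent a new action before the next `recv`.

```python
import heapq
import random


class ToyAsyncPool:
    """Toy stand-in for an async vectorized env (hypothetical, not EnvPool's API)."""

    def __init__(self, num_envs, batch_size, seed=0):
        assert 1 <= batch_size <= num_envs
        self.num_envs = num_envs
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        self.pending = []  # min-heap of (finish_time, env_id)
        self.clock = 0.0

    def _launch(self, env_id):
        # Each env takes a random amount of simulated time to step.
        finish = self.clock + self.rng.uniform(0.1, 1.0)
        heapq.heappush(self.pending, (finish, env_id))

    def async_reset(self):
        self.pending = []
        for env_id in range(self.num_envs):
            self._launch(env_id)

    def recv(self):
        # Return the ids of the `batch_size` envs that finish first.
        done = [heapq.heappop(self.pending) for _ in range(self.batch_size)]
        self.clock = max(t for t, _ in done)
        return sorted(env_id for _, env_id in done)

    def send(self, env_ids):
        # Submitting actions for these envs starts their next step.
        for env_id in env_ids:
            self._launch(env_id)


def collect_ids(num_envs, batch_size, steps=5):
    """Record which env ids come back from each recv call."""
    pool = ToyAsyncPool(num_envs, batch_size)
    pool.async_reset()
    batches = []
    for _ in range(steps):
        ids = pool.recv()
        batches.append(ids)
        pool.send(ids)
    return batches


full = collect_ids(num_envs=4, batch_size=4)     # every batch covers all envs
partial = collect_ids(num_envs=4, batch_size=2)  # each batch is a subset
```

When `batch_size` equals `num_envs`, every `recv` has to wait for all environments, so each batch contains every env id and the loop behaves like a synchronous vectorized step. With a smaller `batch_size`, each batch is a varying subset, which is why the rollout code then has to track env ids.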

I was thinking about this issue more and think that you should have two types of observations:

  1. Vector obs
  2. Image obs

And each of these two obs types needs to be paired with a corresponding network.
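A minimal sketch of that pairing, assuming the observation type can be inferred from the shape of each entry: rank-1 shapes are treated as vector obs and rank-3 shapes as image obs. The rule, the encoder labels (`mlp`, `cnn`), and the observation keys are illustrative assumptions, not something specified in this thread.

```python
def encoder_kind(shape):
    """Pick an encoder family from an observation shape (assumed convention)."""
    if len(shape) == 1:
        return "mlp"  # vector obs, e.g. (num_features,)
    if len(shape) == 3:
        return "cnn"  # image obs, e.g. (channels, height, width)
    raise ValueError(f"unsupported observation shape: {shape}")


def build_encoder_plan(obs_shapes):
    """Map every key of a dict observation space to an encoder family."""
    return {key: encoder_kind(shape) for key, shape in obs_shapes.items()}


# Hypothetical dict observation space with one image and one vector entry.
obs_shapes = {"pov": (3, 64, 64), "inventory": (12,)}
plan = build_encoder_plan(obs_shapes)
```

In a full agent, the features produced by each encoder would typically be concatenated before the shared actor and critic heads; that step is left out here.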

Feel free to make a PR :) Thanks.

Cc @edbeeching, this PR could help deal with Godot RL environments.

vwxyzjn commented 1 year ago

Hi @ttumiel just following up with this. Are you still interested in the issue?

ttumiel commented 1 year ago

Yes! Sorry about the delay, I'll post a PR soon :)

On Mon, 23 Jan 2023, 18:45 Costa Huang wrote:

Hi @ttumiel just following up with this. Are you still interested in the issue?