PPO Complex Obs/Action Space

vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

http://docs.cleanrl.dev



PPO Complex Obs/Action Space #353

Status: Open. ttumiel opened this issue 1 year ago.

ttumiel commented 1 year ago

Problem Description

Would it be useful to add a variant of the PPO algorithm that supports complex (nested/dictionary) action and observation spaces? I implemented this for MineRL and wondered whether it would be worth contributing to the main library. I'd happily make a PR.

Checklist

[x] I have checked that there is no similar issue in the repo.
[x] I have checked the documentation site and found no relevant information in GitHub issues.

Current Behavior

Currently, PPO supports either continuous or discrete actions (in separate variants, not both at once), and only a single-array observation.

Expected Behavior

PPO can support arbitrarily complex action and observation spaces, e.g. nested dictionaries that mix discrete and continuous components.
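To make the action side concrete: one common way to run PPO on a composite action is to give each sub-action its own distribution head and, assuming the sub-actions are sampled independently given the observation, sum their log-probabilities before forming the PPO ratio. The sketch below assumes a hypothetical dictionary action with a 4-way discrete `move` head and a 2-D Gaussian `camera` head; the keys and helper names are illustrative only and not part of CleanRL, and it uses plain NumPy so it runs standalone.

```python
import numpy as np

# Hypothetical composite action: a 4-way discrete "move" head and a
# 2-D diagonal-Gaussian "camera" head. Keys are illustrative only.

def categorical_log_prob(logits, action):
    # numerically stable log-softmax, then select the chosen index
    z = logits - np.max(logits)
    log_probs = z - np.log(np.sum(np.exp(z)))
    return float(log_probs[action])

def gaussian_log_prob(mean, log_std, action):
    # diagonal Gaussian: sum of per-dimension log densities
    std = np.exp(log_std)
    per_dim = -0.5 * ((action - mean) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi)
    return float(np.sum(per_dim))

def joint_log_prob(dist_params, action):
    # Assumes sub-actions are independent given the observation, so the
    # joint log-probability is the sum of the per-head log-probabilities.
    mean, log_std = dist_params["camera"]
    return (categorical_log_prob(dist_params["move"], action["move"])
            + gaussian_log_prob(mean, log_std, action["camera"]))

def ppo_ratio(new_log_prob, old_log_prob):
    # the usual PPO importance ratio, computed from joint log-probs
    return float(np.exp(new_log_prob - old_log_prob))

# Usage: uniform "move" logits and a standard-normal "camera" head
params = {"move": np.zeros(4), "camera": (np.zeros(2), np.zeros(2))}
action = {"move": 2, "camera": np.zeros(2)}
logp = joint_log_prob(params, action)  # log(1/4) - log(2*pi)
```

Under the same independence assumption, the joint entropy used for the PPO entropy bonus is also the sum of the per-head entropies.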

Possible Solution

Use `tree` (dm-tree) to map over nested actions and observations.
Store arrays in the same nested structure as the observation space, or flatten them for storage and unflatten them when passing them to the network.
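The "flatten for storage, unflatten for the network" option can be sketched in a few lines. This assumes observations are nested dicts/tuples/lists of NumPy arrays; `flatten` and `unflatten_as` are hand-rolled stand-ins for what a tree library such as dm-tree provides (`tree.flatten`, `tree.unflatten_as`), and `FlatRolloutStorage` is a hypothetical class, not CleanRL's API.

```python
import numpy as np

def flatten(structure):
    """Leaves of a nested dict/tuple/list; dict keys visited in sorted order (as dm-tree does)."""
    if isinstance(structure, dict):
        return [leaf for key in sorted(structure) for leaf in flatten(structure[key])]
    if isinstance(structure, (list, tuple)):
        return [leaf for item in structure for leaf in flatten(item)]
    return [structure]

def unflatten_as(structure, leaves):
    """Inverse of flatten: rebuild `structure`'s nesting around a flat list of leaves."""
    it = iter(leaves)
    def build(node):
        if isinstance(node, dict):
            return {key: build(node[key]) for key in sorted(node)}
        if isinstance(node, (list, tuple)):
            return type(node)(build(item) for item in node)
        return next(it)
    return build(structure)

class FlatRolloutStorage:
    """Hypothetical rollout buffer: one (num_steps, num_envs, *leaf_shape) array per obs leaf."""
    def __init__(self, single_obs_example, num_steps, num_envs):
        self.template = single_obs_example  # nesting to restore on read
        self.buffers = [
            np.zeros((num_steps, num_envs) + np.shape(leaf), dtype=np.asarray(leaf).dtype)
            for leaf in flatten(single_obs_example)
        ]

    def insert(self, step, batched_obs):
        # batched_obs has the same nesting, with a leading num_envs axis on every leaf
        for buf, leaf in zip(self.buffers, flatten(batched_obs)):
            buf[step] = leaf

    def minibatch(self, flat_indices):
        # merge the (step, env) axes, index every leaf identically, then unflatten
        leaves = [buf.reshape((-1,) + buf.shape[2:])[flat_indices] for buf in self.buffers]
        return unflatten_as(self.template, leaves)
```

Flattening keeps the buffer a plain list of arrays, each indexed like a single-array `(num_steps, num_envs, ...)` buffer, at the cost of rebuilding the nested structure on every minibatch; storing arrays in the original nested shape avoids the rebuild but needs a tree map for every buffer operation.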

vwxyzjn commented 1 year ago

Yes, that would be great. I suggest implementing it on top of #338, which uses EnvPool's async API; that API is equivalent to the regular vec env when async_batch_size = num_envs.

I've been thinking about this issue more, and I think the implementation should handle two types of observations:

Vector obs
Image obs

Each of these obs types needs to be paired with a corresponding network.
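One way to pair observation types with networks is to choose an encoder per observation key from the array's rank, encode each key separately, and concatenate the features before the policy and value heads. The sketch assumes channels-last `(H, W, C)` image leaves and treats every other leaf as a vector; `LinearEncoder`, `ConvEncoder`, `build_encoders`, and the `pov`/`inventory` keys are hypothetical, the encoders are randomly initialized NumPy stand-ins for an MLP and a CNN (no training), and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

rng = np.random.default_rng(0)

class LinearEncoder:
    """Stand-in for an MLP: flatten each sample, one random linear layer, ReLU."""
    def __init__(self, in_dim, out_dim):
        self.w = rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))

    def __call__(self, x):
        x = np.asarray(x, dtype=np.float32)
        x = x.reshape(x.shape[0], -1)
        return np.maximum(x @ self.w, 0.0)

class ConvEncoder:
    """Stand-in for a CNN: one k x k 'valid' convolution, ReLU, global average pool."""
    def __init__(self, in_channels, out_channels, k=3):
        self.k = k
        self.w = rng.normal(scale=(in_channels * k * k) ** -0.5,
                            size=(k, k, in_channels, out_channels))

    def __call__(self, x):  # x: (batch, H, W, C), channels-last uint8 pixels
        x = np.asarray(x, dtype=np.float32) / 255.0
        # patches: (batch, H-k+1, W-k+1, C, k, k)
        patches = np.lib.stride_tricks.sliding_window_view(x, (self.k, self.k), axis=(1, 2))
        out = np.einsum("bhwcij,ijco->bhwo", patches, self.w)
        return np.maximum(out, 0.0).mean(axis=(1, 2))  # (batch, out_channels)

def build_encoders(single_obs_example, feature_dim=16):
    # Assumed rule: rank-3 leaves are (H, W, C) images, anything else is a vector.
    encoders = {}
    for key, leaf in single_obs_example.items():
        leaf = np.asarray(leaf)
        if leaf.ndim == 3:
            encoders[key] = ConvEncoder(leaf.shape[-1], feature_dim)
        else:
            encoders[key] = LinearEncoder(int(np.prod(leaf.shape)), feature_dim)
    return encoders

def encode(encoders, batched_obs):
    # Encode each key on its own, then concatenate features in sorted-key order.
    return np.concatenate([encoders[key](batched_obs[key]) for key in sorted(encoders)], axis=-1)
```

Concatenating per-key features means the policy and value heads can stay the same as in the single-array case; a real implementation would replace these stand-ins with trainable layers.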

Feel free to make a PR :) Thanks.

Cc @edbeeching: this PR could help with Godot RL environments.

vwxyzjn commented 1 year ago

Hi @ttumiel, just following up on this. Are you still interested in the issue?

ttumiel commented 1 year ago

Yes! Sorry about the delay, I'll post a PR soon :)

On Mon, 23 Jan 2023, 18:45 Costa Huang wrote:

> Hi @ttumiel just following up with this. Are you still interested in the issue?

— Reply to this email directly, view it on GitHub https://github.com/vwxyzjn/cleanrl/issues/353#issuecomment-1400655430, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADE4GS4QRNKXLBDWFQRBONLWT2YSFANCNFSM6AAAAAAT7DUPAA . You are receiving this because you were mentioned.Message ID: @.***>