vakker opened this issue 1 year ago
Hello,
I tried to run the Atari Breakout benchmark available in the `ray/rllib/tuned_examples` folder.
OS: Ubuntu 20.04.4 LTS
Ray: 1.13
Python: 3.9.10
```yaml
atari-ppo:
    env: BreakoutNoFrameskip-v4
    run: PPO
    config:
        # Works for both torch and tf.
        framework: tf
        lambda: 0.95
        kl_coeff: 0.5
        clip_rewards: True
        clip_param: 0.1
        vf_clip_param: 10.0
        entropy_coeff: 0.01
        train_batch_size: 5000
        rollout_fragment_length: 100
        sgd_minibatch_size: 500
        num_sgd_iter: 10
        num_workers: 10
        num_envs_per_worker: 5
        batch_mode: truncate_episodes
        observation_filter: NoFilter
        model:
            vf_share_layers: true
        num_gpus: 1
```
I ran a training session with `rllib train -f settings.yaml` and compared the obtained reward with the reference results shown here. Unfortunately, my mean reward got stuck at values around 2.
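For reference, here is a minimal Python equivalent of that CLI run, assuming Ray 1.13's `tune.run` API (the stop criterion is illustrative, not part of the original setup):

```python
import ray
from ray import tune

ray.init()

# Same hyperparameters as the YAML config above, expressed as a dict.
config = {
    "env": "BreakoutNoFrameskip-v4",
    "framework": "tf",
    "lambda": 0.95,
    "kl_coeff": 0.5,
    "clip_rewards": True,
    "clip_param": 0.1,
    "vf_clip_param": 10.0,
    "entropy_coeff": 0.01,
    "train_batch_size": 5000,
    "rollout_fragment_length": 100,
    "sgd_minibatch_size": 500,
    "num_sgd_iter": 10,
    "num_workers": 10,
    "num_envs_per_worker": 5,
    "batch_mode": "truncate_episodes",
    "observation_filter": "NoFilter",
    "model": {"vf_share_layers": True},
    "num_gpus": 1,
}

# Illustrative stop criterion; adjust as needed.
tune.run("PPO", config=config, stop={"timesteps_total": 10_000_000})
```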
Is it still reasonable to use `framework: tf` in the configuration settings? I noticed that with `framework: tf2`, the reward improved as expected. A picture of the reward in both cases is shown below.
@gjoliver What do you think about this issue?
Is there any update on this issue? Is it reproducible or is this something particular to my setup?
Hi, I'm a bot from the Ray team :)
To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity within the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
Commenting so the issue doesn't get closed.
I'm having a similar experience with my custom env.
Training with Ray 2.1.0 (green), after switching to 2.3.1 (brown), and after switching back to 2.1.0 (red).
Everything else is pretty much the same, except for minor adjustments to get gymnasium running in 2.3.1 (see the API sketch below).
Framework: torch
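For context, those gymnasium adjustments mostly come down to the changed env API; a minimal sketch of the new signatures, assuming a standard gymnasium env:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

# gymnasium's reset() returns (obs, info) instead of just obs,
# and accepts the seed directly.
obs, info = env.reset(seed=0)

# step() now returns a 5-tuple: the old `done` flag is split into
# `terminated` (MDP end) and `truncated` (e.g. time limit).
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
done = terminated or truncated
```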
I encountered similar problems while using PPO to train Google Football. Specifically, I noticed that the CPU usage was fluctuating between 0% and 100%, while the GPU usage was fluctuating between 0% and 30%. I suspect that there may be some internal context switching within Ray that is impeding performance.
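One way to check where that time goes is Ray's built-in timeline profiler; a quick sketch (the output path is arbitrary):

```python
import ray

ray.init(address="auto")  # attach to the running cluster (assumes one is up)

# Dump recent task activity in Chrome-tracing format; open the file
# at chrome://tracing (or ui.perfetto.dev) to look for idle gaps on
# the rollout workers vs. the learner.
ray.timeline(filename="/tmp/ray-timeline.json")
```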
Is this fixed? I too am getting bad performance on Ray 2.3.1 when running the tuned config files. Rewards for SpaceInvaders don't even cross 15 after 5M steps.
What happened + What you expected to happen
I'm running some benchmarks, and I'm getting varying results on Atari. I'm using the Breakout benchmark from the `learning_tests` folder and the Pong example from the `tuned_examples` folder. I tried all options for `framework`: `tf`, `tf2`, and `torch`. I expected that they would all pass, or run according to the "On a single GPU, this achieves maximum reward in ~15-20 minutes." comment. They didn't. You can see the TensorBoard logs here: https://tensorboard.dev/experiment/8JLpS12qQcKtNc9Lm6BZWw/
Versions / Dependencies
Reproduction script
Pong:
Breakout (note: for `tf2` this will fail, it needs `num_gpus: 1`):
Run:
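(The attached Pong/Breakout configs and the run command weren't preserved in this scrape. As a rough, hypothetical sketch of the shape of such a repro, using default PPO hyperparameters plus the flags mentioned above via Tune's Python API:)

```python
# Hypothetical stand-in for the missing repro scripts; the original
# attachments are not preserved here. The real configs come from
# rllib/tuned_examples and set many more hyperparameters than shown.
from ray import tune

for env in ["PongNoFrameskip-v4", "BreakoutNoFrameskip-v4"]:
    tune.run(
        "PPO",
        config={
            "env": env,
            "framework": "tf2",  # also tried: "tf", "torch"
            "num_gpus": 1,       # required for tf2 (see note above)
        },
        stop={"timesteps_total": 10_000_000},  # illustrative
    )
```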
Issue Severity
High: It blocks me from completing my task.