What happened
I ran the TensorFlow PPO algorithm on a problem that gave me unstable gradients. I wanted to apply gradient clipping and set grad_clip, but nothing changed. I investigated the code a little, and it appears that grad_clip is never used. I know that gradient clipping was in the code before, so maybe it has moved somewhere else. (I also checked the compute_gradients() function in tf_mixins.py, but this also appears to be never used.)
I do not know the code base very well, so I might have missed it. As far as I can tell, this holds for all of the TensorFlow implementations.
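One way to observe this at runtime is to compute gradients directly through the policy and check their global norm. This is a hypothetical check I put together (it assumes clipping is supposed to happen inside the policy's compute_gradients(); the API calls are from Ray 2.2, but the check itself is mine and may need adjusting):

import numpy as np
from ray.rllib.algorithms.ppo.ppo import PPOConfig

# Hypothetical check: build PPO with a small grad_clip, sample a batch on
# the local worker, and inspect the global norm of the computed gradients.
algo = (
    PPOConfig()
    .environment(env="CartPole-v1")
    .framework("tf")
    .rollouts(num_rollout_workers=0)  # keep sampling on the local worker
    .training(grad_clip=0.5)
    .build()
)
policy = algo.get_policy()
batch = algo.workers.local_worker().sample()
grads, _ = policy.compute_gradients(batch)
global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads if g is not None))
# If grad_clip were applied here, this should be <= 0.5; with the bug it
# can exceed that.
print(f"global gradient norm: {global_norm}")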
What did you expect to happen
I expected the grad_clip parameter in the configuration of the algorithm to be effective and my gradients to be clipped.
If gradient clipping is indeed not currently implemented, I have also prepared a PR that should fix it.
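For reference, this is a minimal sketch of the behavior I expected grad_clip to trigger, using global-norm clipping via tf.clip_by_global_norm. It is only an illustration of the expected semantics, not the code from the PR:

import tensorflow as tf

# Illustration: apply a grad_clip value to (gradient, variable) pairs
# by clipping the global norm before the optimizer update.
def clip_gradients(grads_and_vars, grad_clip):
    if grad_clip is None:
        return grads_and_vars
    grads, variables = zip(*grads_and_vars)
    clipped_grads, _ = tf.clip_by_global_norm(list(grads), grad_clip)
    return list(zip(clipped_grads, variables))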
Versions / Dependencies
Fedora 37
Python 3.9.4
Ray 2.2.0
Reproduction script
import ray
import time

from ray import air, tune
from ray.rllib.algorithms.ppo.ppo import PPOConfig

# ray.init(local_mode=True)

config = (
    PPOConfig()
    .environment(
        env="CartPole-v1",
    )
    .framework(
        framework=tune.grid_search(["tf", "tf2"]),
        eager_tracing=tune.grid_search([False, True]),
    )
    .rollouts(
        num_rollout_workers=1,
        observation_filter="MeanStdFilter",
    )
    .training(
        gamma=0.99,
        lr=0.0003,
        num_sgd_iter=6,
        vf_loss_coeff=0.01,
        # Due to the bug, both of these values give exactly the same results.
        grad_clip=tune.grid_search([None, 0.5]),
        model={
            "fcnet_hiddens": [32],
            "fcnet_activation": "linear",
            "vf_share_layers": True,
        },
    )
    .debugging(
        seed=42,
    )
)

stop = {
    "training_iteration": 100,
}

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        stop=stop,
        verbose=1,
        local_dir="~/ray_results/TestGradientClipping",
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=0,
            checkpoint_at_end=False,
        ),
    ),
)

start = time.time()
tuner.fit()
end = time.time()
print(f"Execution time: {end - start}")
Issue Severity
Medium: It is a significant difficulty but I can work around it.