ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] `grad_clip` parameter appears ineffective for TensorFlow implementations with multiple optimizers. #32626

Open simonsays1980 opened 1 year ago

simonsays1980 commented 1 year ago

What happened + What you expected to happen

What happened

I ran the TensorFlow PPO algorithm on a problem that gave me unstable gradients. I wanted to apply gradient clipping and set grad_clip, but nothing changed. I investigated the code a little and it appears to me that grad_clip might never be used. I know that gradient clipping was in the code before, so maybe it has moved somewhere else (I also checked the compute_gradients() function in tf_mixins.py, but this also appears to never be called).

I do not know the code base in full detail, so I might have overlooked something. This seems to hold true for all TensorFlow implementations.

What did you expect to happen

I expected the grad_clip parameter in the configuration of the algorithm to be effective and my gradients to be clipped.

If gradient clipping is indeed not currently implemented, I have also prepared a PR that should fix it.
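
For reference, this is roughly the behavior I would expect grad_clip to have (a minimal sketch of global-norm clipping in TF2 for illustration, not the actual RLlib code; the helper name and the multi-optimizer loop in the comment are made up):

import tensorflow as tf

def clip_gradients(grads_and_vars, grad_clip):
    # Clip all gradients jointly by their global norm; leave them
    # untouched if no clip value is configured.
    if grad_clip is None:
        return grads_and_vars
    grads, variables = zip(*grads_and_vars)
    clipped_grads, _ = tf.clip_by_global_norm(list(grads), grad_clip)
    return list(zip(clipped_grads, variables))

# With multiple optimizers, each optimizer's gradients would be clipped
# separately before being applied, e.g.:
#   for optimizer, gv in zip(optimizers, grads_and_vars_per_optimizer):
#       optimizer.apply_gradients(clip_gradients(gv, grad_clip))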

Versions / Dependencies

Fedora 37
Python 3.9.4
Ray 2.2.0

Reproduction script

import ray
import time

from ray import air, tune
from ray.rllib.algorithms.ppo.ppo import PPOConfig

#ray.init(local_mode=True)
config = (
    PPOConfig()
    .environment(
        env="CartPole-v1",
    )
    .framework(
        framework=tune.grid_search(["tf", "tf2"]),
        eager_tracing=tune.grid_search([False, True]),       
    )
    .rollouts(
        num_rollout_workers=1,  
        observation_filter="MeanStdFilter",              
    )
    .training(
        gamma=.99,
        lr=0.0003,
        num_sgd_iter=6,
        vf_loss_coeff=0.01,
        # Because of the bug, both values give exactly the same results.
        grad_clip=tune.grid_search([None, 0.5]),
        model={
            "fcnet_hiddens": [32],
            "fcnet_activation": "linear",
            "vf_share_layers": True,
        }
    )
    .debugging(
        seed=42,
    )   
)

stop = {
    "training_iteration": 100,
}
tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    run_config=air.RunConfig(
        stop=stop,
        verbose=1,
        local_dir="~/ray_results/TestGradientClipping",
        checkpoint_config=air.CheckpointConfig(
            checkpoint_frequency=0,
            checkpoint_at_end=False,
        ),
    ),
)

start = time.time()
tuner.fit()
end = time.time()
print(f"Execution time: {end - start}")

Issue Severity

Medium: It is a significant difficulty but I can work around it.

jesuspc commented 1 year ago

Is this still the case for APPO/IMPALA?