ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[rllib] Training becomes extremely slow; multiple iterations but only single episode #15607

Closed · kiranikram closed this issue 1 year ago

kiranikram commented 3 years ago

[rllib]

Up until this morning I had no issues running PPO on my multi-agent custom env. However, after a certain number of episodes, training gets "stuck": it still appears to be training and the iteration count keeps increasing, but no new episodes are completed. I have tried a variety of stopping criteria, including the original one, which previously worked just fine.

Ray 1.2, Python 3.7.6

config = { "env": RLlibWrapper, "env_config": {'jungle': 'EasyExit', "size": 11}, "no_done_at_end": False, "gamma": 0.9,

Use GPUs iff RLLIB_NUM_GPUS env var set to > 0.

    "num_gpus": int(os.environ.get("RLLIB_NUM_GPUS", "0")),
    "num_workers": 0,
    "train_batch_size": 200,
    "multiagent": {

        "policies": {
            "centralized_ppo": (None, obs_space, act_space, {})
        },
        "policy_mapping_fn": policy_mapping_fn,
    },
    "model": {

        'fcnet_hiddens': [256, 256],
        # 'vf_share_layers': True,
        'use_lstm': True,
        "lstm_cell_size": 256,
        "lstm_use_prev_action": True,
        "lstm_use_prev_reward": True,
        # "use_attention": True,

    },
    "framework": args.framework,
}

stop = {
    "episode_reward_mean": args.stop_reward,
    "timesteps_total": args.stop_timesteps,
    "training_iteration": args.stop_iters,
}

results = tune.run("PPO", stop=stop, config=config, verbose=1)

if args.as_test:
    check_learning_achieved(results, args.stop_reward)
ray.shutdown()

"needs-repro-script".

- [x] I have verified my script runs in a clean environment and reproduces the issue.
- [x] I have verified the issue also occurs with the latest wheels.
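As a side note for anyone debugging the same symptom, the sketch below (illustrative only, not part of the original report; it reuses the `config` dict above and Ray 1.2's `PPOTrainer` import path) runs the trainer directly and prints the per-iteration episode counters. If `episodes_this_iter` stays at 0 while `timesteps_total` keeps growing, the environment is stepping but never signalling the end of an episode, which for a multi-agent env means the `"__all__"` done flag is never set.

```python
# Illustrative diagnostic sketch: train without Tune and watch whether any
# episodes actually complete in each iteration.
import ray
from ray.rllib.agents.ppo import PPOTrainer  # Ray 1.2-era import path

ray.init()
trainer = PPOTrainer(config=config)  # same config dict as above

for _ in range(10):
    result = trainer.train()
    print(
        "iter:", result["training_iteration"],
        "episodes_this_iter:", result["episodes_this_iter"],
        "episode_len_mean:", result["episode_len_mean"],
        "timesteps_total:", result["timesteps_total"],
    )
```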

zzchuman commented 3 years ago

Hello, kiranikram! I see you have used "use_lstm" and "fcnet_hiddens" at the same time. I guess the agent's neural network is two fully connected layers followed by an LSTM, right?
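For context on that question: with `"use_lstm": True`, RLlib auto-wraps the default fully connected network, so the observation first passes through the two `fcnet_hiddens` layers, and their output (plus the previous action and reward, since `lstm_use_prev_action` / `lstm_use_prev_reward` are enabled) is fed into an LSTM of size `lstm_cell_size`. Roughly, in PyTorch terms (an illustrative sketch only; `obs_dim` is a made-up placeholder and the real model is built internally by RLlib):

```python
import torch
import torch.nn as nn

# Rough stand-in for "fcnet_hiddens": [256, 256] with "use_lstm": True and
# "lstm_cell_size": 256. The prev-action/prev-reward inputs that RLlib also
# concatenates into the LSTM input are omitted here for brevity.
obs_dim = 32  # hypothetical observation size

fc_stack = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.Tanh(),  # fcnet_hiddens[0]
    nn.Linear(256, 256), nn.Tanh(),      # fcnet_hiddens[1]
)
lstm = nn.LSTM(input_size=256, hidden_size=256, batch_first=True)

# Forward pass over a batch of sequences: [batch, time, obs_dim]
obs_seq = torch.randn(4, 10, obs_dim)
features = fc_stack(obs_seq)   # [4, 10, 256]
lstm_out, _ = lstm(features)   # [4, 10, 256], then policy/value heads on top
```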