hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

ACKTR hangs/crashes #196

Open EliasHasle opened 5 years ago

EliasHasle commented 5 years ago

Describe the bug

ACKTR works with 5000 steps, but with 50000 steps it does not finish within a reasonable time. Windows Task Manager indicates that the main process is active on the CPU, while the child processes are idle.

I am not sure, but I think that in one run the log showed FPS declining over the updates while total_timesteps remained zero.

In another experiment with a custom environment (Super Mario Bros. via Gym Retro, with a wrapper), ACKTR runs out of memory on a machine with 32 GiB RAM, even with only one worker process and the simple default CNN. The cause may indeed lie elsewhere, although the same environment works fine with PPO2.

Code example

python -m stable_baselines.acktr.run_atari --env=SpaceInvadersNoFrameskip-v4 --num-timesteps=50000

Also:

if __name__=="__main__":
    import os
    from stable_baselines import ACKTR
    from stable_baselines.common.cmd_util import make_atari_env

    vecenv = make_atari_env("SpaceInvadersNoFrameskip-v4", num_env=32, seed=0)
    vecenv = VecNormalize(vecenv)

    model = ACKTR(policy="CnnPolicy",env=vecenv, verbose=1)
    model.learn(50000)

System Info

Describe the characteristics of your environment:

araffin commented 5 years ago

Hello, I'm afraid the problem comes from the algorithm itself. ACKTR uses KFAC as its optimizer, and the current implementation is apparently memory hungry: it works fine with vector observations but is trickier with images as input.
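
For instance, the vector-observation case can be checked with a minimal sketch like this (CartPole-v1 is just an illustrative choice, not something from this thread):

import gym

from stable_baselines import ACKTR
from stable_baselines.common.vec_env import DummyVecEnv

# CartPole's observation is a 4-dimensional vector, so the KFAC factor
# matrices stay small compared to the flattened output of a CNN on images.
vecenv = DummyVecEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])

model = ACKTR(policy="MlpPolicy", env=vecenv, verbose=1)
model.learn(50000)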

EliasHasle commented 5 years ago

Memory was no issue with Space Invaders, only with Mario. Probably related to image resolution? If both are handled by the same conv stack, the encoding (or flattened conv output) will have a different size, right?
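
A rough back-of-the-envelope sketch of that hypothesis, assuming the default Nature CNN layers (8x8 stride 4, 4x4 stride 2, 3x3 stride 1, 64 output channels, "valid" padding), 84x84 warped Atari frames, and roughly NES-resolution (224x240) frames from Gym Retro:

def conv_out(n, kernel, stride):
    # Output length of one "valid" convolution along a single axis
    return (n - kernel) // stride + 1

def flat_size(h, w, channels=64):
    # Chain the three Nature CNN conv layers, then flatten
    for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
        h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
    return h * w * channels

print(flat_size(84, 84))    # 3136  -> warped Atari frame
print(flat_size(224, 240))  # 39936 -> NES-sized frame, ~13x larger

If KFAC's factor matrices grow with that flattened size, a roughly 13x larger encoding would be consistent with Mario exhausting memory while Space Invaders fits.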

With Space Invaders the issue is that it seems to stop doing anything useful. I should give you a better report, but have you tested my example code? I think the same thing worked well in OpenAI Baselines (which I should also check again when I have the time).

araffin commented 5 years ago

I managed to train ACKTR on Atari games in the rl zoo. Hyperparameters can be found here.

I had the same problem of learning getting slower and slower, but 32 processes is a lot, especially when using images! I don't have the time to investigate further where the bottleneck in the code is, but I would be interested if you manage to fix it.

EliasHasle commented 5 years ago

I found it strange that the task manager showed next to no activity (1-3%) on the subprocesses, at least after some time, while the main process was using up to 100%. I guess that at least locates the bottleneck (or infinite loop) somewhere in the centrally run code.

It is unlikely that I will find or allocate the time to gain the knowledge required to contribute to the implementation of this algorithm in the foreseeable future. But if it becomes pressing for me because of a concrete application, that may change.

I may do some more simple experiments to contribute to the investigation, though. Maybe someone else has the same problem and is more interested in a solution than I am.

ChengYen-Tang commented 4 years ago

If async_eigen_decomp=True, the training speed returns to normal, but it causes the neural network to collapse: NaNs appear in the network. Any suggestions? OpenAI Baselines kfac.py
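
For reference, the flag is exposed as a constructor argument on stable-baselines' ACKTR; a minimal sketch of the setup described above (the worker count here is illustrative):

from stable_baselines import ACKTR
from stable_baselines.common.cmd_util import make_atari_env

vecenv = make_atari_env("SpaceInvadersNoFrameskip-v4", num_env=4, seed=0)

# async_eigen_decomp=True runs KFAC's eigen decompositions asynchronously so
# they no longer block the update loop; per the report above, training speed
# recovers but the network can then collapse with NaNs.
model = ACKTR(policy="CnnPolicy", env=vecenv, verbose=1,
              async_eigen_decomp=True)
model.learn(50000)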