hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

ACKTR hangs on Atari and runs very slowly on custom env #1055

Open mily20001 opened 3 years ago

mily20001 commented 3 years ago

Describe the bug
I wanted to check how ACKTR would perform on my custom env. The first 10 updates run quite fast, but after that each update takes very long on my env (with async_eigen_decomp=True it takes even longer), and training hangs completely on Atari. During the first 10 updates it uses all of my CPU cores and a significant part of my GPU; after that it uses a single core at 100% and nothing else. Interestingly, the same thing happens with an env created with make_atari, while an env created with make_atari_env performs better (still slow and single-core after the 10th update, but it doesn't hang completely the way the make_atari env does).

Code example

from datetime import datetime

from stable_baselines.common.atari_wrappers import make_atari
from stable_baselines.common.callbacks import BaseCallback
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.policies import CnnPolicy
from stable_baselines import ACKTR

class OnUpdate(BaseCallback):
    def __init__(self):
        super().__init__()
        self.update_num = 0
        self.last_update_timestamp = datetime.now()

    def _on_step(self) -> bool:
        # Required by BaseCallback; returning True continues training
        return True

    def _on_rollout_end(self) -> None:
        self.update_num += 1
        diff = datetime.now() - self.last_update_timestamp
        diff = f', {int(diff.total_seconds() * 1000)}ms since prev update'
        print(f'starting update {self.update_num}{diff if self.update_num > 1 else ""}')
        self.last_update_timestamp = datetime.now()

callback = OnUpdate()

# this one performs slowly
env = make_atari_env('BreakoutNoFrameskip-v4', num_env=1, seed=0)
# but this one hangs completely
# env = make_atari('BreakoutNoFrameskip-v4')

model = ACKTR(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=50000, callback=callback)
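
The slower async variant mentioned above differs only in the constructor call (a minimal sketch; async_eigen_decomp is an ACKTR constructor argument):

# same setup as above, but with asynchronous eigendecomposition enabled
model = ACKTR(CnnPolicy, env, verbose=1, async_eigen_decomp=True)
model.learn(total_timesteps=50000, callback=callback)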

For the make_atari_env example, the callback output looks like this:

starting update 1
starting update 2, 758ms since prev update
starting update 3, 74ms since prev update
starting update 4, 73ms since prev update
starting update 5, 77ms since prev update
starting update 6, 72ms since prev update
starting update 7, 78ms since prev update
starting update 8, 86ms since prev update
starting update 9, 71ms since prev update
starting update 10, 76ms since prev update
starting update 11, 15836ms since prev update
starting update 12, 18616ms since prev update
starting update 13, 17480ms since prev update
starting update 14, 18779ms since prev update
starting update 15, 17008ms since prev update

For make_atari:

starting update 1
starting update 2, 2100ms since prev update
starting update 3, 731ms since prev update
starting update 4, 744ms since prev update
starting update 5, 737ms since prev update
starting update 6, 739ms since prev update
starting update 7, 756ms since prev update
starting update 8, 740ms since prev update
starting update 9, 735ms since prev update
starting update 10, 742ms since prev update
... no further output for at least 10 minutes

For my custom env (4 workers wrapped in SubprocVecEnv, observation shape Box(0, 255, (90, 120, 5), uint8); a wrapping sketch follows the log below):

starting update 1
starting update 2, 1140ms since prev update
starting update 3, 440ms since prev update
starting update 4, 456ms since prev update
starting update 5, 412ms since prev update
starting update 6, 437ms since prev update
starting update 7, 462ms since prev update
starting update 8, 457ms since prev update
starting update 9, 437ms since prev update
starting update 10, 433ms since prev update
starting update 11, 83691ms since prev update
starting update 12, 59344ms since prev update
starting update 13, 72802ms since prev update
starting update 14, 61118ms since prev update
starting update 15, 67188ms since prev update
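
The custom env itself is not shown here; for reference, a minimal sketch of how it is wrapped, using a hypothetical CustomEnv stand-in with the observation space quoted above:

import gym
import numpy as np

from stable_baselines.common.vec_env import SubprocVecEnv

class CustomEnv(gym.Env):
    # hypothetical stand-in for the actual custom env
    def __init__(self):
        self.observation_space = gym.spaces.Box(0, 255, (90, 120, 5), dtype=np.uint8)
        self.action_space = gym.spaces.Discrete(4)  # assumed; the real action space is not shown

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        return self.observation_space.sample(), 0.0, False, {}

# 4 workers, as described above
env = SubprocVecEnv([lambda: CustomEnv() for _ in range(4)])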


araffin commented 3 years ago

Hello, this is probably a duplicate of https://github.com/hill-a/stable-baselines/issues/196. Which OS are you using?

I would recommend using PPO2 (or even Stable-Baselines3 PPO) instead, as it also supports multiprocessing and usually gives results comparable to ACKTR.
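
A drop-in swap keeping the env and callback from above would look like this (a sketch; PPO2 exposes the same learn API):

from stable_baselines import PPO2

# same CnnPolicy, env and callback as in the ACKTR example
model = PPO2(CnnPolicy, env, verbose=1)
model.learn(total_timesteps=50000, callback=callback)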

mily20001 commented 3 years ago

Hi, thanks for the response, @araffin. I'm comparing how RL algorithms perform on the problem simulated by my custom env, which is why I wanted to test ACKTR. I've seen #196, but I don't think memory is the issue in my case. My OS is openSUSE Linux 15.2.