ikostrikov / pytorch-a3c

PyTorch implementation of Asynchronous Advantage Actor Critic (A3C) from "Asynchronous Methods for Deep Reinforcement Learning".
MIT License

What is the purpose of `os.environ['OMP_NUM_THREADS'] = '1'`? #33

Closed xmfbit closed 7 years ago

xmfbit commented 7 years ago

I wonder why `os.environ['OMP_NUM_THREADS'] = '1'` is used in the main method: https://github.com/ikostrikov/pytorch-a3c/blob/master/main.py#L43.

I ran a CartPole-v0 demo using OpenAI Gym with an LSTM policy. I found that if I removed this line, the agent couldn't learn a good policy. Why is that?

This is the logging info without `os.environ['OMP_NUM_THREADS'] = '1'`:

Time 00h 00m 00s, episode reward 10.0, episode length 10
Time 00h 01m 00s, episode reward 10.0, episode length 10
Time 00h 02m 00s, episode reward 12.0, episode length 12
Time 00h 03m 01s, episode reward 12.0, episode length 12
Time 00h 04m 01s, episode reward 14.0, episode length 14
Time 00h 05m 01s, episode reward 14.0, episode length 14
Time 00h 06m 01s, episode reward 12.0, episode length 12
Time 00h 07m 03s, episode reward 13.0, episode length 13
Time 00h 08m 03s, episode reward 10.0, episode length 10

And this is the info after adding it:

Time 00h 00m 00s, episode reward 10.0, episode length 10
Time 00h 01m 00s, episode reward 18.0, episode length 18
Time 00h 02m 00s, episode reward 16.0, episode length 16
Time 00h 03m 00s, episode reward 29.0, episode length 29
Time 00h 04m 00s, episode reward 48.0, episode length 48
Time 00h 05m 00s, episode reward 12.0, episode length 12
Time 00h 06m 00s, episode reward 107.0, episode length 107
Time 00h 07m 01s, episode reward 79.0, episode length 79
Time 00h 08m 01s, episode reward 153.0, episode length 153
Time 00h 09m 01s, episode reward 77.0, episode length 77
Time 00h 10m 01s, episode reward 91.0, episode length 91
Time 00h 11m 01s, episode reward 129.0, episode length 129
Time 00h 12m 02s, episode reward 137.0, episode length 137
Time 00h 13m 02s, episode reward 117.0, episode length 117
Time 00h 14m 03s, episode reward 155.0, episode length 155
Time 00h 15m 03s, episode reward 200.0, episode length 200
Time 00h 16m 03s, episode reward 200.0, episode length 200
ikostrikov commented 7 years ago

The purpose is to avoid spawning OpenMP threads inside the numpy worker processes. Each A3C worker is its own process; if every one of them also starts a per-core OpenMP thread pool for its numpy/torch ops, the CPU gets oversubscribed and the threads apparently start to block each other.