oxwhirl / pymarl

Python Multi-Agent Reinforcement Learning framework
Apache License 2.0

killed without warnings or errors #7

Closed TimeBreaker closed 5 years ago

TimeBreaker commented 5 years ago

Hi, I have been running python3 src/main.py --config=qmix_smac --env-config=sc2 with env_args.map_name=2s3z, and my process was killed suddenly. I have no idea why this happened. Here is part of the output. (By the way, I really don't understand why every INFO message shows up eight times.) I know there is a similar problem here, but it is not exactly the same as mine. Can anyone help me? Thanks a lot, and thanks for this repo.

`[INFO 12:57:42] my_main Beginning training for 10050000 timesteps
[INFO 12:57:43] absl No GL library found, so RGB rendering will be disabled. For software rendering install libosmesa.
[INFO 12:57:43] absl No GL library found, so RGB rendering will be disabled. For software rendering install libosmesa.
[INFO 12:57:43] absl No GL library found, so RGB rendering will be disabled. For software rendering install libosmesa.
[INFO 12:57:43] absl No GL library found, so RGB rendering will be disabled. For software rendering install libosmesa.
[INFO 12:57:43] absl No GL library found, so RGB rendering will be disabled. For software rendering install libosmesa.
[INFO 12:57:43] absl No GL library found, so RGB rendering will be disabled. For software rendering install libosmesa.
[INFO 12:57:43] absl No GL library found, so RGB rendering will be disabled. For software rendering install libosmesa.
[INFO 12:57:43] absl No GL library found, so RGB rendering will be disabled. For software rendering install libosmesa.
[INFO 12:57:43] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 23352 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-ni4r5f6w/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[INFO 12:57:43] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 19667 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-ywijkadp/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[INFO 12:57:43] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 16964 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-0bs56nd5/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[INFO 12:57:43] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 21594 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-z44i84pm/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[INFO 12:57:43] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 21722 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-qu3cm6ig/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[INFO 12:57:43] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 24486 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-nccgtyde/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[INFO 12:57:43] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 17230 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-murh5_yx/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[INFO 12:57:43] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 22863 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-ujzqhm1t/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[INFO 12:57:43] absl Connecting to: ws://127.0.0.1:21594/sc2api, attempt: 0, running: True
[INFO 12:57:43] absl Connecting to: ws://127.0.0.1:17230/sc2api, attempt: 0, running: True
[INFO 12:57:43] absl Connecting to: ws://127.0.0.1:19667/sc2api, attempt: 0, running: True
[INFO 12:57:43] absl Connecting to: ws://127.0.0.1:24486/sc2api, attempt: 0, running: True
[INFO 12:57:43] absl Connecting to: ws://127.0.0.1:23352/sc2api, attempt: 0, running: True
[INFO 12:57:43] absl Connecting to: ws://127.0.0.1:21722/sc2api, attempt: 0, running: True
[INFO 12:57:43] absl Connecting to: ws://127.0.0.1:16964/sc2api, attempt: 0, running: True
[INFO 12:57:43] absl Connecting to: ws://127.0.0.1:22863/sc2api, attempt: 0, running: True
Version: B69232 (SC2.4.6-Publish)
Version: B69232 (SC2.4.6-Publish)
Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Build: Oct 23 2018 01:43:04
Build: Oct 23 2018 01:43:04
Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 22863 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-ujzqhm1t/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50'
Command Line: '"/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 24486 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-nccgtyde/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50'
Command Line: '"/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 23352 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-ni4r5f6w/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50'
Command Line: '"/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 21594 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-z44i84pm/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50'
Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 19667 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-ywijkadp/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50'
Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 16964 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-0bs56nd5/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50'
Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 21722 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-qu3cm6ig/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50'
Version: B69232 (SC2.4.6-Publish)
Build: Oct 23 2018 01:43:04
Command Line: '"/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64" -listen 127.0.0.1 -port 17230 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-murh5_yx/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50'
Starting up...
Starting up...
Starting up...
Starting up...
Starting up...
Starting up...
Starting up...
Starting up...
[INFO 12:57:46] absl Connecting to: ws://127.0.0.1:16964/sc2api, attempt: 1, running: True
[INFO 12:57:46] absl Connecting to: ws://127.0.0.1:22863/sc2api, attempt: 1, running: True
[INFO 12:57:46] absl Connecting to: ws://127.0.0.1:21722/sc2api, attempt: 1, running: True
[INFO 12:57:46] absl Connecting to: ws://127.0.0.1:17230/sc2api, attempt: 1, running: True
[INFO 12:57:46] absl Connecting to: ws://127.0.0.1:23352/sc2api, attempt: 1, running: True
[INFO 12:57:46] absl Connecting to: ws://127.0.0.1:19667/sc2api, attempt: 1, running: True
[INFO 12:57:46] absl Connecting to: ws://127.0.0.1:21594/sc2api, attempt: 1, running: True
Startup Phase 1 complete
Startup Phase 1 complete
[INFO 12:57:46] absl Connecting to: ws://127.0.0.1:24486/sc2api, attempt: 1, running: True
Startup Phase 1 complete
Startup Phase 1 complete
Startup Phase 1 complete
Startup Phase 1 complete
Startup Phase 1 complete
Startup Phase 1 complete
[INFO 12:57:47] absl Connecting to: ws://127.0.0.1:22863/sc2api, attempt: 2, running: True
[INFO 12:57:48] absl Connecting to: ws://127.0.0.1:21594/sc2api, attempt: 2, running: True
[INFO 12:57:47] absl Connecting to: ws://127.0.0.1:23352/sc2api, attempt: 2, running: True
[INFO 12:57:47] absl Connecting to: ws://127.0.0.1:24486/sc2api, attempt: 2, running: True
[INFO 12:57:48] absl Connecting to: ws://127.0.0.1:17230/sc2api, attempt: 2, running: True
[INFO 12:57:48] absl Connecting to: ws://127.0.0.1:16964/sc2api, attempt: 2, running: True
[INFO 12:57:49] absl Connecting to: ws://127.0.0.1:19667/sc2api, attempt: 2, running: True
[INFO 12:57:56] absl Connecting to: ws://127.0.0.1:24486/sc2api, attempt: 3, running: True
[INFO 12:57:58] absl Connecting to: ws://127.0.0.1:22863/sc2api, attempt: 3, running: True
Killed`

samvelyan commented 5 years ago

Everything shows up 8 times because you are running 8 SC2 envs at the same time. To run just one env, set runner=episode; otherwise the parallel runner (the default) will run 8 envs in parallel.

I suspect this is caused by memory issues; running only a single SC2 process might work. Also, have you tried addressing the INFO remarks about installing a GL library?
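A bare "Killed" with no traceback usually means the process received SIGKILL, and on Linux the kernel's out-of-memory killer is one common sender of that signal. The thread does not confirm that this is what happened here, so the following is only a minimal sketch of how a wrapper script could tell a signal death apart from a normal exit. The helper names `describe_exit` and `run_and_describe` are hypothetical, and the child command is a stand-in, not the pymarl training run.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short description."""
    if returncode < 0:
        # On POSIX, a return code of -N means the child was terminated by signal N.
        sig = signal.Signals(-returncode)
        return f"terminated by {sig.name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd: list) -> str:
    """Run cmd to completion and describe how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Hypothetical stand-in for a training run: a child that sends SIGKILL to itself,
    # which is how a process looks from the outside after being killed.
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
    ]
    print(run_and_describe(child))
```

If the result names SIGKILL, the kernel log (for example via dmesg) is the usual place to check whether the out-of-memory killer was responsible.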

TimeBreaker commented 5 years ago

@samvelyan Thank you very much! I changed runner to episode and also changed batch_size_run to 1. The program runs a lot faster than before, but there is still something I can't figure out: why does the launch error occur? Here is the output of the program. Any idea why this happens? Thanks a lot!

About GL: I didn't install it because I assumed that, since pysc2 runs normally, the rendering requirements were already satisfied. I also learned something here; it says "You can send either feature layer resolution or rgb resolution or both." GL is only needed for the rgb resolution. So when I set want_rgb = False in the file platforms.py from pysc2, the GL info message disappears, but every other info message remains. (Also, after I set want_rgb = False, pysc2 still runs normally.)

[INFO 08:06:10] my_main Beginning training for 10050000 timesteps
[INFO 08:06:10] absl Launching SC2: /home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64 -listen 127.0.0.1 -port 24956 -dataDir /home/ch/pymarl/3rdparty/StarCraftII/ -tempDir /tmp/sc-a_3kagb0/ -displayMode 0 -windowwidth 1920 -windowheight 1200 -windowx 50 -windowy 50
[ERROR 08:06:10] absl Failed to launch
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/pysc2/lib/sc_process.py", line 173, in _launch
    return subprocess.Popen(args, cwd=run_config.cwd, env=run_config.env)
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
[ERROR 08:06:11] pymarl Failed after 0:00:14!
Traceback (most recent calls WITHOUT Sacred internals):
  File "src/main.py", line 34, in my_main
    run(_run, _config, _log)
  File "/home/ch/pymarl/src/run.py", line 48, in run
    run_sequential(args=args, logger=logger)
  File "/home/ch/pymarl/src/run.py", line 166, in run_sequential
    episode_batch = runner.run(test_mode=False)
  File "/home/ch/pymarl/src/runners/episode_runner.py", line 49, in run
    self.reset()
  File "/home/ch/pymarl/src/runners/episode_runner.py", line 45, in reset
    self.env.reset()
  File "/usr/local/lib/python3.6/dist-packages/smac/env/starcraft2/starcraft2.py", line 320, in reset
    self._launch()
  File "/usr/local/lib/python3.6/dist-packages/smac/env/starcraft2/starcraft2.py", line 279, in _launch
    window_size=self.window_size)
  File "/usr/local/lib/python3.6/dist-packages/pysc2/run_configs/platforms.py", line 208, in start
    want_rgb=want_rgb, extra_args=extra_args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pysc2/run_configs/platforms.py", line 97, in start
    self, exec_path=exec_path, version=version, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pysc2/lib/sc_process.py", line 113, in __init__
    self._proc = self._launch(run_config, args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pysc2/lib/sc_process.py", line 176, in _launch
    raise SC2LaunchError("Failed to launch: %s" % args)
pysc2.lib.sc_process.SC2LaunchError: Failed to launch: ['/home/ch/pymarl/3rdparty/StarCraftII/Versions/Base69232/SC2_x64', '-listen', '127.0.0.1', '-port', '24956', '-dataDir', '/home/ch/pymarl/3rdparty/StarCraftII/', '-tempDir', '/tmp/sc-a_3kagb0/', '-displayMode', '0', '-windowwidth', '1920', '-windowheight', '1200', '-windowx', '50', '-windowy', '50']

samvelyan commented 5 years ago

This line in the output might explain it: OSError: [Errno 12] Cannot allocate memory.
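Errno 12 is ENOMEM: the operating system refused to give the new process the memory it needed, so subprocess.Popen raised before SC2 ever started. As a rough sketch (not how pysc2 handles it), a launcher could separate this case from other launch failures by inspecting the exception's errno. The function names `explain_oserror` and `launch` are hypothetical and exist only for this illustration.

```python
import errno
import subprocess


def explain_oserror(exc: OSError) -> str:
    """Give a short explanation for an OSError raised while starting a process."""
    if exc.errno == errno.ENOMEM:
        # ENOMEM is errno 12 on Linux, reported as "Cannot allocate memory".
        return "not enough memory to start the child process"
    return f"launch failed: {exc.strerror}"


def launch(args: list):
    """Start a child process; return (process, None) or (None, explanation)."""
    try:
        return subprocess.Popen(args), None
    except OSError as exc:
        return None, explain_oserror(exc)
```

With a check like this, a memory failure can be reported directly instead of being wrapped in a generic launch error.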

The rgb messages should not be an issue on our side, since we are not using rgb rendering either. We use the raw interface of the SC2 client, so we only receive a vector of numeric features per unit.

TimeBreaker commented 5 years ago

@samvelyan Thank you for your suggestion! I have been trying to solve this OSError: [Errno 12] Cannot allocate memory. I tried to increase the virtual memory of Ubuntu, but this didn't seem to work for me. What I don't understand is why pysc2 runs normally while pymarl cannot allocate memory.

Maybe this is happening because there isn't enough memory available and I should try to increase it? I am using Ubuntu 18.04.2, and when I run free -m on the command line (with sudo), I get this:

              total        used        free      shared  buff/cache   available
Mem:           1970         744         641           0         584        1063
Swap:          1523         687         835

And when I run df -h, I get this (part of the output):

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        30G   21G  7.7G  73% /

It seems like there is still enough memory. Any idea about this? Thanks in advance.
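Note that df -h reports disk space, not RAM, so the free -m figures are the ones that matter for an allocation failure. A minimal sketch of reading those figures, using the values quoted in the post, is shown here. The function name `parse_free_m` is hypothetical, and the thread does not say how much memory SC2 plus training actually needs, so the result is only the headroom the machine reported.

```python
def parse_free_m(text: str) -> dict:
    """Parse the Mem: and Swap: rows of `free -m` output into dicts of MiB values."""
    mem_cols = ["total", "used", "free", "shared", "buff/cache", "available"]
    swap_cols = ["total", "used", "free"]
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Mem:":
            rows["mem"] = dict(zip(mem_cols, map(int, parts[1:])))
        elif parts[0] == "Swap:":
            rows["swap"] = dict(zip(swap_cols, map(int, parts[1:])))
    return rows


# Figures copied from the post above (all values in MiB).
REPORTED = """\
              total        used        free      shared  buff/cache   available
Mem:           1970         744         641           0         584        1063
Swap:          1523         687         835
"""

rows = parse_free_m(REPORTED)
# Rough upper bound on what a new process could claim: available RAM plus free swap.
headroom = rows["mem"]["available"] + rows["swap"]["free"]
print(f"RAM total: {rows['mem']['total']} MiB, headroom: {headroom} MiB")
```

On these numbers the machine has about 2 GB of RAM in total and under 2 GB of combined headroom, with nearly half of that in swap.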

TimeBreaker commented 5 years ago

Update: I ran all of this again on my research group's server, and it worked! Conclusion: I think all of the errors above happened because my PC did not have enough resources. (I wish I had done this two weeks earlier.) Thank you all!