oxwhirl / pymarl

Python Multi-Agent Reinforcement Learning framework
Apache License 2.0
1.89k stars, 387 forks

Failed to run coma #50

Open rical730 opened 4 years ago

rical730 commented 4 years ago

Hi, I can successfully run the qmix experiment:

bash run.sh $GPU python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=2s3z

However, errors occur when I replace qmix with coma:

bash run.sh $GPU python3 src/main.py --config=coma --env-config=sc2 with env_args.map_name=2s3z

Error output:

Listening on: 127.0.0.1:20214
Startup Phase 3 complete. Ready for commands.
Listening on: 127.0.0.1:22734
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Listening on: 127.0.0.1:24234
Next launch phase started: 4
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
[INFO 06:15:22] absl Connecting to: ws://127.0.0.1:22734/sc2api, attempt: 9, running: True
[INFO 06:15:22] absl Connecting to: ws://127.0.0.1:24234/sc2api, attempt: 9, running: True
Startup Phase 3 complete. Ready for commands.
[INFO 06:15:22] absl Connecting to: ws://127.0.0.1:22779/sc2api, attempt: 9, running: True
[INFO 06:15:22] absl Connecting to: ws://127.0.0.1:20214/sc2api, attempt: 9, running: True
Listening on: 127.0.0.1:22779
Startup Phase 3 complete. Ready for commands.
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
[INFO 06:15:23] absl Connecting to: ws://127.0.0.1:22779/sc2api, attempt: 10, running: True
Requesting to join a single player game
Configuring interface options
Configure: raw interface enabled
Configure: feature layer interface disabled
Configure: score interface disabled
Configure: render interface disabled
Entering load game phase.
Launching next game.
Next launch phase started: 2
Next launch phase started: 3
Next launch phase started: 4
Next launch phase started: 5
Next launch phase started: 6
Next launch phase started: 7
Next launch phase started: 8
Process Process-6:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/protocol.py", line 66, in catch_websocket_connection_errors
    yield
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/protocol.py", line 183, in _read
    response_str = self._sock.recv()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 314, in recv
    opcode, data = self.recv_data()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 331, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 344, in recv_data_frame
    frame = self.recv_frame()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 378, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_abnf.py", line 361, in recv_frame
    self.recv_header()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_abnf.py", line 309, in recv_header
    header = self.recv_strict(2)
  File "/usr/local/lib/python3.5/dist-packages/websocket/_abnf.py", line 396, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 453, in _recv
    return recv(self.sock, bufsize)
  File "/usr/local/lib/python3.5/dist-packages/websocket/_socket.py", line 115, in recv
    "Connection is already closed.")
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/protocol.py", line 151, in send
    res = self.send_req(req)
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/protocol.py", line 131, in send_req
    return self.read()
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/stopwatch.py", line 212, in _stopwatch
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/protocol.py", line 102, in read
    response = self._read()
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/protocol.py", line 183, in _read
    response_str = self._sock.recv()
  File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/protocol.py", line 68, in catch_websocket_connection_errors
    raise ConnectionError("Connection already closed. SC2 probably crashed. "
pysc2.lib.protocol.ConnectionError: Connection already closed. SC2 probably crashed. Check the error log.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
    self.run()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/pymarl/src/runners/parallel_runner.py", line 237, in env_worker
    env.reset()
  File "/usr/local/lib/python3.5/dist-packages/smac/env/starcraft2/starcraft2.py", line 347, in reset
    self._launch()
  File "/usr/local/lib/python3.5/dist-packages/smac/env/starcraft2/starcraft2.py", line 314, in _launch
    self._controller.join_game(join)
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/remote_controller.py", line 99, in _valid_status
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/remote_controller.py", line 74, in _check_error
    return check_error(func(*args, **kwargs), error_enum)
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/stopwatch.py", line 212, in _stopwatch
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/remote_controller.py", line 209, in join_game
    return self._client.send(join_game=req_join_game)
  File "/usr/local/lib/python3.5/dist-packages/pysc2/lib/protocol.py", line 153, in send
    raise ConnectionError("Error during %s: %s" % (name, e))
pysc2.lib.protocol.ConnectionError: Error during join_game: Connection already closed. SC2 probably crashed. Check the error log.

tabzraz commented 4 years ago

Do you have enough RAM to run the 8 SC2 instances in parallel? The QMIX config only uses 1 instance, so that may be why it works.
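A quick way to sanity-check the memory question is to compare the RAM currently available with a rough per-instance footprint. The following is only a sketch under stated assumptions: it reads `/proc/meminfo` (Linux only), and `PER_INSTANCE_MIB` is a hypothetical placeholder rather than a measured number for SC2, so it should be replaced with a figure observed on your own machine. The helper names are illustrative and not part of pymarl or SMAC.

```python
# Sketch: estimate how many SC2 instances fit in the currently available RAM.
# PER_INSTANCE_MIB is a hypothetical placeholder, not a measured figure.

PER_INSTANCE_MIB = 2048  # assumption: measure one running instance and adjust


def parse_mem_available_kib(meminfo_text):
    """Return the MemAvailable value in KiB from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line format: "MemAvailable:    8388608 kB"
            return int(line.split()[1])
    return None


def max_parallel_instances(available_kib, per_instance_mib):
    """Number of whole instances that fit in the available memory."""
    return available_kib // (per_instance_mib * 1024)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:  # Linux only
        available = parse_mem_available_kib(f.read())
    if available is None:
        print("MemAvailable not reported by this kernel")
    else:
        fits = max_parallel_instances(available, PER_INSTANCE_MIB)
        print("Roughly %d instance(s) fit; the COMA run launches 8" % fits)
```

If the estimate comes out below 8, reducing the number of parallel environments is worth trying. In pymarl that count appears to be governed by the `batch_size_run` key in the algorithm config; treat the key name as an assumption to verify against your checkout.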

nhanph commented 3 years ago

Try lowering the batch size. I got an error running with 128, but it works fine with 64.