marlbenchmark / on-policy

This is the official implementation of Multi-Agent PPO (MAPPO).
MIT License

Error when running ./ #98

Open ChuangZhang1999 opened 8 months ago

ChuangZhang1999 commented 8 months ago

When I tried to run ./, I got the following error:

obs_space:  [Box(18,), Box(18,), Box(18,)]
share_obs_space:  [Box(54,), Box(54,), Box(54,)]
act_space:  [Discrete(5), Discrete(5), Discrete(5)]
Traceback (most recent call last):
  File "../train/", line 174, in <module>
  File "../train/", line 159, in main
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/runner/shared/", line 28, in run
    values, actions, action_log_probs, rnn_states, rnn_states_critic, actions_env = self.collect(step)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/autograd/", line 15, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/runner/shared/", line 103, in collect
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/r_mappo/algorithm/", line 71, in get_actions
    deterministic)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/r_mappo/algorithm/", line 64, in forward
    actor_features = self.base(obs)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/utils/", line 56, in forward
    x = self.mlp(x)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/nvme1n1/zhangchuang_23/MARL/on-policy-main/onpolicy/algorithms/utils/", line 27, in forward
    x = self.fc1(x)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/", line 100, in forward
    input = module(input)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/modules/", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/zhangchuang_23/envs/MARL/lib/python3.6/site-packages/torch/nn/", line 1610, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

satpreetsingh commented 5 months ago

Try running this and see if you still get the error.

import torch
print("Is CUDA available:", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())

a = torch.randn(1024, 1024, device="cuda:0")
b = torch.randn(1024, 1024, device="cuda:0")
c = torch.matmul(a, b)  # Matrix multiplication
print("Matrix multiplication result shape:", c.shape)

If you do, you need to fix your PyTorch/CUDA installation. Try:

conda install pytorch  -c pytorch
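
A slightly more defensive variant of the check above may help narrow things down. This is only a sketch: the function name `cuda_matmul_check` and its `size` parameter are made up for illustration and are not part of this repository. It reports whether PyTorch imports, whether CUDA is visible, and whether a small matrix multiplication on the GPU completes, instead of stopping at the first exception.

```python
import importlib.util


def cuda_matmul_check(size=256):
    """Return (ok, message) for a small GPU matmul. Hypothetical helper name."""
    if importlib.util.find_spec("torch") is None:
        return False, "PyTorch is not installed in this environment"
    try:
        import torch
    except (ImportError, OSError) as exc:
        # A broken install can fail at import time (e.g. missing shared libraries).
        return False, f"PyTorch is installed but failed to import: {exc}"

    if not torch.cuda.is_available():
        return False, "PyTorch imported, but CUDA is not available"

    try:
        a = torch.randn(size, size, device="cuda:0")
        b = torch.randn(size, size, device="cuda:0")
        c = torch.matmul(a, b)
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as exc:
        return False, f"GPU matmul failed: {exc}"

    name = torch.cuda.get_device_name(0)
    capability = torch.cuda.get_device_capability(0)
    return True, (
        f"GPU matmul OK on {name} (compute capability {capability}), "
        f"CUDA build {torch.version.cuda}, result shape {tuple(c.shape)}"
    )


ok, message = cuda_matmul_check()
print(ok, message)
```

If the message says the matmul failed, the problem is most likely in the PyTorch/CUDA setup rather than in the training code, and reinstalling PyTorch as suggested above is a reasonable next step.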

zoeyuchao commented 1 month ago

Fixed! Try the new code!