Closed LaPluma030 closed 4 months ago
Can you show me the torch and python version you are using?
Can you show me the torch and python version you are using? I'm using torch 1.8.1+cu111, as the requirement.txt said, and Python 3.9.
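For anyone else asked the same question, here is a minimal sketch of one way to collect the versions in one go. The helper name `report_versions` and the default package list are illustrative assumptions, not part of the InforMARL repository; it only relies on the standard-library `importlib.metadata` and `platform` modules.

```python
import platform
from importlib import metadata


def report_versions(packages=("torch", "torch-geometric", "torch-sparse", "torch-scatter")):
    """Return a dict with the Python version and each package's installed version (or None)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


if __name__ == "__main__":
    # Print one "name: version" line per entry, ready to paste into the issue.
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the same conda environment that is used for train_mpe.py, otherwise the reported versions may belong to a different interpreter.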
I think this issue is probably due to insufficient memory. I don't know whether train_mpe.py reads a lot of data when it runs. If so, is there a way to limit the program's memory usage? Thanks.
I also encountered this problem when using PyCharm on Windows, but there was no such problem when I used Linux. Is the cause that the program needs too much memory or compute?
Limited memory and compute resources could be the issue. I did not face any issues while running the code on Linux and macOS.
One thing you could try is reducing the number of rollout threads with --n_rollout_threads 2 and checking whether the code runs on Windows. This will make training quite slow, but it is worth checking whether the high number of parallel processes is the issue.
OK, I'll try this on Linux.
Traceback (most recent call last):
  File "/mnt/InforMARL/onpolicy/scripts/train_mpe.py", line 315, in <module>
    main(sys.argv[1:])
  File "/mnt/InforMARL/onpolicy/scripts/train_mpe.py", line 289, in main
    runner = Runner(config)
  File "/mnt/InforMARL/onpolicy/runner/shared/graph_mpe_runner.py", line 24, in __init__
    super(GMPERunner, self).__init__(config)
  File "/mnt/InforMARL/onpolicy/runner/shared/base_runner.py", line 79, in __init__
    from onpolicy.algorithms.graph_mappo import GR_MAPPO as TrainAlgo
  File "/mnt/InforMARL/onpolicy/algorithms/graph_mappo.py", line 8, in <module>
    from onpolicy.algorithms.graph_MAPPOPolicy import GR_MAPPOPolicy
  File "/mnt/InforMARL/onpolicy/algorithms/graph_MAPPOPolicy.py", line 7, in <module>
    from onpolicy.algorithms.graph_actor_critic import GR_Actor, GR_Critic
  File "/mnt/InforMARL/onpolicy/algorithms/graph_actor_critic.py", line 9, in <module>
    from onpolicy.algorithms.utils.gnn import GNNBase
  File "/mnt/InforMARL/onpolicy/algorithms/utils/gnn.py", line 6, in <module>
    import torch_geometric
  File "/root/anaconda3/envs/InforMARL/lib/python3.9/site-packages/torch_geometric/__init__.py", line 4, in <module>
    import torch_geometric.data
  File "/root/anaconda3/envs/InforMARL/lib/python3.9/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/root/anaconda3/envs/InforMARL/lib/python3.9/site-packages/torch_geometric/data/data.py", line 9, in <module>
    from torch_sparse import SparseTensor
  File "/root/anaconda3/envs/InforMARL/lib/python3.9/site-packages/torch_sparse/__init__.py", line 14, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
  File "/root/anaconda3/envs/InforMARL/lib/python3.9/site-packages/torch/_ops.py", line 104, in load_library
    ctypes.CDLL(path)
  File "/root/anaconda3/envs/InforMARL/lib/python3.9/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so.11.0: cannot open shared object file: No such file or directory
I tried running train_mpe.py on Linux and got the error message above. Have you ever encountered this problem? It seems to happen when trying to import torch_geometric.
By the way, my CUDA version is 10.2.
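A missing libcudart.so.11.0 alongside a CUDA 10.2 setup suggests that one of the compiled packages may have been built for a different CUDA version than the one available. The sketch below is only an illustration of how to compare the CUDA tag in a version string such as 1.8.1+cu111 with a CUDA version such as 10.2; the function names are hypothetical, and it assumes the version string carries a +cuXXX suffix, which pip wheels from the PyTorch index usually do but other builds may not.

```python
import re


def cuda_tag_from_wheel_version(version):
    """Extract the build tag (e.g. 'cu111' or 'cpu') from a string like '1.8.1+cu111'.

    Returns None when the version string has no such suffix.
    """
    match = re.search(r"\+(cu\d+|cpu)$", version)
    return match.group(1) if match else None


def cuda_tag_from_cuda_version(version):
    """Convert a CUDA version like '10.2' into the wheel-style tag 'cu102'."""
    major, minor = version.split(".")[:2]
    return f"cu{major}{minor}"


def builds_match(torch_version, cuda_version):
    """True when the torch build tag equals the tag derived from the CUDA version."""
    return cuda_tag_from_wheel_version(torch_version) == cuda_tag_from_cuda_version(cuda_version)


if __name__ == "__main__":
    # Values taken from this thread: torch 1.8.1+cu111 and CUDA 10.2.
    print(builds_match("1.8.1+cu111", "10.2"))
```

In a live environment the first argument would typically be `torch.__version__`. A mismatch does not prove the cause, but it is a cheap thing to rule out before reinstalling packages.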
I've solved the problems above, thanks
How did you solve it? Did you install the cuda gpu? If so, can you take a look?
ERROR: Could not find a version that satisfies the requirement sip<4.20,>=4.19.4 (from pyqt5) (from versions: 5.0.0, 5.0.1, 5.1.0, 5.1.1, 5.1.2, 5.2.0, 5.3.0, 5.4.0, 5.5.0, 6.0.0, 6.0.1, 6.0.2, 6.0.3, 6.1.0, 6.1.1, 6.2.0, 6.3.0, 6.3.1, 6.4.0, 6.5.0, 6.5.1, 6.6.0, 6.6.1, 6.6.2, 6.7.0, 6.7.1, 6.7.2, 6.7.3, 6.7.4, 6.7.5, 6.7.6, 6.7.7, 6.7.8, 6.7.9, 6.7.10, 6.7.11, 6.7.12, 6.8.0, 6.8.1, 6.8.2, 6.8.3)
ERROR: No matching distribution found for sip<4.20,>=4.19.4
Have you encountered this kind of problem?
How did you solve it? Did you install the cuda gpu? If so, can you take a look?
In my case, it wasn't a CUDA problem. I reinstalled torch-geometric and torch-sparse, and then it worked.
ERROR: Could not find a version that satisfies the requirement sip<4.20,>=4.19.4 (from pyqt5) (from versions: 5.0.0, 5.0.1, 5.1.0, 5.1.1, 5.1.2, 5.2.0, 5.3.0, 5.4.0, 5.5.0, 6.0.0, 6.0.1, 6.0.2, 6.0.3, 6.1.0, 6.1.1, 6.2.0, 6.3.0, 6.3.1, 6.4.0, 6.5.0, 6.5.1, 6.6.0, 6.6.1, 6.6.2, 6.7.0, 6.7.1, 6.7.2, 6.7.3, 6.7.4, 6.7.5, 6.7.6, 6.7.7, 6.7.8, 6.7.9, 6.7.10, 6.7.11, 6.7.12, 6.8.0, 6.8.1, 6.8.2, 6.8.3)
ERROR: No matching distribution found for sip<4.20,>=4.19.4
Have you encountered this kind of problem?
I haven't. It seems to be a PyQt problem; maybe you can try other versions.
Can you run conda list and share the versions of all installed packages?
@Yu-zx, have you tried the following for installing torch-geometric?
TORCH="1.8.0"
CUDA="cu102"
pip install --no-index torch-scatter -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html --user
pip install --no-index torch-sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html --user
pip install torch-geometric --user
And have you checked this out for pyqt?
Let me know if any of these work for you.
Thanks
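The install commands above depend on picking TORCH and CUDA values that match the local setup. As a rough sketch, the same commands can be generated from those two values so that the wheel index URL is built consistently. The helper names `pyg_wheel_index` and `pyg_install_commands` are made up for illustration; the URL pattern and pip flags are copied from the commands in this thread.

```python
import shlex


def pyg_wheel_index(torch_version, cuda_tag):
    """Build the wheel index URL used in this thread, e.g. for torch 1.8.0 and cu102."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


def pyg_install_commands(torch_version, cuda_tag):
    """Return the three pip commands from this thread as argument lists."""
    index = pyg_wheel_index(torch_version, cuda_tag)
    return [
        ["pip", "install", "--no-index", "torch-scatter", "-f", index, "--user"],
        ["pip", "install", "--no-index", "torch-sparse", "-f", index, "--user"],
        ["pip", "install", "torch-geometric", "--user"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for command in pyg_install_commands("1.8.0", "cu102"):
        print(shlex.join(command))
```

The sketch only prints the commands, so they can be checked against the installed torch build before anything is installed.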
Closing this assuming the issue has been resolved. Please re-open if the issue still persists.
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "D:\Anaconda\envs\InforMARL\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.
I get this error when running python -u onpolicy/scripts/train_mpe.py --use_valuenorm --use_popart --project_name "informarl" --env_name "GraphMPE" --algorithm_name "rmappo" --seed 0 --experiment_name "informarl" --scenario_name "navigation_graph" --num_agents 3 --collision_rew 5 --n_training_threads 1 --n_rollout_threads 32 --num_mini_batch 1 --episode_length 25 --num_env_steps 200000 --ppo_epoch 10 --use_ReLU --gain 0.01 --lr 7e-4 --critic_lr 7e-4 --user_name "marl" --use_cent_obs "False" --graph_feat_type "relative" --auto_mini_batch_size --target_mini_batch_size 32