zjunlp / MKG_Analogy

[ICLR 2023] Multimodal Analogical Reasoning over Knowledge Graphs
https://zjunlp.github.io/project/MKG_Analogy/
MIT License
99 stars 11 forks

Error during pre-training #24

Closed wzl0422 closed 3 months ago

wzl0422 commented 3 months ago

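I am running this on a server with a single GPU. Pre-training always fails with the following error as soon as it reaches epoch 6. I hope the authors can help me resolve it.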

Epoch 5: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1562/1562 [05:12<00:00, 5.00it/s, loss=14.8, v_num=0, entity_hits10=0.000961, entity_hits1=6e-5]
Traceback (most recent call last):
  File "main.py", line 167, in <module>
    main()
  File "main.py", line 159, in main
    lit_model.load_state_dict(torch.load(path, map_location='cuda:2')["state_dict"])
  File "/root/miniconda3/envs/MKG/lib/python3.8/site-packages/torch/serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/root/miniconda3/envs/MKG/lib/python3.8/site-packages/torch/serialization.py", line 1046, in _load
    result = unpickler.load()
  File "/root/miniconda3/envs/MKG/lib/python3.8/site-packages/torch/serialization.py", line 1016, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "/root/miniconda3/envs/MKG/lib/python3.8/site-packages/torch/serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "/root/miniconda3/envs/MKG/lib/python3.8/site-packages/torch/serialization.py", line 970, in restore_location
    return default_restore_location(storage, map_location)
  File "/root/miniconda3/envs/MKG/lib/python3.8/site-packages/torch/serialization.py", line 176, in default_restore_location
    result = fn(storage, location)
  File "/root/miniconda3/envs/MKG/lib/python3.8/site-packages/torch/serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/root/miniconda3/envs/MKG/lib/python3.8/site-packages/torch/serialization.py", line 143, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on CUDA device '
RuntimeError: Attempting to deserialize object on CUDA device 2 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.

flow3rdown commented 3 months ago

Hello, please try changing that line to lit_model.load_state_dict(torch.load(path, map_location='cuda')["state_dict"]). We will update the code in the repository accordingly.
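The change from 'cuda:2' to 'cuda' helps because a device string without an index should resolve to the current CUDA device instead of a fixed one that may not exist. For a script that has to run on machines with different GPU counts, a slightly more defensive variant is sketched below. The helper name resolve_map_location and its fallback policy (fall back to 'cuda:0', or to 'cpu' when no GPU is present) are assumptions made for illustration and are not part of the MKG_Analogy code.

```python
def resolve_map_location(requested: str, device_count: int) -> str:
    """Map a requested device string onto one that exists on this machine.

    `device_count` is the value torch.cuda.device_count() would return.
    Hypothetical helper: the fallback policy here is an assumption.
    """
    if requested == "cpu" or device_count == 0:
        return "cpu"
    # "cuda" carries no index; "cuda:2" carries index "2".
    _, _, index = requested.partition(":")
    if index and int(index) < device_count:
        return requested  # the requested GPU exists, keep it
    return "cuda:0"  # otherwise use the first available GPU


if __name__ == "__main__":
    # The situation from the report: 'cuda:2' requested, one GPU present.
    print(resolve_map_location("cuda:2", 1))
```

With PyTorch available, the result can be passed straight to the loader, for example torch.load(path, map_location=resolve_map_location("cuda:2", torch.cuda.device_count())).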

flow3rdown commented 3 months ago

wzl0422 wrote: I changed the line you mentioned, but I still get the same error.

Traceback (most recent call last):
  File "main.py", line 167, in <module>
    main()
  File "main.py", line 159, in main
    lit_model.load_state_dict(torch.load(path, map_location='cuda:2')["state_dict"])
  [torch/serialization.py frames identical to the traceback in the original report]
RuntimeError: Attempting to deserialize object on CUDA device 2 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device.

Did the change actually take effect? The traceback still shows cuda:2:

File "main.py", line 159, in main
lit_model.load_state_dict(torch.load(path, map_location='cuda:2')["state_dict"])
wzl0422 commented 3 months ago

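Sorry, this was my mistake. I forgot to save the file after editing it, so I was probably still running the unsaved version. I will try again. Sorry for the trouble, and thank you!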
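A quick way to confirm that an edit like this has actually reached the file on disk is to scan the saved source for hard-coded CUDA indices before rerunning. The sketch below is a hypothetical stand-alone check, not part of the repository. The function names and the regular expression are assumptions, and it only catches the literal map_location='cuda:N' pattern seen in this thread.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches map_location='cuda:<n>' or map_location="cuda:<n>" with an explicit index.
HARDCODED_DEVICE = re.compile(r"""map_location\s*=\s*['"]cuda:(\d+)['"]""")


def find_hardcoded_devices(source: str) -> List[Tuple[int, int]]:
    """Return (line_number, device_index) for each hard-coded CUDA index."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HARDCODED_DEVICE.finditer(line):
            hits.append((lineno, int(match.group(1))))
    return hits


def check_file(path: str) -> List[Tuple[int, int]]:
    """Scan the file as saved on disk, which is what the interpreter will run."""
    return find_hardcoded_devices(Path(path).read_text(encoding="utf-8"))
```

Calling check_file("main.py") from the project directory returns an empty list once the saved file no longer pins a specific GPU index.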