zjukg / KoPA

[Paper][ACM MM 2024] Making Large Language Models Perform Better in Knowledge Graph Completion
MIT License

Problem encountered: NotImplementedError: Cannot copy out of meta tensor; no data! Asking the authors for help #17

Closed: chtkg closed this issue 11 months ago

chtkg commented 11 months ago

I ran into this problem. I read the earlier issues and switched to transformers 4.28.0 and torch 2.0.0, but that did not fix it. Could the original authors please take a look at the source code?

root@autodl-container-21104cb00b-08b6ed91:~/autodl-tmp/KoPA# python finetune_kopa.py
Training Alpaca-LoRA model with params:
base_model: huggyllama/llama-7b
data_path: data/UMLS-train.json
output_dir: data/save
batch_size: 16
micro_batch_size: 16
num_epochs: 2
learning_rate: 0.0003
cutoff_len: 512
val_set_size: 0
lora_r: 16
num_prefix: 1
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint: False
prompt template: alpaca
kge model: data/UMLS-rotate.pth

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████| 2/2 [00:03<00:00, 1.97s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Model parameters device before moving to CUDA: cuda:0
Model parameters device after moving to CUDA: cuda:0
/root/autodl-tmp/KoPA/process_kge.py:10: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  ent_embs = torch.tensor(kge_model["ent_embeddings.weight"])
/root/autodl-tmp/KoPA/process_kge.py:11: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  rel_embs = torch.tensor(kge_model["rel_embeddings.weight"])
1024 512
Adapter Trained From Scratch
Map: 100%|██████████████████████████████████████████████████████████████████████████████| 15648/15648 [00:14<00:00, 1047.00 examples/s]
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Traceback (most recent call last):
  File "finetune_kopa.py", line 288, in <module>
    fire.Fire(train)
  File "/root/miniconda3/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "finetune_kopa.py", line 236, in train
    trainer = transformers.Trainer(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 481, in __init__
    self._move_model_to_device(model, args.device)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 716, in _move_model_to_device
    model = model.to(device)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 6 more times]
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
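The traceback shows the failure inside transformers.Trainer's constructor, which calls model.to(device). PyTorch raises this error for any parameter or buffer that lives on the "meta" device, because such a tensor has a shape and dtype but no data to copy. A check that inspects only one parameter would still report cuda:0 even if other tensors are on "meta". The sketch below lists every meta tensor in a module; find_meta_tensors is a hypothetical helper, and the toy model is a stand-in. Applying it to KoPA assumes you call it on whatever model object finetune_kopa.py passes to transformers.Trainer, just before that call.

```python
from typing import List

import torch.nn as nn


def find_meta_tensors(model: nn.Module) -> List[str]:
    """Return names of all parameters and buffers on the meta device.

    Meta tensors hold only shape/dtype metadata, so Module.to(device)
    cannot copy them and raises NotImplementedError.
    """
    names = []
    for name, param in model.named_parameters():
        if param.is_meta:
            names.append(name)
    for name, buf in model.named_buffers():
        if buf.is_meta:
            names.append(name)
    return names


# Hypothetical stand-in: the first layer has real weights, the second is shape-only.
toy = nn.Sequential(
    nn.Linear(4, 4),                 # materialized on CPU
    nn.Linear(4, 4, device="meta"),  # no storage behind these tensors
)

print(find_meta_tensors(toy))  # ['1.weight', '1.bias']

try:
    toy.to("cpu")  # same code path as Trainer._move_model_to_device
except NotImplementedError as err:
    print("to() failed:", err)
```

If the reported names cluster in particular layers, that points to the weights that were never loaded into real memory, which narrows down where the model loading step differs from the authors' setup.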

Zhang-Each commented 11 months ago

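Hello, we never ran into a similar problem during our experiments. I suspect it is related to the server configuration, such as the CUDA version or the Python version. Note that the Python version listed in our README is 3.9.16, while your traceback shows Python 3.8. In addition, our experiments were run on physical servers and GPUs, whereas your error output shows that you are using the AutoDL cloud platform. We have no experience with AutoDL, so we are not familiar with the details of how its servers are set up.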
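Because the reply attributes the difference to the environment (Python 3.9.16 in the README versus 3.8 here, plus CUDA and the AutoDL platform), it can help to post a short environment report when comparing setups. This is a minimal sketch: installed_version and environment_report are hypothetical helpers, the package list is limited to the two libraries named in this thread, and the script only reads versions, it does not change anything.

```python
import platform
import sys
from importlib import metadata  # standard library since Python 3.8

# Python version stated in the KoPA README, as quoted in the reply.
README_PYTHON = (3, 9, 16)


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> dict:
    """Collect the Python, torch, transformers, and CUDA details discussed above."""
    report = {
        "python": platform.python_version(),
        "matches_readme_python": sys.version_info[:3] == README_PYTHON,
    }
    for pkg in ("torch", "transformers"):
        report[pkg] = installed_version(pkg)
    try:
        import torch

        report["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch_cuda_build"] = "torch not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it in both environments and comparing the output side by side makes version mismatches easy to spot before digging into the training code itself.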