RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

hjing100 commented 1 year ago

Command run:

python3 cli.py \
  --method pet \
  --pattern_ids 0 1 2 3 4 \
  --data_dir ag_news_csv \
  --model_type albert \
  --model_name_or_path albert-xxlarge-v2 \
  --task_name agnews \
  --output_dir output \
  --do_train \
  --do_eval \
  --pet_per_gpu_train_batch_size 2 \
  --pet_gradient_accumulation_steps 8 \
  --pet_max_steps 250 \
  --sc_per_gpu_unlabeled_batch_size 2 \
  --sc_gradient_accumulation_steps 8 \
  --sc_max_steps 5000

It fails with this error:

Evaluating: 0%| | 0/15000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "cli.py", line 282, in <module>
    main()
  File "cli.py", line 263, in main
    no_distillation=args.no_distillation, seed=args.seed)
  File "/home/123456/projects/prompt/pet-master/pet/modeling.py", line 249, in train_pet
    save_unlabeled_logits=not no_distillation, seed=seed)
  File "/home/123456/projects/prompt/pet-master/pet/modeling.py", line 355, in train_pet_ensemble
    unlabeled_data=unlabeled_data))
  File "/home/123456/projects/prompt/pet-master/pet/modeling.py", line 434, in train_single_model
    results_dict['train_set_before_training'] = evaluate(model, train_data, eval_config)['scores']['acc']
  File "/home/123456/projects/prompt/pet-master/pet/modeling.py", line 490, in evaluate
    n_gpu=config.n_gpu, decoding_strategy=config.decoding_strategy, priming=config.priming)
  File "/home/123456/projects/prompt/pet-master/pet/wrapper.py", line 376, in eval
    logits = EVALUATION_STEP_FUNCTIONS[self.config.wrapper_type](self)(batch)
  File "/home/123456/projects/prompt/pet-master/pet/wrapper.py", line 524, in mlm_eval_step
    outputs = self.model(**inputs)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/transformers/modeling_albert.py", line 814, in forward
    output_hidden_states=output_hidden_states,
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/transformers/modeling_albert.py", line 563, in forward
    output_hidden_states=output_hidden_states,
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/transformers/modeling_albert.py", line 327, in forward
    hidden_states = self.embedding_hidden_mapping_in(hidden_states)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/torch/nn/functional.py", line 1612, in linear
    output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

hjing100 commented 1 year ago

CUDA Version: 12.0

hjing100 commented 1 year ago

I upgraded torch and torchvision:

torch 1.10.1
torchvision 0.11.2

Running the same command now shows:

Evaluating: 0%| | 0/15000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "cli.py", line 284, in <module>
    main()
  File "cli.py", line 265, in main
    no_distillation=args.no_distillation, seed=args.seed)
  File "/home/123456/projects/prompt/pet-master/pet/modeling.py", line 249, in train_pet
    save_unlabeled_logits=not no_distillation, seed=seed)
  File "/home/123456/projects/prompt/pet-master/pet/modeling.py", line 355, in train_pet_ensemble
    unlabeled_data=unlabeled_data))
  File "/home/123456/projects/prompt/pet-master/pet/modeling.py", line 434, in train_single_model
    results_dict['train_set_before_training'] = evaluate(model, train_data, eval_config)['scores']['acc']
  File "/home/123456/projects/prompt/pet-master/pet/modeling.py", line 490, in evaluate
    n_gpu=config.n_gpu, decoding_strategy=config.decoding_strategy, priming=config.priming)
  File "/home/123456/projects/prompt/pet-master/pet/wrapper.py", line 376, in eval
    logits = EVALUATION_STEP_FUNCTIONS[self.config.wrapper_type](self)(batch)
  File "/home/123456/projects/prompt/pet-master/pet/wrapper.py", line 524, in mlm_eval_step
    outputs = self.model(**inputs)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/transformers/modeling_albert.py", line 814, in forward
    output_hidden_states=output_hidden_states,
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/123456/.conda/envs/python36/lib/python3.6/site-packages/transformers/modeling_albert.py", line 548, in forward
    token_type_ids = torch.zeros(input_shape, dtype=torch.long, device=device)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
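The error message above suggests `CUDA_LAUNCH_BLOCKING=1`. As a minimal sketch, assuming you would rather set it from Python than in the shell, the variable can be assigned at the very top of the entry script (here that would be `cli.py`), before any CUDA work starts. With it set, kernel launches run synchronously, so the traceback is more likely to point at the call that actually failed.

```python
import os

# Assumption: this runs before PyTorch initializes CUDA, i.e. at the top of
# the entry script and ahead of any code that touches the GPU.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

This only improves error reporting and slows execution; it does not fix the underlying failure.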

hjing100 commented 1 year ago

Solution: `conda install pytorch torchvision cudatoolkit=11.3 -c pytorch`
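A "no kernel image is available" error usually means the installed PyTorch binary was not compiled for the GPU's compute capability, which would be consistent with the CUDA 11.3 build fixing it here, although the thread does not confirm the cause. The sketch below is one way to check this. `parse_arch` and `capability_supported` are hypothetical helpers written for this example, not part of PyTorch or PET, and the rule they apply (an `sm_` binary with the same major version and an equal or lower minor version, or a `compute_` PTX target no newer than the device) is a simplification of CUDA's compatibility rules.

```python
def parse_arch(tag):
    """Split a build target such as 'sm_86' or 'compute_80' into (kind, major, minor)."""
    kind, _, rest = tag.partition("_")
    digits = "".join(ch for ch in rest if ch.isdigit())
    return kind, int(digits[:-1]), int(digits[-1])


def capability_supported(capability, arch_list):
    """Roughly decide whether a GPU with `capability` (major, minor) can run
    kernels from a build compiled for the targets in `arch_list`."""
    major, minor = capability
    for tag in arch_list:
        kind, a_major, a_minor = parse_arch(tag)
        # Compiled binaries run on the same major version, equal or newer minor.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX can be JIT-compiled for any device at least as new as the target.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print("torch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print("device capability:", cap, "| build targets:", archs)
            print("kernels available for this GPU:", capability_supported(cap, archs))
        else:
            print("CUDA is not available to this PyTorch build")
```

If the last line prints `False`, installing a build that targets a newer CUDA toolkit, as in the command above, is the usual remedy.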

timoschick / pet, issue #99