zihangdai / xlnet

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Apache License 2.0
6.18k stars 1.18k forks

CPU->GPU Memcpy failed when finetuning with STS-B #297

Open xavinatalia opened 5 months ago

xavinatalia commented 5 months ago

When fine-tuning XLNet-large on STS-B using four 3090 GPUs with CUDA version 11.7, I got the following messages:

2024-04-26 23:17:09.192214: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_EXECUTION_FAILED
2024-04-26 23:17:09.192279: E tensorflow/stream_executor/cuda/cuda_blas.cc:2620] Internal: failed BLAS call, see log for details
2024-04-26 23:17:12.114126: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_EXECUTION_FAILED
2024-04-26 23:17:12.114166: E tensorflow/stream_executor/cuda/cuda_blas.cc:2620] Internal: failed BLAS call, see log for details
2024-04-26 23:17:12.114698: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_EXECUTION_FAILED
2024-04-26 23:17:12.114761: E tensorflow/stream_executor/cuda/cuda_blas.cc:2620] Internal: failed BLAS call, see log for details
2024-04-26 23:17:12.121236: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_EXECUTION_FAILED
2024-04-26 23:17:12.121270: E tensorflow/stream_executor/cuda/cuda_blas.cc:2620] Internal: failed BLAS call, see log for details
2024-04-26 23:17:12.659520: I tensorflow/stream_executor/stream.cc:2079] [stream=0x4854880,impl=0x46ed360] did not wait for [stream=0x4854590,impl=0x46ed390]
2024-04-26 23:17:12.659567: I tensorflow/stream_executor/stream.cc:5027] [stream=0x4854880,impl=0x46ed360] did not memcpy host-to-device; source: 0x7f13e001b300
2024-04-26 23:17:12.659624: F tensorflow/core/common_runtime/gpu/gpu_util.cc:339] CPU->GPU Memcpy failed
Aborted (core dumped)
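
To see which failure came first in a log like the one above, it can help to group the entries by severity and source file. The following is a minimal sketch, assuming one log entry per line and the layout seen in the messages above (`<date> <time>: <severity letter> <file>:<line>] <message>`). The regular expression is inferred from this log only, not from any format specification, and the function name `summarize_tf_log` is hypothetical.

```python
import re
from collections import Counter

# Layout inferred from the log lines quoted in this issue (an assumption):
# "<date> <time>: <I|W|E|F> <source file>:<line>] <message>"
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "  # timestamp
    r"([IWEF]) "                                       # severity letter
    r"(\S+):(\d+)\] "                                  # source file and line
    r"(.*)$"                                           # message text
)


def summarize_tf_log(text):
    """Hypothetical helper: parse log text, one entry per line.

    Returns (entries, counts, first_error), where first_error is the
    earliest entry with severity E or F, or None if there is none.
    """
    entries = []
    for raw in text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # skip lines that are not log entries
        ts, sev, src, lineno, msg = match.groups()
        entries.append(
            {"time": ts, "severity": sev, "source": f"{src}:{lineno}", "message": msg}
        )
    counts = Counter((e["severity"], e["source"]) for e in entries)
    first_error = next((e for e in entries if e["severity"] in ("E", "F")), None)
    return entries, counts, first_error


# Three entries copied from the log in this issue.
sample = """\
2024-04-26 23:17:09.192214: E tensorflow/stream_executor/cuda/cuda_blas.cc:698] failed to run cuBLAS routine cublasGemmBatchedEx: CUBLAS_STATUS_EXECUTION_FAILED
2024-04-26 23:17:12.659567: I tensorflow/stream_executor/stream.cc:5027] [stream=0x4854880,impl=0x46ed360] did not memcpy host-to-device; source: 0x7f13e001b300
2024-04-26 23:17:12.659624: F tensorflow/core/common_runtime/gpu/gpu_util.cc:339] CPU->GPU Memcpy failed
"""

entries, counts, first_error = summarize_tf_log(sample)
print(first_error["time"], first_error["source"], first_error["message"])
```

In the log above, the earliest error is the cuBLAS failure at 23:17:09, about three seconds before the fatal `CPU->GPU Memcpy failed` line, so the memcpy failure may be a downstream symptom rather than the root cause.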