yushengsu-thu opened 6 months ago
You need to either disable BF16 (-DENABLE_BF16) or instruct your compiler to compile for a more recent GPU (Ampere) that actually has hardware support for bf16.
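As a minimal sketch of what "compile for a more recent GPU" means: nvcc selects the target with `-arch=sm_XY`, where XY is the compute capability without the dot (for example `sm_80` for Ampere A100, `sm_70` for V100). The helpers `nvcc_arch_flag` and `has_native_bf16` below are hypothetical (not part of llm.c or the CUDA toolkit); they only build that flag string and check the compute capability 8.0 threshold for native bf16.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the nvcc target flag for a compute capability,
// e.g. (8, 0) -> "-arch=sm_80", (7, 0) -> "-arch=sm_70".
std::string nvcc_arch_flag(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor <= 9);
    return "-arch=sm_" + std::to_string(major) + std::to_string(minor);
}

// Hypothetical helper: native bf16 arithmetic needs compute capability 8.0
// (Ampere) or newer, so only the major version matters here.
bool has_native_bf16(int major) {
    return major >= 8;
}
```

For a V100 (compute capability 7.0) this gives `-arch=sm_70` and `has_native_bf16(7) == false`.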
I get a similar error:
---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/usr/local/cuda/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_FP16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/x86_64-linux-gnu/openmpi/include -lmpi -lnccl -o train_gpt2cu
train_gpt2.cu(215): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (half2 *, half2)
train_gpt2.cu(242): error: no operator "+=" matches these operands
operand types are: floatX += __half
train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
train_gpt2.cu(1135): error: no operator "+=" matches these operands
operand types are: floatX += floatX
train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced
3 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255
Try upgrading your CUDA version to 12.4.1?
By default, PRECISION=BF16.
make
# It is the same as:
PRECISION=BF16 make
Compiling with a different precision setting also avoids this issue:
PRECISION=FP16 make
# or
PRECISION=FP32 make
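To pick between these options, here is a hypothetical rule of thumb keyed on the GPU's compute capability, assuming the thresholds discussed in this thread and the CUDA programming guide: bf16 functions need 8.0+, the `half2` `atomicAdd` that train_gpt2.cu uses needs 6.x+, and FP32 builds everywhere. `suggest_precision` is an illustrative name, not something the Makefile provides. Note the assumption that FP16 also needs nvcc to actually target that architecture: the failing FP16 command in the log above passes no `-arch` flag, so nvcc presumably compiled for its default target.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of llm.c): suggest a PRECISION value for
// `make` from the GPU's compute capability major.minor.
std::string suggest_precision(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor <= 9);
    int cc = major * 10 + minor;   // e.g. 7.0 -> 70, 8.6 -> 86
    if (cc >= 80) return "BF16";   // bf16 functions are enabled from 8.0 on
    if (cc >= 60) return "FP16";   // half2 atomicAdd needs 6.x or newer
    return "FP32";                 // plain float works on every GPU
}
```

For example, `suggest_precision(7, 0)` returns `"FP16"` for a V100, corresponding to `PRECISION=FP16 make`.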
Related code in the Makefile:
# Precision settings, default to bf16 but ability to override
PRECISION ?= BF16
VALID_PRECISIONS := FP32 FP16 BF16
ifeq ($(filter $(PRECISION),$(VALID_PRECISIONS)),)
$(error Invalid precision $(PRECISION), valid precisions are $(VALID_PRECISIONS))
endif
ifeq ($(PRECISION), FP32)
PFLAGS = -DENABLE_FP32
else ifeq ($(PRECISION), FP16)
PFLAGS = -DENABLE_FP16
else
PFLAGS = -DENABLE_BF16
endif
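To spell out the Makefile logic, here is an illustrative C++ restatement of how PRECISION maps to PFLAGS, including the validation step. The function name `pflags_for` is hypothetical; the Makefile itself remains the source of truth, and where it stops the build with `$(error ...)` this sketch throws an exception instead.

```cpp
#include <stdexcept>
#include <string>

// Mirrors the Makefile: PRECISION defaults to BF16 (PRECISION ?= BF16),
// must be one of FP32/FP16/BF16, and selects the -DENABLE_* flag (PFLAGS).
std::string pflags_for(const std::string& precision = "BF16") {
    if (precision == "FP32") return "-DENABLE_FP32";
    if (precision == "FP16") return "-DENABLE_FP16";
    if (precision == "BF16") return "-DENABLE_BF16";
    // Equivalent of the Makefile's $(error ...) for an invalid value.
    throw std::invalid_argument("Invalid precision " + precision +
                                ", valid precisions are FP32 FP16 BF16");
}
```

So a plain `make` ends up compiling with `-DENABLE_BF16`, which is the flag that triggers the bf16 errors on pre-Ampere GPUs.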
- Upgrade nvcc to 12.4.
- Check the compute capability of the GPU card. In the CUDA header include/cuda_bf16.h (or cuda_bf16.hpp) you might see:
#if defined(__CUDACC__) && (!defined(__CUDA_ARCH__) || (__CUDA_ARCH__ >= 800) || defined(_NVHPC_CUDA))
This basically means these functions are not available for compute capability below 8.0.
Note that the header source depends on the CUDA toolkit version: things that cannot be compiled with 12.1 may compile with 12.4 (this was the case for me).
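The guard quoted above enables the bf16 functions in the host pass (`__CUDA_ARCH__` undefined) and in device passes only when `__CUDA_ARCH__ >= 800`. `__CUDA_ARCH__` encodes compute capability X.Y as X*100 + Y*10, so a V100 (7.0) gives 700 and falls outside the guard. The hypothetical functions below restate that condition in plain C++ so the arithmetic is easy to check; they assume `__CUDACC__` is defined and ignore the `_NVHPC_CUDA` branch.

```cpp
#include <cassert>

// __CUDA_ARCH__ value for compute capability major.minor, e.g. 7.0 -> 700.
int cuda_arch_value(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor <= 9);
    return major * 100 + minor * 10;
}

// Restates the cuda_bf16.h guard for nvcc-compiled code (ignoring _NVHPC_CUDA):
// true in the host pass, or in a device pass targeting arch 800 or newer.
bool bf16_guard_passes(bool device_pass, int arch) {
    return !device_pass || arch >= 800;
}
```

For example, `bf16_guard_passes(true, cuda_arch_value(7, 0))` is `false`, matching the failure on V100-class GPUs.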
This solved my issue when I saw the error on a V100 GPU (AWS P3 instance). Updating to CUDA 12.5 fixed the make error.
Got it working with CUDA 12.4.
Environment:
I encountered an error when I executed:
Warnings and error messages:
This problem or question might seem kind of stupid, since I'm a beginner in CUDA and C. I would appreciate it if anyone could provide solutions or suggestions.