karpathy / llm.c

LLM training in simple, raw C/CUDA
MIT License
24.43k stars 2.76k forks

Error: make: *** [Makefile:203: train_gpt2cu] Error 255 #359

Open yushengsu-thu opened 6 months ago

yushengsu-thu commented 6 months ago

Environment:

I encounter an error when I execute:

make train_gpt2cu

Warning and error messages:

---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/lustre/apps/apps/cuda/cuda-12.1/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_BF16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/ -I/usr/lib/x86_64-linux-gnu/openmpi/include -lmpi -lnccl -o train_gpt2cu
train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
      __attribute__((device)) Packed128() = default;
                     ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

train_gpt2.cu(201): error: identifier "__ushort_as_bfloat16" is undefined
      __nv_bfloat162 add_val = (ptr_val & 0x3) ? __halves2bfloat162(__ushort_as_bfloat16(0), val)
                                                                    ^

train_gpt2.cu(201): error: identifier "__halves2bfloat162" is undefined
      __nv_bfloat162 add_val = (ptr_val & 0x3) ? __halves2bfloat162(__ushort_as_bfloat16(0), val)
                                                 ^

train_gpt2.cu(203): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (__nv_bfloat162 *, __nv_bfloat162)
      atomicAdd(ptr_bf16, add_val);
      ^

train_gpt2.cu(242): error: no operator "+=" matches these operands
            operand types are: floatX += float
          val += __shfl_xor_sync(0xFFFFFFFF, val, offset);
              ^

train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration
      __attribute__((device)) Packed128() = default;
                     ^

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

train_gpt2.cu(608): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(mean + idx, (floatX)m);
          ^

train_gpt2.cu(620): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(rstd + idx, (floatX)s);
          ^

train_gpt2.cu(629): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          float n = s * ((float)__ldcs(x+c) - m);
                                ^

train_gpt2.cu(630): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(o+c, (floatX)(n * (float)weight[c] + (float)bias[c]));
          ^

train_gpt2.cu(650): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          q[idx] = __ldcs(&inp[inp_idx]);
                   ^

train_gpt2.cu(651): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          k[idx] = __ldcs(&inp[inp_idx + NH * d]);
                   ^

train_gpt2.cu(652): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          v[idx] = __ldcs(&inp[inp_idx + 2 * (NH * d)]);
                   ^

train_gpt2.cu(688): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (floatX *)
          out[other_idx] = __ldcs(&inp[idx]);
                           ^

train_gpt2.cu(769): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
          float ev = expf(inv_temperature * ((float)__ldcs(x + i) - global_maxval));
                                                    ^

train_gpt2.cu(770): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
          __stcs(out + idx * T + i, (floatX)(ev * norm));
          ^

train_gpt2.cu(924): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float dout_i = (float)__ldcs(&dout_bt[i]);
                                    ^

train_gpt2.cu(925): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float norm_bti = ((float)__ldcs(&inp_bt[i]) - mean_bt) * rstd_bt;
                                       ^

train_gpt2.cu(996): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float acc = (float)__ldcs(att_bth + t3) * ((float)__ldcs(datt_bth + t3) - local_sum);
                                 ^

train_gpt2.cu(996): error: no instance of overloaded function "__ldcs" matches the argument list
            argument types are: (const floatX *)
              float acc = (float)__ldcs(att_bth + t3) * ((float)__ldcs(datt_bth + t3) - local_sum);
                                                                ^

train_gpt2.cu(997): error: no instance of overloaded function "__stcs" matches the argument list
            argument types are: (floatX *, floatX)
              __stcs(dpreatt_bth + t3, (floatX)(scale * acc));
              ^

train_gpt2.cu(1135): error: no operator "+=" matches these operands
            operand types are: floatX += floatX
      if (i < n) { dst[i] += (floatX)src[i]; }
                          ^

train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced
  const ncclDataType_t ncclFloatN = ncclFloat;
                       ^

20 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255

This question might seem basic since I'm a beginner in CUDA and C. I would appreciate any solutions or suggestions.

ngc92 commented 6 months ago

You need to either disable BF16 (drop the `-DENABLE_BF16` define, e.g. build with `PRECISION=FP32`) or instruct your compiler to target a more recent GPU architecture (Ampere, compute capability 8.0+) that actually has hardware support for bf16.
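For pre-Ampere cards (e.g. a V100, compute capability 7.0) the two workarounds boil down to picking a precision the hardware supports. A small sketch: `PRECISION` is the real knob in llm.c's Makefile (shown later in this thread), while the `pick_precision` helper and the hard-coded `cap` value are purely illustrative:

```shell
#!/bin/sh
# Map a compute capability string ("major.minor", e.g. "7.0") to a precision
# the card can compile for: bf16 intrinsics need compute capability >= 8.0.
pick_precision() {
    major=${1%%.*}
    if [ "$major" -ge 8 ]; then echo BF16; else echo FP32; fi
}

# In practice you might get cap from:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1
cap="7.0"   # pretend we are on a V100

echo "PRECISION=$(pick_precision "$cap") make train_gpt2cu"
```

On a V100 this prints `PRECISION=FP32 make train_gpt2cu`, i.e. the "disable BF16" route; on an A100 (`cap="8.0"`) it would keep BF16.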

ifromeast commented 6 months ago

Similar error:

---------------------------------------------
→ cuDNN is manually disabled by default, run make with `USE_CUDNN=1` to try to enable
✓ OpenMP found
✓ OpenMPI found, OK to train with multiple GPUs
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/usr/local/cuda/bin/nvcc -O3 -t=0 --use_fast_math -DMULTI_GPU -DENABLE_FP16 train_gpt2.cu -lcublas -lcublasLt -L/usr/lib/x86_64-linux-gnu/openmpi/lib/  -I/usr/lib/x86_64-linux-gnu/openmpi/include  -lmpi -lnccl -o train_gpt2cu 
train_gpt2.cu(215): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (half2 *, half2)

train_gpt2.cu(242): error: no operator "+=" matches these operands
            operand types are: floatX += __half

train_gpt2.cu(284): warning #20012-D: __device__ annotation is ignored on a function("Packed128") that is explicitly defaulted on its first declaration

train_gpt2.cu(1135): error: no operator "+=" matches these operands
            operand types are: floatX += floatX

train_gpt2.cu(80): warning #177-D: variable "ncclFloatN" was declared but never referenced

3 errors detected in the compilation of "train_gpt2.cu".
make: *** [Makefile:203: train_gpt2cu] Error 255
rosslwheeler commented 6 months ago

Try upgrading your CUDA version to 12.4.1?

lancerts commented 6 months ago
  1. Upgrade nvcc to 12.4.
  2. Check the compute capability of the GPU card. In the header include/cuda_bf16.h (or .hpp) you might see:

     #if defined(__CUDACC__) && (!defined(__CUDA_ARCH__) || (__CUDA_ARCH__ >= 800) || defined(_NVHPC_CUDA))

     This basically means the functions are not available for compute capability < 8.0.

Note: the header source depends on the CUDA toolkit version. Things that cannot be compiled with 12.1 may compile with 12.4 (this was the case for me).

yanqd0 commented 6 months ago

By default, PRECISION=BF16.

make
# It is the same as:
PRECISION=BF16 make

Compiling with other options can also solve this issue:

PRECISION=FP16 make
# or
PRECISION=FP32 make

Related code in Makefile:

# Precision settings, default to bf16 but ability to override
PRECISION ?= BF16
VALID_PRECISIONS := FP32 FP16 BF16
ifeq ($(filter $(PRECISION),$(VALID_PRECISIONS)),)
  $(error Invalid precision $(PRECISION), valid precisions are $(VALID_PRECISIONS))
endif
ifeq ($(PRECISION), FP32)
  PFLAGS = -DENABLE_FP32
else ifeq ($(PRECISION), FP16)                                                                                                                                               
  PFLAGS = -DENABLE_FP16
else
  PFLAGS = -DENABLE_BF16
endif
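The `filter`/`error` validation above can be exercised from the shell; a hypothetical re-implementation of that step (the `check_precision` helper is not part of the repo, just a sketch of the Makefile logic):

```shell
#!/bin/sh
# Sketch of the Makefile's precision validation: accept only FP32/FP16/BF16
# and emit the corresponding -DENABLE_* define, else fail with the same
# style of message the Makefile's $(error ...) produces.
valid="FP32 FP16 BF16"

check_precision() {
    for p in $valid; do
        if [ "$1" = "$p" ]; then
            echo "PFLAGS=-DENABLE_$1"
            return 0
        fi
    done
    echo "Invalid precision $1, valid precisions are $valid" >&2
    return 1
}

check_precision BF16
```

For example, `check_precision FP8` fails with "Invalid precision FP8, ...", mirroring the Makefile's `$(error ...)` branch.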
jacobrast commented 5 months ago
> 1. Upgrade nvcc to 12.4.
> 2. Check the compute capability of the GPU card. In the header include/cuda_bf16.h (or .hpp) you might see:
>
>    #if defined(__CUDACC__) && (!defined(__CUDA_ARCH__) || (__CUDA_ARCH__ >= 800) || defined(_NVHPC_CUDA))
>
>    This basically means the functions are not available for compute capability < 8.0.
>
> Note: the header source depends on the CUDA toolkit version. Things that cannot be compiled with 12.1 may compile with 12.4.

This solved my issue when I saw the error on a V100 GPU (AWS P3 instance). Updating to CUDA 12.5 fixed the make error.

drzsdrtfg commented 2 months ago

Got it with CUDA 12.4.