RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

AustinZzx commented 8 months ago

I encountered this error when running step 9 (calculate 3DMM) of data_gen/nerf/process_data.sh.

Command: python data_gen/nerf/extract_3dmm.py --video_id=May

Log:

Traceback (most recent call last):
  File "/home/zexia/GeneFace/data_gen/nerf/extract_3dmm.py", line 56, in process_video
    lm68 = fa.get_landmarks(frames[i])[0] # detect the face in the image and get its landmarks, shape=[68,2]
  File "/home/zexia/miniconda3/envs/geneface/lib/python3.9/site-packages/face_alignment/api.py", line 113, in get_landmarks
    return self.get_landmarks_from_image(image_or_path, detected_faces, return_bboxes, return_landmark_score)
  File "/home/zexia/miniconda3/envs/geneface/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/zexia/miniconda3/envs/geneface/lib/python3.9/site-packages/face_alignment/api.py", line 168, in get_landmarks_from_image
    out = self.face_alignment_net(inp).detach()
  File "/home/zexia/miniconda3/envs/geneface/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_cat_cat(float* tinput0_42, float* tinput0_46, float* tout3_67, float* tinput0_60, float* tinput0_52, float* tout3_71, float* aten_cat_1, float* aten_cat) {
{
if (blockIdx.x<512ll ? 1 : 0) {
    aten_cat[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 1024ll<192ll ? 1 : 0) ? ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 1024ll<128ll ? 1 : 0) ? __ldg(tinput0_60 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) : __ldg(tinput0_52 + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) - 131072ll)) : __ldg(tout3_71 + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) - 196608ll));
  }  aten_cat_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 4096ll<192ll ? 1 : 0) ? ((((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) / 4096ll<128ll ? 1 : 0) ? __ldg(tinput0_42 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) : __ldg(tinput0_46 + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) - 524288ll)) : __ldg(tout3_67 + ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)) - 786432ll));
}
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zexia/GeneFace/data_gen/nerf/extract_3dmm.py", line 112, in <module>
    process_video(video_fname, out_fname, skip_tmp=False)
  File "/home/zexia/GeneFace/data_gen/nerf/extract_3dmm.py", line 59, in process_video
    raise ValueError("")
ValueError

AustinZzx commented 8 months ago

I'm using CUDA 11.3. I verified in Python that both PyTorch and TensorFlow can access the GPU.

AustinZzx commented 8 months ago

I'm running on WSL2. The GPU is an RTX 4070 Ti and the driver version is 535.98.

jinqiupeter commented 8 months ago

You are probably using a Lovelace/Hopper GPU such as the RTX 40 series or H100/H800. These GPUs are not compatible with the CUDA version you are using.

Try a previous generation GPU like RTX 3090.
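
For anyone hitting the same error, here is a minimal sketch of the compatibility check implied above. It assumes the failure comes from the CUDA toolkit's NVRTC not recognising the GPU's compute capability. The table and the helper names (`MIN_CUDA_FOR_CAPABILITY`, `toolkit_supports`) are illustrative and not part of GeneFace or PyTorch. The table covers only a few architectures.

```python
# Minimum CUDA toolkit release that recognises each compute capability.
# Illustrative subset only; consult NVIDIA's release notes for other GPUs.
MIN_CUDA_FOR_CAPABILITY = {
    (8, 0): (11, 0),  # Ampere, e.g. A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3090
    (8, 9): (11, 8),  # Ada Lovelace, e.g. RTX 4070 Ti
    (9, 0): (11, 8),  # Hopper, e.g. H100
}


def parse_version(text):
    """Turn a version string such as '11.3' into a comparable (11, 3) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, capability):
    """Return True if the given CUDA toolkit version knows the capability."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise KeyError(f"capability {capability} is not in the illustrative table")
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # The setup from this thread: RTX 4070 Ti (capability 8.9) with CUDA 11.3.
    print(toolkit_supports("11.3", (8, 9)))
    # The setup that was later reported to work.
    print(toolkit_supports("11.8", (8, 9)))
```

Under these assumptions, an RTX 3090 (capability 8.6) passes the check with CUDA 11.3, which is consistent with the suggestion to try a previous-generation GPU.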

AustinZzx commented 8 months ago

If this is the only GPU I have access to at the moment, is there a CUDA version I can use for this repo? I initially tried to install CUDA 12.3 (the latest CUDA), but could not find cudatoolkit=12.3 in conda, so I used CUDA 11.3 instead. Do you think going back to CUDA 12.3 and installing the cudatoolkit with pip could help?

jinqiupeter commented 8 months ago

You can also try CUDA 11.7, which is also the CUDA version in the docker base image (see ./docker/dockerfile).

Another option is to run your code on a cloud GPU like runpod.io and choose RTX3090. This is just to make sure it's not a GPU compatibility issue.

AustinZzx commented 8 months ago

Verified that using CUDA 11.8 and PyTorch 2.x worked.
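
A quick way to confirm what a rebuilt environment actually contains is sketched below. It is only an illustration: `report_cuda_environment` is a hypothetical helper name, and the sketch uses nothing beyond standard PyTorch queries (`torch.version.cuda`, `torch.cuda.get_device_capability`, `torch.cuda.get_arch_list`). Note that `torch.cuda.is_available()` returning True only shows that the GPU is visible. As this thread shows, runtime kernel compilation can still fail if the CUDA version is too old for the GPU.

```python
def report_cuda_environment():
    """Collect basic facts about the installed PyTorch build and the GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the PyTorch binary was built with (None if CPU-only).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability, e.g. (8, 9) for an RTX 4070 Ti.
        info["capability"] = torch.cuda.get_device_capability(0)
        # Architectures the PyTorch binary ships precompiled kernels for.
        info["compiled_arch_list"] = torch.cuda.get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in report_cuda_environment().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` is older than the release that introduced the reported `capability`, the nvrtc error above is a likely outcome.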

yerfor / GeneFace

RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch) #226