wfoy opened this issue 6 months ago
Can you add which GPU device and cuDNN version you are using? A log with CUDNN_LOGLEVEL_DBG=3 would be useful for debugging as well.
https://docs.nvidia.com/deeplearning/cudnn/latest/reference/troubleshooting.html
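For reference, a minimal sketch of enabling that log (assuming the `train_gpt2cu` binary built in this repo; `CUDNN_LOGDEST_DBG` is optional and redirects the log to a file instead of stdout):

```shell
# Enable cuDNN API logging for the next run:
export CUDNN_LOGLEVEL_DBG=3        # 0=off, 1=errors, 2=+warnings, 3=+info
export CUDNN_LOGDEST_DBG=cudnn.log # optional: write to a file instead of stdout
# then rerun the failing binary and attach cudnn.log here:
# ./train_gpt2cu
```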
After compiling with `make train_gpt2cu USE_CUDNN=1` and running `./train_gpt2cu`, there is an error:
+-----------------------+----------------------------------------------------+
| Parameter | Value |
+-----------------------+----------------------------------------------------+
| input dataset prefix | data/tiny_shakespeare |
| output log file | NULL |
| batch size B | 4 |
| sequence length T | 1024 |
| learning rate | 3.000000e-04 |
| max_steps | -1 |
| val_loss_every | 20 |
| val_max_batches | 20 |
| sample_every | 20 |
| genT | 64 |
| overfit_single_batch | 0 |
| use_master_weights | enabled |
+-----------------------+----------------------------------------------------+
| device | NVIDIA GeForce RTX 4090 |
| precision | BF16 |
+-----------------------+----------------------------------------------------+
| load_filename | gpt2_124M_bf16.bin |
| max_sequence_length T | 1024 |
| vocab_size V | 50257 |
| padded_vocab_size Vp | 50304 |
| num_layers L | 12 |
| num_heads NH | 12 |
| channels C | 768 |
| num_parameters | 124475904 |
+-----------------------+----------------------------------------------------+
| train_num_batches | 74 |
| val_num_batches | 20 |
+-----------------------+----------------------------------------------------+
| num_processes | 1 |
+-----------------------+----------------------------------------------------+
num_parameters: 124475904 ==> bytes: 248951808
allocated 237 MiB for model parameters
allocated 1703 MiB for activations
[CUDNN ERROR] at file cudnn_att.cpp:141:
[cudnn_frontend] Error: No execution plans built successfully.
My CUDA is 12.4, cuDNN is 9.1, and cudnn-frontend is 1.4.0, on Ubuntu 22.04.
Hi @ifromeast,
Is it possible for you to dump the cuDNN log? If you set export CUDNN_LOGLEVEL_DBG=3, it will be dumped to your stdout.
The log will look something like this:
I! CuDNN (v90100 70) function cudnnCreate() called:
i! handle: location=host; addr=0x563c9b4a01a0;
i! Time: 2024-05-13T07:49:20.051230 (0d+0h+0m+0s since start)
i! Process=975; Thread=975; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v90100 70) function cudnnGraphLibraryConfigInit() called:
i! apiLog: type=cudnnLibConfig_t; val=CUDNN_STANDARD;
i! Time: 2024-05-13T07:49:20.051266 (0d+0h+0m+0s since start)
i! Process=975; Thread=975; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v90100 70) function cudnnGetVersion() called:
i! Time: 2024-05-13T07:49:20.216976 (0d+0h+0m+0s since start)
i! Process=975; Thread=975; GPU=NULL; Handle=NULL; StreamId=NULL.
I am able to run the exact same configuration locally:
+-----------------------+----------------------------------------------------+
| Parameter | Value |
+-----------------------+----------------------------------------------------+
| input dataset prefix | data/tiny_shakespeare |
| output log file | NULL |
| batch size B | 4 |
| sequence length T | 1024 |
| learning rate | 3.000000e-04 |
| max_steps | -1 |
| val_loss_every | 20 |
| val_max_batches | 20 |
| sample_every | 20 |
| genT | 64 |
| overfit_single_batch | 0 |
| use_master_weights | enabled |
+-----------------------+----------------------------------------------------+
| device | NVIDIA GeForce RTX 4090 |
| precision | BF16 |
+-----------------------+----------------------------------------------------+
| load_filename | gpt2_124M_bf16.bin |
| max_sequence_length T | 1024 |
| vocab_size V | 50257 |
| padded_vocab_size Vp | 50304 |
| num_layers L | 12 |
| num_heads NH | 12 |
| channels C | 768 |
| num_parameters | 124475904 |
+-----------------------+----------------------------------------------------+
| train_num_batches | 74 |
| val_num_batches | 20 |
+-----------------------+----------------------------------------------------+
| num_processes | 1 |
+-----------------------+----------------------------------------------------+
num_parameters: 124475904 ==> bytes: 248951808
allocated 237 MiB for model parameters
allocated 1703 MiB for activations
val loss 4.505090
allocated 237 MiB for parameter gradients
allocated 30 MiB for activation gradients
allocated 474 MiB for AdamW optimizer state m
allocated 474 MiB for AdamW optimizer state v
allocated 474 MiB for master copy of params
step 1/74: train loss 4.370480 (acc 4.370480) (298.699646 ms, 13712.771484 tok/s)
step 2/74: train loss 4.502850 (acc 4.502850) (34.138111 ms, 119983.187500 tok/s)
step 3/74: train loss 4.414629 (acc 4.414629) (34.011135 ms, 120212.890625 tok/s)
step 4/74: train loss 3.958204 (acc 3.958204) (34.105343 ms, 120172.781250 tok/s)
step 5/74: train loss 3.607100 (acc 3.607100) (34.020351 ms, 120233.632812 tok/s)
step 6/74: train loss 3.782271 (acc 3.782271) (34.085888 ms, 120218.898438 tok/s)
Hi @Anerudhan, thank you so much for your advice to print the log. I got:
E! CuDNN (v90101 17) function cudnnBackendFinalize() called:
e! Info: Traceback contains 4 message(s)
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e! Time: 2024-05-13T16:09:12.824746 (0d+0h+0m+2s since start)
e! Process=781621; Thread=781621; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v90101 17) function cudnnGetErrorString() called:
i! status: type=int; val=5000;
i! Time: 2024-05-13T16:09:12.824882 (0d+0h+0m+2s since start)
i! Process=781621; Thread=781621; GPU=NULL; Handle=NULL; StreamId=NULL.
Do you know why this happens? I am new to CUDA, thank you so much!
Could be a driver or toolkit issue. What driver version are you on?
nvidia-smi
Mon May 13 08:31:45 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14 Driver Version: 550.54.14 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
Update instructions: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network
This is my driver version:
Mon May 13 16:38:13 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:00:08.0 Off | Off |
| 30% 28C P8 22W / 450W | 11MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 Off | 00000000:00:09.0 Off | Off |
| 30% 27C P8 18W / 450W | 11MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
@Anerudhan cudnn-frontend was updated last week, have you updated it?
Similarly, the error occurs when running the standalone attention kernel:
(llm-env) root@ubuntu22:~/llm.c/dev/cuda# nvcc -I../../cudnn-frontend/include -DENABLE_CUDNN -O3 --use_fast_math -lcublas -lcublasLt -lcudnn attention_forward.cu -o attention_forward
(llm-env) root@ubuntu22:~/llm.c/dev/cuda# ./attention_forward 10
enable_tf32: 1
Using kernel 10
Checking block size 32.
attention_forward: attention_forward.cu:1143: auto lookup_cache_or_build_graph_fwd(Args ...) [with Args = {int, int, int, int, bool}]: Assertion `graph->check_support(cudnn_handle).is_good()' failed.
Is there anything wrong with my cuDNN or cudnn-frontend?
Hi @ifromeast, I am still trying to reproduce the issue (yes, I have the latest cudnn-frontend and cuDNN).
This does not look like a cuDNN issue. I suspect this happens because of the multi-GPU setup.
Is it possible for you to try two scenarios:
a) Try setting CUDA_VISIBLE_DEVICES=0,-1,1 and check whether the execution succeeds for you?
b) (Independent of the case above) Try setting CUDA_MODULE_LOADING=EAGER and CUDA_MODULE_DATA_LOADING=EAGER.
Thanks, Anerudhan
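A sketch of how those two suggestions could be applied from a shell (an invalid entry like -1 in CUDA_VISIBLE_DEVICES makes CUDA ignore every device listed after it, so only GPU 0 stays visible; `./train_gpt2cu` is the binary from the report above):

```shell
# Scenario (a): expose only GPU 0; the invalid "-1" entry hides device 1.
export CUDA_VISIBLE_DEVICES=0,-1,1
# Scenario (b), independent of (a): disable lazy kernel/module loading.
export CUDA_MODULE_LOADING=EAGER
export CUDA_MODULE_DATA_LOADING=EAGER
# then rerun the failing binary:
# ./train_gpt2cu
```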
I am having exact issue as @ifromeast. My cuda version is 12.4
, CuDNN Version is 9.1.1.17-1
, and cudnn-frontend is 1.4.0
on Debian 11.
Hi @simonguozirui, is it on a multi-GPU 4090 setup as well?
Is it possible for you to try two scenarios:
a) Try setting CUDA_VISIBLE_DEVICES=0,-1,1 and check whether the execution succeeds for you?
b) (Independent of the case above) Try setting CUDA_MODULE_LOADING=EAGER and CUDA_MODULE_DATA_LOADING=EAGER.
Thanks
Hey @Anerudhan! Thanks so much for the suggestion. I tried both of those, but unfortunately neither changes the behavior. I am on a T4 GPU (single-GPU setup). Things break for me at graph->check_support(cudnn_handle) as well.
Curious which cuDNN and frontend versions you are using, so I can reference them while debugging.
I am using cudnn-frontend 1.4.0 and CUDA 12.4 (I have CUDA 12.3 installed as well for debugging).
I think the issue is that the cuDNN SDPA operation is not supported on the T4 (Turing); it requires Ampere or later GPUs. If you run with export CUDNN_LOGLEVEL_DBG=2, you will see more helpful messages.
Thanks
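As a rough way to check that constraint up front, here is a hypothetical helper (check_sdpa is not part of llm.c); on newer drivers the capability string could come from `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`:

```shell
# check_sdpa: given a compute capability like "8.9", report whether the
# cuDNN SDPA engines can run (Ampere or newer, i.e. major version >= 8).
check_sdpa() {
  major=${1%%.*}                 # keep the part before the dot
  if [ "${major:-0}" -ge 8 ]; then
    echo supported
  else
    echo unsupported
  fi
}

check_sdpa 8.9   # RTX 4090 / 4060 Ti (Ada)  -> supported
check_sdpa 7.5   # T4 (Turing)               -> unsupported
check_sdpa 7.0   # V100 (Volta)              -> unsupported
```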
@Anerudhan thanks, I will try on an Ampere GPU too.
With the new log level I see some messages like:
i! descriptor: type=CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR; val=NOT_IMPLEMENTED;
Curious if you know what might be causing that.
Those are info messages (i!) and harmless, as they just capture the library state. I would be more interested in messages that are warnings (w!) or errors (e!).
Hi @Anerudhan, I checked. No errors (e!), only one warning (w!); here it is:
W! CuDNN (v90101 17) function cudnnBackendFinalize() called:
w! Info: Traceback contains 2 message(s)
w! Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: userGraph->getEntranceNodesSize() != 2
w! Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: numUserNodes != 5 && numUserNodes != 6
w! Time: 2024-05-16T18:12:27.288935 (0d+0h+0m+0s since start)
w! Process=349188; Thread=349188; GPU=NULL; Handle=NULL; StreamId=NULL.
Same error as @ifromeast; by the way, I tested it on WSL.
(base) h53@Nyx:~/repo/llm.c$ make train_gpt2cu USE_CUDNN=1
---------------------------------------------
✓ cuDNN found, will run with flash-attention
✓ OpenMP found
✗ OpenMPI is not found, disabling multi-GPU support
---> On Linux you can try install OpenMPI with `sudo apt install openmpi-bin openmpi-doc libopenmpi-dev`
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/usr/local/cuda-12.3/bin/nvcc -O3 -t=0 --use_fast_math --generate-code arch=compute_89,code=[compute_89,sm_89] -DENABLE_CUDNN -DENABLE_BF16 train_gpt2.cu cudnn_att.o -lcublas -lcublasLt -lcudnn -I/home/h53/cudnn-frontend/include -o train_gpt2cu
(base) h53@Nyx:~/repo/llm.c$ ./train_gpt2cu
Multi-GPU support is disabled. Using a single GPU.
+-----------------------+----------------------------------------------------+
| Parameter | Value |
+-----------------------+----------------------------------------------------+
| train data pattern | dev/data/tinyshakespeare/tiny_shakespeare_train.bin |
| val data pattern | dev/data/tinyshakespeare/tiny_shakespeare_val.bin |
| output log dir | NULL |
| checkpoint_every | 0 |
| resume | 0 |
| micro batch size B | 4 |
| sequence length T | 1024 |
| total batch size | 4096 |
| learning rate (LR) | 3.000000e-04 |
| warmup iterations | 0 |
| final LR fraction | 1.000000e+00 |
| weight decay | 0.000000e+00 |
| grad_clip | 1.000000e+00 |
| max_steps | -1 |
| val_loss_every | 20 |
| val_max_steps | 20 |
| sample_every | 20 |
| genT | 64 |
| overfit_single_batch | 0 |
| use_master_weights | enabled |
| recompute | 1 |
+-----------------------+----------------------------------------------------+
| device | NVIDIA GeForce RTX 4060 Ti |
| precision | BF16 |
+-----------------------+----------------------------------------------------+
| load_filename | gpt2_124M_bf16.bin |
| max_sequence_length T | 1024 |
| vocab_size V | 50257 |
| padded_vocab_size Vp | 50304 |
| num_layers L | 12 |
| num_heads NH | 12 |
| channels C | 768 |
| num_parameters | 124475904 |
+-----------------------+----------------------------------------------------+
| train_num_batches | 74 |
| val_num_batches | 20 |
+-----------------------+----------------------------------------------------+
| run hellaswag | no |
+-----------------------+----------------------------------------------------+
| Zero Optimization is disabled |
| num_processes | 1 |
| zero_stage | 0 |
+-----------------------+----------------------------------------------------+
HellaSwag eval not found at dev/data/hellaswag/hellaswag_val.bin, skipping its evaluation
You can run `python dev/data/hellaswag.py` to export and use it with `-h 1`.
num_parameters: 124475904 => bytes: 248951808
allocated 237 MiB for model parameters
batch_size B=4 * seq_len T=1024 * num_processes=1 and total_batch_size=4096
=> setting grad_accum_steps=1
allocating 1439 MiB for activations
W! CuDNN (v90101 17) function cudnnBackendFinalize() called:
w! Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: userGraph->getEntranceNodesSize() != 2
w! Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: numUserNodes != 5 && numUserNodes != 6
w! Time: 2024-05-28T07:01:07.380708 (0d+0h+0m+0s since start)
w! Process=210452; Thread=210452; GPU=NULL; Handle=NULL; StreamId=NULL.
E! CuDNN (v90101 17) function cudnnBackendFinalize() called:
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e! Time: 2024-05-28T07:01:07.741508 (0d+0h+0m+0s since start)
e! Process=210452; Thread=210452; GPU=NULL; Handle=NULL; StreamId=NULL.
E! CuDNN (v90101 17) function cudnnBackendFinalize() called:
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e! Time: 2024-05-28T07:01:07.960496 (0d+0h+0m+0s since start)
e! Process=210452; Thread=210452; GPU=NULL; Handle=NULL; StreamId=NULL.
E! CuDNN (v90101 17) function cudnnBackendFinalize() called:
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e! Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e! Time: 2024-05-28T07:01:08.192814 (0d+0h+0m+1s since start)
e! Process=210452; Thread=210452; GPU=NULL; Handle=NULL; StreamId=NULL.
[CUDNN ERROR] at file cudnn_att.cpp:141:
[cudnn_frontend] Error: No execution plans built successfully.
(base) h53@Nyx:~/repo/llm.c$ nvidia-smi
Tue May 28 07:02:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.65 Driver Version: 551.86 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 Ti On | 00000000:01:00.0 On | N/A |
| 0% 37C P8 8W / 165W | 1231MiB / 16380MiB | 3% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 33 G /Xwayland N/A |
+-----------------------------------------------------------------------------------------+
(base) h53@Nyx:~/repo/llm.c$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0
I have a similar error, running on Ubuntu 22.04:
(base) ubuntu:~/llm.c$ nvidia-smi
Fri Jun 7 06:23:37 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02 Driver Version: 555.42.02 CUDA Version: 12.5 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla V100-SXM2-16GB Off | 00000000:00:1E.0 Off | 0 |
| N/A 32C P0 23W / 300W | 1MiB / 16384MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
The full log is too long with CUDNN_LOGLEVEL_DBG=3; the last few lines are:
I! CuDNN (v90101 17) function cudnnBackendGetAttribute() called:
i! descriptor: type=CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR; val=NOT_IMPLEMENTED;
i! attributeName: type=cudnnBackendAttributeName_t; val=CUDNN_ATTR_ENGINEHEUR_RESULTS (202);
i! attributeType: type=cudnnBackendAttributeType_t; val=CUDNN_TYPE_BACKEND_DESCRIPTOR (15);
i! requestedElementCount: type=int64_t; val=0;
i! elementCount: location=host; addr=0x7ffcd475f490;
i! arrayOfElements: location=host; addr=NULL_PTR;
i! Time: 2024-06-07T06:07:28.129271 (0d+0h+0m+3s since start)
i! Process=19625; Thread=19625; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v90101 17) function cudnnBackendGetAttribute() called:
i! descriptor: type=CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR; val=NOT_IMPLEMENTED;
i! attributeName: type=cudnnBackendAttributeName_t; val=CUDNN_ATTR_ENGINEHEUR_RESULTS (202);
i! attributeType: type=cudnnBackendAttributeType_t; val=CUDNN_TYPE_BACKEND_DESCRIPTOR (15);
i! requestedElementCount: type=int64_t; val=0;
i! elementCount: location=host; addr=0x7ffcd475f3d8;
i! arrayOfElements: location=host; addr=NULL_PTR;
i! Time: 2024-06-07T06:07:28.129400 (0d+0h+0m+3s since start)
i! Process=19625; Thread=19625; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v90101 17) function cudnnBackendDestroyDescriptor() called:
i! descriptor: type=CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR; val=NOT_IMPLEMENTED;
i! Time: 2024-06-07T06:07:28.129525 (0d+0h+0m+3s since start)
i! Process=19625; Thread=19625; GPU=NULL; Handle=NULL; StreamId=NULL.
I! CuDNN (v90101 17) function cudnnGetErrorString() called:
i! status: type=int; val=0;
i! Time: 2024-06-07T06:07:28.129586 (0d+0h+0m+3s since start)
i! Process=19625; Thread=19625; GPU=NULL; Handle=NULL; StreamId=NULL.
[CUDNN ERROR] at file llmc/cudnn_att.cpp:112:
[cudnn_frontend] Error: No execution plans built successfully.
Fixed by upgrading the cuDNN version; I was previously on 8.9.2, which broke with the above error.
After upgrading from cuDNN 9.1.1 to 9.2.0, I got a new error. Which version are you using? Thanks
@h53 @yangcheng I encountered the same issue. It only works on Ampere or later GPUs. Therefore, I switched from a V100 to an A100.
I'm getting the following error when running `./train_gpt2cu` after building with `make train_gpt2cu USE_CUDNN=1`. I'm running CUDA 12.4 on Ubuntu 22.04. Any help or pointers would be great, thanks!