huggingface / blog

Public repo for HF blog posts
https://hf.co/blog

Question about graph-classification using GPU #1052

Open LyuLumos opened 1 year ago

LyuLumos commented 1 year ago

I followed the code from Graph Classification and tried to run it on an A100 80G with an Intel(R) Xeon(R) Gold 5320 CPU and CUDA 11.1.

datasets                 2.11.0
transformers             4.28.1
torch                    2.0.0

I have also installed Cython and apex. The code runs, but it is slow. According to nvidia-smi, it uses about 21G of GPU memory, but Volatile GPU-Util stays close to 0 and the estimated run time is 53 hours, which is very different from what the documentation reports for training/fine-tuning for 20 epochs on CPU (Intel Core i7).

  0%|▏                                           | 2/1020 [06:28<52:59:29, 187.40s/it]

I have set model = model.cuda() and dataloader_num_workers=8. Why does it run so slowly on GPU? What else should I do to speed up training?

Looking forward to your reply and thanks for the blog.
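
GPU memory in use with utilisation near 0 often means the GPU is waiting for data rather than computing. One way to check this, sketched here under assumptions, is to time how long each step waits for the next batch versus how long the step itself takes. `profile_loop`, `slow_loader`, and the no-op step function are hypothetical stand-ins, not part of the blog's code. With a real model, the step function would need to call `torch.cuda.synchronize()` before returning so that GPU work is actually counted.

```python
import time


def profile_loop(batches, step_fn, max_steps=10):
    """Split wall-clock time between fetching batches and running step_fn."""
    fetch_time = 0.0
    step_time = 0.0
    steps = 0
    it = iter(batches)
    while steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading / collation
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # time spent here is the forward/backward pass
        t2 = time.perf_counter()
        fetch_time += t1 - t0
        step_time += t2 - t1
        steps += 1
    return {"steps": steps, "fetch_s": fetch_time, "step_s": step_time}


def slow_loader(n, delay):
    """Hypothetical loader whose per-batch preprocessing is expensive."""
    for i in range(n):
        time.sleep(delay)  # stands in for costly per-sample preprocessing
        yield i


# A no-op step stands in for the training step in this sketch.
stats = profile_loop(slow_loader(5, 0.02), lambda batch: None)
print(stats)
```

If `fetch_s` dominates, the bottleneck is in data loading or collation rather than in the model, and moving the model to the GPU will not help much on its own.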

clefourrier commented 1 year ago

Hi @LyuLumos ! I'm in a rush at the moment, but I will investigate this at the end of the month !

daeyeoplee commented 1 year ago

Hi, in my case it doesn't work at all if I send my model to CUDA. How can I solve this?

clefourrier commented 1 year ago

Hi @daeyeoplee ! Can you send me your logs?

daeyeoplee commented 1 year ago

@clefourrier

image

This error keeps popping up, but if I use only one sample it works fine, so I don't know what the problem is.

LyuLumos commented 1 year ago

I have a method that may work.

Step 1. Load the model on the CPU to make sure there are no problems with the model itself.
Step 2. Load your model onto just one GPU.

daeyeoplee commented 1 year ago

@clefourrier Thanks for your reply. Step 1 is OK, and it worked with batches of samples (not large batches, because of memory and speed problems). Step 2 didn't go well. I keep trying, but now I get this result.

image

To use just one GPU, I set os.environ['cuda visible device']=1 and device=torch.device('cuda:0').
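
For reference, the environment variable that CUDA reads is spelled `CUDA_VISIBLE_DEVICES`, and `os.environ` only accepts string values, so assigning the integer `1` raises a `TypeError`. Once a single GPU is exposed this way, it is renumbered and appears to the process as `cuda:0`. A minimal sketch, assuming the GPU with physical index 1 is the one wanted; `device_name` is an illustrative name, and the fallback to CPU is only there so the snippet runs on machines without PyTorch or a GPU:

```python
import os

# Set this before CUDA is initialised; the safest place is before `import torch`.
# The value must be a string, not an int.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

try:
    import torch
except ImportError:
    torch = None  # PyTorch not installed; fall back to CPU below

# The one visible GPU is always index 0 inside this process.
if torch is not None and torch.cuda.is_available():
    device_name = "cuda:0"
else:
    device_name = "cpu"

print(device_name)
```

The model and every tensor in the batch then need to be moved to that same device before the forward pass.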