snap-stanford / ogb

Benchmark datasets, data loaders, and evaluators for graph machine learning
https://ogb.stanford.edu
MIT License

GPU utilization rate #153

Closed DesmonDay closed 3 years ago

DesmonDay commented 3 years ago

I ran the ogb.lsc.ComplEx_roberta example on a machine with 4 x 16 GB GPUs. However, GPU utilization is 0% most of the time, and each card uses only about 1300 MB of GPU memory. Is this normal? Thank you!

weihua916 commented 3 years ago

Hi! This seems to be related to loading the RoBERTa features from disk: https://github.com/snap-stanford/ogb/issues/131. Do you observe the same issue when you run ComplEx_shallow?
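To illustrate why reading features from disk can leave the GPUs idle, here is a minimal sketch, assuming the features are stored as a NumPy `.npy` file and are read either memory-mapped (`np.load(path, mmap_mode="r")`) or fully into RAM (`np.load(path)`). The file name `entity_feat.npy` and the function `compare_gather` are hypothetical; this is not the ogb code, only a way to compare the cost of gathering one random batch of rows each way.

```python
import os
import tempfile
import time

import numpy as np


def compare_gather(num_rows=200_000, dim=64, batch=4096, seed=0):
    """Gather one random batch of rows from a memory-mapped .npy file and
    from a fully loaded copy. Returns (mmap_seconds, ram_seconds, same)."""
    rng = np.random.default_rng(seed)
    feats = rng.standard_normal((num_rows, dim)).astype(np.float16)
    idx = rng.integers(0, num_rows, size=batch)  # random entity ids

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "entity_feat.npy")  # hypothetical file name
        np.save(path, feats)

        # Memory-mapped: each touched row is paged in from disk on demand.
        mm = np.load(path, mmap_mode="r")
        t0 = time.perf_counter()
        from_mmap = np.asarray(mm[idx])
        t_mmap = time.perf_counter() - t0

        # Fully loaded: the whole array is read into RAM up front.
        full = np.load(path)
        t0 = time.perf_counter()
        from_ram = full[idx]
        t_ram = time.perf_counter() - t0

        same = np.array_equal(from_mmap, from_ram)
        del mm  # release the memmap before the temp directory is deleted
    return t_mmap, t_ram, same


if __name__ == "__main__":
    print(compare_gather())
```

On a small, freshly written file the OS page cache hides most of the disk cost, so the two timings may look similar. The gap tends to appear when the feature matrix is much larger than the free page cache, because every random batch then turns into scattered disk reads while the GPU waits.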

DesmonDay commented 3 years ago

Yes. Actually, I ran into something strange. I ran ogb.lsc.ComplEx_shallow on another machine (8 x 16 GB GPUs). After loading the model and printing the arguments, all the processes seem to stop. GPU utilization is 0% on every card, each GPU holds 1351 MB of memory, and even CPU usage is 0%...

Do you think I should replace entity_feat with all_entity_feat?

hyren commented 3 years ago

Hi, can you check whether the training freezes? Also, are you running the example scripts?

DesmonDay commented 3 years ago

Yes, the training freezes, and I'm running exactly the example script run.sh. I don't know why... I have now replaced dataset.entity_feat with dataset.all_entity_feat and am running ComplEx_concat to see the results.
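Switching to dataset.all_entity_feat presumably trades on-demand disk reads for holding the whole feature array in RAM (an assumption; check the ogb source for how each property loads the array). Before making that switch, a rough sketch like the one below can estimate whether the array fits in memory. The helper names `array_nbytes`, `total_ram_bytes`, and `fits_in_ram` and the example shape are hypothetical, and `os.sysconf` is only available on Linux and most Unix systems.

```python
import os

import numpy as np


def array_nbytes(shape, dtype):
    """Bytes needed to hold a dense array of `shape` and `dtype` in RAM."""
    n = 1
    for s in shape:
        n *= int(s)
    return n * np.dtype(dtype).itemsize


def total_ram_bytes():
    """Physical RAM in bytes (Linux/Unix only; os.sysconf is missing on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(shape, dtype, headroom=0.5):
    """True if the array would use at most `headroom` of physical RAM,
    leaving the rest for the model, optimizer state, and other processes."""
    return array_nbytes(shape, dtype) <= headroom * total_ram_bytes()


if __name__ == "__main__":
    # Hypothetical shape: 1,000,000 rows of 768-dim float16 features.
    shape = (1_000_000, 768)
    print(array_nbytes(shape, np.float16))  # 1,536,000,000 bytes
    print(fits_in_ram(shape, np.float16))
```

If several worker processes each load their own copy rather than sharing one, the estimate should be multiplied by the number of copies.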

hyren commented 3 years ago

How much memory does your machine have? Have you seen the line "Model created, it takes ... seconds" printed?

DesmonDay commented 3 years ago

Yes, and it freezes after printing the model arguments. This is the ComplEx_concat example, and the machine has 400 GB+ of memory. I don't know why it freezes. There are no error messages; it just hangs.
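When a run hangs with no error message, Python's standard-library `faulthandler` module can show where each thread is stuck. Below is a minimal sketch; it is not part of the ogb examples, and `arm_hang_dump` and `disarm_hang_dump` are hypothetical helper names. It prints the stack of every thread if the process is still running after `timeout_s` seconds, without killing it.

```python
import faulthandler
import sys
import time


def arm_hang_dump(timeout_s=600.0, file=sys.stderr):
    """After `timeout_s` seconds, dump every thread's stack to `file`,
    repeating each `timeout_s` seconds; the process keeps running."""
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file, exit=False)


def disarm_hang_dump():
    """Cancel the pending dump, e.g. once training has clearly started."""
    faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    arm_hang_dump(timeout_s=2.0)
    time.sleep(5)  # stand-in for a step that hangs; stacks appear on stderr
    disarm_hang_dump()
```

The watchdog only covers the process that armed it, so in multi-process training each worker would need to call it too. On Unix, `faulthandler.register(signal.SIGUSR1)` is an alternative that dumps stacks on demand when you run `kill -USR1 <pid>`.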

[Screenshots attached: 2021-04-14, 2:09 PM and 2:08 PM]

hyren commented 3 years ago

This is really weird; it runs smoothly on my machine. Can you send me an email at hyren@cs.stanford.edu? I can learn more about your case and help you fix this issue.

DesmonDay commented 3 years ago

Ok.