wujcan / SGL-TensorFlow


Is it normal for the training speed to be slow? #14

Closed KevinChow666 closed 2 years ago

KevinChow666 commented 2 years ago

Something strange happened when I ran the code. I ran it on the yelp2018 dataset, but each epoch took about 15 minutes while my GPU was mostly idle. Is it normal for training to be this slow, or have I missed something?
PS: I am using the Cython/C++ implementation for evaluation.

wujcan commented 2 years ago

That seems abnormal. I run the yelp2018 dataset on a PC with an i7 CPU and a Titan RTX GPU, and it takes about 90 s/epoch. GPU memory usage may be low, but GPU utilization is high during training. I suspect the CPU is limiting your training speed, since data augmentation is performed on the CPU. If you run SGL-ED or SGL-ND, you can modify the code to feed only the subgraph adjacency matrix of the first GCN layer, since the subgraphs are identical across layers. This reduces the amount of data that has to be transferred from CPU to GPU, which is a bottleneck in GPU training.
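To illustrate the idea, here is a minimal sketch in plain Python. It is not the code from this repository: the function names (`edge_dropout`, `build_feed`, `subgraph_for_layer`), the edge-list representation, and the `aug_type` strings are all hypothetical, and only edge dropout is shown. The point is only that when every layer shares one subgraph, a single copy needs to be built and transferred, whereas a per-layer scheme needs one copy per layer.

```python
import random


def edge_dropout(edges, drop_ratio, rng):
    """Keep each (user, item) edge with probability 1 - drop_ratio."""
    return [e for e in edges if rng.random() >= drop_ratio]


def build_feed(edges, n_layers, drop_ratio, aug_type, rng):
    """Build the subgraphs to transfer for one augmented view.

    Hypothetical aug_type values:
      "ed": one edge-dropout subgraph shared by all layers, so only
            a single copy is returned.
      "rw": an independently sampled subgraph for every layer.
    """
    if aug_type == "rw":
        return [edge_dropout(edges, drop_ratio, rng) for _ in range(n_layers)]
    return [edge_dropout(edges, drop_ratio, rng)]


def subgraph_for_layer(feed, layer):
    """Look up the subgraph used by a layer, reusing the shared copy if any."""
    return feed[layer] if len(feed) > 1 else feed[0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy bipartite graph: 100 users x 50 items, fully connected.
    edges = [(u, i) for u in range(100) for i in range(50)]

    shared = build_feed(edges, n_layers=3, drop_ratio=0.1, aug_type="ed", rng=rng)
    per_layer = build_feed(edges, n_layers=3, drop_ratio=0.1, aug_type="rw", rng=rng)

    # Number of edges that would have to be sent to the GPU per view.
    print("shared subgraph:   ", sum(len(s) for s in shared))
    print("per-layer subgraph:", sum(len(s) for s in per_layer))
```

With three layers, the shared variant sends roughly one third of the edges that the per-layer variant does. In the real model the same reasoning would apply to the sparse adjacency matrices fed for each augmented view.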

KevinChow666 commented 2 years ago

Thank you for your reply! I will give it a try.