Joeyzhouqihui opened 2 years ago
We will test the sampling scalability on our machine. We suspect the slowdown could be contention over some shared resource.
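One quick way to test the shared-resource-contention hypothesis is to run the same per-worker job in an increasing number of processes and see whether the time each worker needs grows with the worker count; with truly independent work it should stay flat. This is a minimal, hypothetical sketch: the `workload` function is a CPU-bound stand-in, which in a real test would be replaced by one Quiver sampling run pinned to its own GPU.

```python
import time
from concurrent.futures import ProcessPoolExecutor

def workload(_):
    """Stand-in job; replace with one sampling epoch on a dedicated GPU."""
    t0 = time.perf_counter()
    acc = 0
    for i in range(2_000_000):  # CPU-bound placeholder for the sampler
        acc += i * i
    return time.perf_counter() - t0

def per_worker_time(n):
    """Average wall time each of n concurrent workers needs for its job."""
    with ProcessPoolExecutor(max_workers=n) as ex:
        times = list(ex.map(workload, range(n)))
    return sum(times) / n

if __name__ == "__main__":
    # If per-worker time climbs as n grows, the workers are not
    # independent and are contending for some shared resource.
    for n in (1, 4, 8):
        print(f"{n} workers: {per_worker_time(n):.3f}s each")
```

If the per-worker time is roughly constant here but the real sampler still scales poorly, the contention is more likely on the GPU side (e.g. PCIe/NVLink traffic or a shared host thread pool) than in the Python processes themselves.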
Well, this is strange; it should not happen, because the samplers run on different devices, as you said. But we will look into this problem and give you feedback ASAP.
Thank you so much, and we look forward to your findings!
Hi, sorry to bother you. I was wondering whether there are any profiling results or findings yet?
When we use multiple GPUs to do sampling with Quiver in GPU-sampling mode (graph stored in GPU memory), we find that the scalability is poor.
To be specific, we ran the example code on Reddit, and sampling takes about 1.11s with 1 GPU. We expected the sampling time with 8 GPUs to be about 8x lower, since all GPUs sample independently. However, with 8 GPUs the sampling time is 0.79s, much higher than we expected. Moreover, with 4 GPUs the sampling time is 0.66s, which is lower than in the 8-GPU case.
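To put the reported times in standard strong-scaling terms (the formula below is the usual speedup/efficiency definition, nothing Quiver-specific):

```python
def scaling(t1, tn, n):
    """Return (speedup, parallel efficiency) for n workers,
    given single-worker time t1 and n-worker time tn."""
    speedup = t1 / tn
    return speedup, speedup / n

# Times reported above: 1.11s (1 GPU), 0.66s (4 GPUs), 0.79s (8 GPUs).
for n, tn in [(4, 0.66), (8, 0.79)]:
    s, e = scaling(1.11, tn, n)
    print(f"{n} GPUs: speedup {s:.2f}x, efficiency {e:.0%}")
```

So 4 GPUs reach only about 42% parallel efficiency and 8 GPUs about 18%, and the 8-GPU run is slower in absolute terms than the 4-GPU run, which is why contention on a shared resource is a plausible suspect.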
Could you please give us some insight or explanation about this phenomenon? Thank you so much!