tangjianpku / LINE

LINE: Large-scale information network embedding

A question about multi-thread in your code #35

Open xuxiaohan opened 5 years ago

xuxiaohan commented 5 years ago

Hi, I read your C++ code of LINE for Windows; it is a very good implementation. But I have a question: why didn't you consider read-write conflicts when updating the embedding vectors in the Update() function? All threads may read or write vec_v[c] (that is, emb_vertex[c] and emb_context[c]) at any time. Is there any potential problem? For example, suppose two threads sample two edges that share a common vertex, so both update the embedding of that vertex. When they run code such as line 307 in your implementation, x += vec_u[c] * vec_v[c], boost::thread gives no guarantee about what the two threads read. In that case, the embedding of the vertex could be corrupted. I know the probability of such a conflict is very small when the number of vertices is 1e10+ and the number of threads is just 10+.

I am looking forward to your response, and thank you very much.

m-ochi commented 5 years ago

Hi. You can see the reason in the original LINE paper. With SGD, we don't need to care about conflicts in the parallel setting. The theoretical justification and experimental results are given in the following paper: https://papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent
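To make the idea concrete, here is a minimal sketch of a Hogwild-style lock-free update. It is not the LINE source: every name in it (`hogwild_update`, `train_hogwild`, `kDim`, `kLeaves`) is hypothetical, and the star graph is a deliberately contended toy case where every thread updates the same hub vertex. One assumption differs from the code described in the question: the sketch stores embeddings as `std::atomic<float>` with relaxed ordering rather than plain floats, so the racy accesses stay well-defined in the C++ memory model. The read-modify-write as a whole is still not atomic, so concurrent updates can overwrite each other, which is the situation Hogwild tolerates.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <random>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is an illustration, not the LINE code.
using Vec = std::vector<std::atomic<float>>;
constexpr int kDim = 8;      // embedding dimension (assumed)
constexpr int kLeaves = 16;  // vertices 1..kLeaves, all linked to hub vertex 0
constexpr auto kRelaxed = std::memory_order_relaxed;

// One SGD step on edge (u, v) with a logistic loss. Each element is read and
// written with separate relaxed atomic operations and no lock, so another
// thread may change a value between our load and our store.
inline void hogwild_update(Vec& emb, int u, int v, float label, float lr) {
    assert(u != v);
    float x = 0.0f;
    for (int c = 0; c < kDim; ++c)
        x += emb[u * kDim + c].load(kRelaxed) * emb[v * kDim + c].load(kRelaxed);
    const float g = (label - 1.0f / (1.0f + std::exp(-x))) * lr;
    for (int c = 0; c < kDim; ++c) {
        const float uc = emb[u * kDim + c].load(kRelaxed);
        const float vc = emb[v * kDim + c].load(kRelaxed);
        emb[u * kDim + c].store(uc + g * vc, kRelaxed);  // may clobber a concurrent write
        emb[v * kDim + c].store(vc + g * uc, kRelaxed);
    }
}

// Trains on the star graph with `num_threads` unsynchronized workers and
// returns the mean predicted probability of the (all positive) edges.
float train_hogwild(int num_threads, int steps_per_thread) {
    Vec emb((kLeaves + 1) * kDim);
    for (auto& e : emb) e.store(0.1f, kRelaxed);

    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&emb, t, steps_per_thread] {
            std::mt19937 rng(1234u + static_cast<unsigned>(t));  // per-thread RNG
            std::uniform_int_distribution<int> leaf(1, kLeaves);
            for (int s = 0; s < steps_per_thread; ++s)
                hogwild_update(emb, 0, leaf(rng), 1.0f, 0.05f);
        });
    }
    for (auto& w : workers) w.join();

    float total = 0.0f;
    for (int v = 1; v <= kLeaves; ++v) {
        float x = 0.0f;
        for (int c = 0; c < kDim; ++c)
            x += emb[c].load(kRelaxed) * emb[v * kDim + c].load(kRelaxed);
        total += 1.0f / (1.0f + std::exp(-x));
    }
    return total / kLeaves;
}
```

Calling `train_hogwild(4, 20000)` runs four workers that all write to the hub embedding without locking. Some individual updates may be lost, but each surviving update still moves the parameters in a descent direction, so training still makes progress. In a sparse graph, where two threads rarely sample edges that share a vertex, such collisions are rarer still; that sparsity is the condition the Hogwild analysis relies on.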

xuxiaohan commented 5 years ago

Thanks for your answer.