tangjianpku / LINE

LINE: Large-scale information network embedding

Embeddings of all nodes are not obtained #11

Open ayushidalmia opened 7 years ago

ayushidalmia commented 7 years ago

Hi, I was trying to run LINE on a graph. However, the output files vec_1st.txt, vec_2nd.txt, and vec_all.txt do not contain embeddings for all of the nodes in the original input graph.

Can you tell me where I might be going wrong, or what causes this behavior?

Cheng-CZ commented 7 years ago

Same issue with me. Some nodes are missing.

zhujiangang commented 7 years ago

If you read the code, you will find that the training instances are sampled from the graph, so the edges of low-degree vertices won't be sampled during training. This is why some nodes are missing from the final embedding result.

mongooma commented 7 years ago

@zhujiangang The embeddings are initialized up front in InitVector(), so even if some edges are never sampled, the nodes still have embeddings. I didn't have this issue when I used LINE. I wonder what caused your problem.

jiay302 commented 7 years ago

@gooeyforms Could you help me use this LINE model? I have run into a problem: I followed the train_youtube command and set the binary parameter to 0, but the first column of the result file contains floating-point numbers that differ from the original vertex IDs. I am very confused.

mongooma commented 7 years ago

I did have problems when I set binary to 0. So for now I set binary to 1, then read the binary file and write it out as readable text (using Python, which I'm more familiar with). This is just a makeshift workaround. Please let me know if you figure this out.
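
For reference, a minimal sketch of such a binary-to-text conversion. It assumes a word2vec-style layout (an ASCII header line with the vertex count and dimension, then for each vertex its name, a space, the raw little-endian float32 values, and a newline). That layout and the function names `parse_binary_embeddings` and `to_text` are assumptions for illustration, so check them against the Output() code in your copy of LINE before relying on this.

```python
import struct


def parse_binary_embeddings(data: bytes):
    """Parse embeddings from an ASSUMED word2vec-style binary layout.

    Assumed layout: b"<count> <dim>\n", then per vertex:
    name, one space, <dim> little-endian float32 values, one newline.
    """
    header_end = data.index(b"\n")
    count, dim = (int(x) for x in data[:header_end].split())
    pos = header_end + 1
    rows = []
    for _ in range(count):
        # The name runs up to the first space after the current position.
        name_end = data.index(b" ", pos)
        name = data[pos:name_end].decode("utf-8")
        pos = name_end + 1
        # Read exactly dim float32 values; never search for b"\n" here,
        # because the raw float bytes may themselves contain 0x0A.
        vec = struct.unpack_from(f"<{dim}f", data, pos)
        pos += 4 * dim
        if data[pos:pos + 1] == b"\n":
            pos += 1
        rows.append((name, vec))
    return rows


def to_text(rows):
    """Render parsed rows as readable text, one vertex per line."""
    dim = len(rows[0][1]) if rows else 0
    lines = [f"{len(rows)} {dim}"]
    for name, vec in rows:
        lines.append(name + " " + " ".join(f"{v:.6f}" for v in vec))
    return "\n".join(lines) + "\n"
```

To use it on a real file, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes to `parse_binary_embeddings`.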

jiay302 commented 7 years ago

@gooeyforms Can you provide your email to me? I have some questions to ask you. I am a student at Beijing University of Posts and Telecommunications. I am looking forward to getting your help. Thank you very much.

yyr93520 commented 7 years ago

Have you solved it? I met the same problem. Thanks!

pickou commented 6 years ago

I have run into the same problem. When I use the BlogCatalog data, there should be 10312 embeddings, but LINE returns only 10263.

mongooma commented 6 years ago

@pickou Could you run the code again with -binary 1 and count the lines in the binary embedding file, e.g. with wc -l *.embedding? I have run LINE on the BlogCatalog dataset with -binary 1 and this issue didn't occur. But I'm having trouble with -binary 0.

pickou commented 6 years ago

In my case, the same issue occurs. When I run wc -l line.emb, I get 27420 and 27501 in two runs of LINE with the same parameters.
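
One caveat about counting with wc -l: if the embedding file was written in binary mode, the raw float bytes can contain the newline byte 0x0A, so the newline count need not equal the number of vertices and can change from run to run. The sketch below compares the two counts. It assumes the first line of the file is an ASCII header whose first field is the vertex count (word2vec-style); that assumption and the helper names are illustrative only.

```python
def header_vertex_count(data: bytes) -> int:
    """Return the vertex count from an ASSUMED 'count dim' header line."""
    first_line = data.split(b"\n", 1)[0]
    return int(first_line.split()[0])


def newline_count(data: bytes) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    return data.count(b"\n")


# One vertex "v" with a single float32 component equal to 0.01.
# Its little-endian bytes are 0A D7 23 3C, and 0A is a newline byte.
sample = b"1 1\n" + b"v " + b"\x0a\xd7\x23\x3c" + b"\n"

print(header_vertex_count(sample))  # vertices declared in the header
print(newline_count(sample))        # newline bytes in the file
```

Here the header declares one vertex, while the newline count is three: one for the header, one inside the float, and one terminating the record.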

pickou commented 6 years ago

@mongooma Have you converted the graph to an undirected one? I made the change like this:

1 2
3 5

then,

1 2
2 1
3 5
5 3

mongooma commented 6 years ago

@pickou I did. I don't know what caused the issue. I suggest you set breakpoints or print lines to debug the code. Please let me know when you locate the problem.

pickou commented 6 years ago

@mongooma I have found what caused the issue. When the edges are read from the file, every edge must have a weight:

fscanf(fin, "%s %s %lf", name_v1, name_v2, &weight);

see,

1 2
3 5

then,

1 2 1
2 1 1
3 5 1
5 3 1

You had better warn people about that, or you could add a parameter, such as weighted, to handle both weighted and unweighted graphs.
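
Until such an option exists, a small preprocessing script can produce the three-column format. The following is a minimal sketch, not part of LINE: the function name `normalize_edge_list` and its parameters are hypothetical. It adds a default weight of 1 to two-column lines and, optionally, emits the reverse of each edge.

```python
def normalize_edge_list(lines, default_weight=1.0, make_undirected=True):
    """Return 'src dst weight' lines from 2- or 3-column edge lines.

    Hypothetical helper: two-column lines get default_weight; if
    make_undirected is True, each edge is also written reversed.
    """
    out = []
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) == 2:
            src, dst = parts
            weight = default_weight
        elif len(parts) == 3:
            src, dst = parts[0], parts[1]
            weight = float(parts[2])
        else:
            raise ValueError(f"unexpected edge line: {raw!r}")
        out.append(f"{src} {dst} {weight:g}")
        if make_undirected:
            out.append(f"{dst} {src} {weight:g}")
    return out


for line in normalize_edge_list(["1 2", "3 5"]):
    print(line)
```

If the input already lists both directions of every edge, pass `make_undirected=False` to avoid duplicating edges.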

mongooma commented 6 years ago

@pickou I'm glad you located the problem. However, I still don't understand why this would cause results to vary between runs, as you described. And even though I think the original input format is explicit enough for all types of graphs, I definitely think a separate script to handle different input formats is a good idea. At this point, you could submit a pull request that adds a warning to the Readme file.

pickou commented 6 years ago

@mongooma I don't know either, but I traced the ReadData() function. When I use an unweighted graph as input, like

1 2
2 1
3 1
1 3

and print name_v1 and name_v2, I sometimes get "1\100\066" instead of "1". I think the issue comes from here.
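
One possible contributing factor: the `%s` and `%lf` conversions in fscanf skip leading whitespace, including newlines, so a "%s %s %lf" read takes three whitespace-separated tokens at a time regardless of line boundaries. The sketch below models only that token consumption in Python, assuming numeric vertex IDs. It is not the LINE code, `scan_triples` is a hypothetical name, and it does not account for the garbage bytes seen above.

```python
def scan_triples(text):
    """Model reading 'name name weight' triples from whitespace tokens.

    Simplified model of repeated "%s %s %lf" reads: line breaks are
    ignored and only complete triples are returned.
    """
    tokens = text.split()
    edges = []
    for i in range(0, len(tokens) - 2, 3):
        edges.append((tokens[i], tokens[i + 1], float(tokens[i + 2])))
    return edges


unweighted = "1 2\n2 1\n3 1\n1 3\n"
weighted = "1 2 1\n2 1 1\n3 1 1\n1 3 1\n"

print(scan_triples(unweighted))  # vertex IDs end up in the weight slot
print(scan_triples(weighted))    # one triple per input line
```

Under this model, the unweighted file yields only two complete triples for four input lines, and some vertex IDs are consumed as weights, whereas the weighted file yields one triple per line.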

ZiyaoWu commented 5 years ago

I suppose nodes go missing because the degree of the missing nodes is zero.