xgfs / verse

Reference implementation of the paper VERSE: Versatile Graph Embeddings from Similarity Measures
http://tsitsul.in/publications/verse/
MIT License

How to read the output binary file in Python? #1

Open pbamotra opened 6 years ago

pbamotra commented 6 years ago

Hi Authors,

Can you please let me know how to read the output binary file as a matrix of |vocab| x |dim| size or in some other consumable fashion? How do I get the vocabulary?

Pankesh

xgfs commented 6 years ago

Dear Pankesh,

The vocabulary is assumed to be the integers [0..n-1]; the user is expected to convert the graph to the matrix format themselves.

As for the output binary file, it is just a binary matrix of floats; you can read it in Python with

np.fromfile('embedding.bin', np.float32).reshape(num_nodes, embedding_dim)

Hope that helps. Anton
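A slightly fuller sketch of the one-liner above, for readers who do not know num_nodes in advance. It assumes the file holds nothing but float32 values in row-major order and native byte order, with no header; the function name load_embeddings and the size check are illustrative and not part of the repository.

```python
import numpy as np


def load_embeddings(path, embedding_dim):
    """Read a headerless float32 file into a (num_nodes, embedding_dim) array.

    Assumption: the file is a flat, row-major dump of float32 values,
    as described in the comment above. num_nodes is inferred from the size.
    """
    flat = np.fromfile(path, dtype=np.float32)
    if flat.size % embedding_dim != 0:
        raise ValueError(
            f"{flat.size} floats cannot be split into rows of {embedding_dim}"
        )
    # -1 lets NumPy infer the number of rows (one row per node).
    return flat.reshape(-1, embedding_dim)
```

Row i of the returned array is then the embedding of node i, since the vocabulary is the integers [0..n-1]. Calling it with the wrong embedding_dim usually raises the ValueError, although a wrong value that happens to divide the total number of floats would pass silently.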

adityasundaram commented 5 years ago

Is the vocab required to be consecutive integers [0..n-1]? Could it contain [0..n-1] with some integers missing in between, or start from a different range [m..n]?

xgfs commented 5 years ago

Simply speaking, the C++ program takes a binary CSR file as input and produces an embedding for every row of this matrix. So yes, the vocab (as in the bcsr file) must be consecutive [0..n) integers for the program to operate as expected. However, I provide a utility that converts files in different formats, including graphs with non-standard vocabularies, to bcsr.
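To illustrate what such a relabeling involves, here is a minimal sketch that maps arbitrary node ids to consecutive [0..n) integers and keeps the reverse mapping. It is not the conversion utility mentioned above and does not write bcsr; the function names build_vocab and relabel, and the edge-list input, are assumptions made for the example.

```python
def build_vocab(edges):
    """Assign each node id a consecutive index in first-seen order.

    Returns (node_to_index, index_to_node). The node ids can be any
    hashable values, e.g. sparse integers or strings.
    """
    node_to_index = {}
    for u, v in edges:
        for node in (u, v):
            if node not in node_to_index:
                node_to_index[node] = len(node_to_index)
    # Dicts preserve insertion order, so position i holds the id with index i.
    index_to_node = list(node_to_index)
    return node_to_index, index_to_node


def relabel(edges, node_to_index):
    """Rewrite an edge list using the consecutive indices."""
    return [(node_to_index[u], node_to_index[v]) for u, v in edges]
```

If the relabeled graph is what gets embedded, row i of the embedding matrix corresponds to the original node index_to_node[i], so that list needs to be saved alongside the embeddings.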

adityasundaram commented 5 years ago

Got it, thank you for clarifying