Open dhimmel opened 4 years ago
I think I figured it out. The first row provides the number of connected nodes (35) and the number of embedding features (24).
Here's some pandas code to read this format:
import pandas as pd
path = "https://github.com/snap-stanford/snap/raw/907c34aac6bcddc7c2f8efb64be76e87dd7e4ea5/examples/node2vec/emb/karate.emb"
embedding_df = (
    pd.read_csv(path, sep=" ", skiprows=1, index_col=0, header=None)
    .rename_axis(index="node_id")
    .add_prefix("emb_")
)
embedding_df
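To sanity-check that reading of the first row, here is a minimal sketch that parses the header and confirms the remaining rows match it. The `read_emb` helper and the three-row sample are made up for illustration (they are not part of node2vec or SNAP), and it assumes the file is whitespace-separated with a `n_rows n_dims` header as described above.

```python
import io


def read_emb(handle):
    """Parse an embedding file: a 'n_rows n_dims' header, then 'node_id v1 ... vd' rows.

    Hypothetical helper for illustration only; raises ValueError if the
    body does not match the counts declared in the header.
    """
    n_rows, n_dims = (int(x) for x in handle.readline().split())
    embeddings = {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # tolerate blank or trailing lines
        node_id = fields[0]
        values = [float(v) for v in fields[1:]]
        if len(values) != n_dims:
            raise ValueError(f"node {node_id}: expected {n_dims} values, got {len(values)}")
        embeddings[node_id] = values
    if len(embeddings) != n_rows:
        raise ValueError(f"header declares {n_rows} rows, found {len(embeddings)}")
    return n_rows, n_dims, embeddings


# Made-up sample in the same layout (not the real karate.emb contents).
sample = "3 2\n1 0.1 -0.2\n2 0.3 0.4\n3 -0.5 0.6\n"
n_rows, n_dims, embeddings = read_emb(io.StringIO(sample))
print(n_rows, n_dims, len(embeddings))
```

If the header really is a row count plus a dimension count, running the same check on an open handle to karate.emb should pass without raising.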
If we look at the head of the example node2vec embedding output at karate.emb: https://github.com/snap-stanford/snap/blob/907c34aac6bcddc7c2f8efb64be76e87dd7e4ea5/examples/node2vec/emb/karate.emb#L1-L5
It looks like from row 2 onwards the output is node_id + embeddings...
But I can't figure out what the first row is.