snap-stanford / snap

Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining library.

What is the first row of node2vec `.emb` output with only 2 values? #192

Open dhimmel opened 4 years ago

dhimmel commented 4 years ago

Here is the head of the example node2vec embedding output at karate.emb:

https://github.com/snap-stanford/snap/blob/907c34aac6bcddc7c2f8efb64be76e87dd7e4ea5/examples/node2vec/emb/karate.emb#L1-L5

From row 2 onwards, each row looks like a node_id followed by its embedding values.

But I can't figure out what the first row, which has only 2 values, represents.

dhimmel commented 4 years ago

I think I figured it out. The first row is a header giving the number of connected nodes (35) and the number of embedding features (24). Each later row is then a node_id followed by 24 values.

Here's some pandas code to read this format:

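import pandas as pd

# Example embedding file, pinned to a specific commit
path = "https://github.com/snap-stanford/snap/raw/907c34aac6bcddc7c2f8efb64be76e87dd7e4ea5/examples/node2vec/emb/karate.emb"
embedding_df = (
    # Skip the 2-value first row; column 0 holds the node ID
    pd.read_csv(path, sep=" ", skiprows=1, index_col=0, header=None)
    .rename_axis(index="node_id")
    .add_prefix("emb_")  # remaining columns become emb_1, emb_2, ...
)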
embedding_df
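
To check this reading of the first row rather than just skipping it, here is a minimal sketch that parses the header and compares it against the shape of the table that follows. It assumes the header really is `n_nodes n_dims`. The helper name `read_emb` and the three-node sample are made up for illustration and are not part of SNAP or taken from karate.emb.

```python
import io

import pandas as pd


def read_emb(handle):
    """Read an .emb-style text stream: a 2-value header, then one row per node.

    Hypothetical helper; assumes the header is "<n_nodes> <n_dims>".
    """
    # Consume the first row and interpret it as two integers.
    n_nodes, n_dims = (int(value) for value in handle.readline().split())
    # read_csv continues from the current position, i.e. after the header.
    df = (
        pd.read_csv(handle, sep=" ", index_col=0, header=None)
        .rename_axis(index="node_id")
        .add_prefix("emb_")
    )
    # Fail loudly if the header does not describe the rows that follow.
    if df.shape != (n_nodes, n_dims):
        raise ValueError(
            f"header says {(n_nodes, n_dims)}, but the table has shape {df.shape}"
        )
    return df


# Made-up sample: 3 nodes with 2 embedding features each.
sample = "3 2\n1 0.1 0.2\n2 0.3 0.4\n3 0.5 0.6\n"
embedding_df = read_emb(io.StringIO(sample))
print(embedding_df)
```

If the shape check passes on a real file opened with `open(...)`, that supports the header interpretation. If it raises, the first row means something else or the file has extra separators.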
