jasperzhong / read-papers-and-code

My paper/code reading notes in Chinese

KDD '17 | metapath2vec: Scalable Representation Learning for Heterogeneous Networks #330

Closed jasperzhong closed 1 year ago

jasperzhong commented 1 year ago

https://dl.acm.org/doi/pdf/10.1145/3097983.3098036

jasperzhong commented 1 year ago

Heterogeneous network representation learning: regardless of a node's type, every node ends up with an embedding of the same length, $\mathbf{X} \in \mathbb{R}^{|V| \times d}$.
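A minimal sketch of this idea, using hypothetical node counts and a shared embedding matrix (names and numbers are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical node counts per type (illustrative only).
num_nodes = {"author": 5, "paper": 7, "conf": 3}
d = 4  # embedding dimension

# One shared embedding matrix X in R^{|V| x d}: rows for all node
# types live in the same space and have the same length d.
V = sum(num_nodes.values())
X = np.random.randn(V, d)

# Assign each (type, local_id) pair a global row index.
offsets, start = {}, 0
for ntype, n in num_nodes.items():
    offsets[ntype] = start
    start += n

def embed(ntype, idx):
    """Look up the d-dimensional embedding of any node, regardless of type."""
    return X[offsets[ntype] + idx]
```

The point is that an "author" embedding and a "conf" embedding are directly comparable vectors of the same dimension.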

The earlier DeepWalk showed how to learn node embeddings on a homogeneous graph: run random walks, then feed the generated walks into a skip-gram model for training.
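The walk-generation half of DeepWalk can be sketched like this, on a toy adjacency list (the graph data is made up for illustration):

```python
import random

# Toy homogeneous graph as an adjacency list (hypothetical example).
graph = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1],
    3: [1],
}

def random_walk(graph, start, walk_length):
    """DeepWalk-style walk: at each step, move to a uniformly random neighbor."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(random.choice(neighbors))
    return walk

# Generate a corpus of walks; each walk is treated like a "sentence"
# and fed to a skip-gram model (e.g. gensim's Word2Vec) downstream.
corpus = [random_walk(graph, v, walk_length=5) for v in graph for _ in range(3)]
```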

metapath2vec works the same way, except the walk must follow a metapath. Since many metapaths start and end with the same node type, the metapath can be applied repeatedly and the walk can actually go quite far.

metapath2vec++ adapts word2vec-style negative sampling to the heterogeneous setting: negative samples are drawn only from nodes of the same type as the target, so the softmax is normalized per node type.
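A sketch of this type-aware negative sampling, assuming hypothetical node lists (the data and helper names are illustrative, not the paper's implementation):

```python
import random

# Hypothetical node lists per type.
nodes_by_type = {
    "author": ["a0", "a1", "a2"],
    "paper": ["p0", "p1", "p2", "p3"],
    "conf": ["c0", "c1"],
}
node_type = {n: t for t, ns in nodes_by_type.items() for n in ns}

def negative_samples(target, k):
    """metapath2vec++-style negatives: drawn only from the target's node type,
    so the softmax is normalized per type rather than over all nodes."""
    ntype = node_type[target]
    candidates = [n for n in nodes_by_type[ntype] if n != target]
    return random.choices(candidates, k=k)

negs = negative_samples("p1", k=3)
```

Plain metapath2vec would instead sample negatives from all nodes regardless of type.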


jasperzhong commented 1 year ago

The node embedding visualization in the experiments is quite impressive.

(figure: node embedding visualization from the paper)

jasperzhong commented 1 year ago

metapath2vec's sampling method is a random walk: starting from a node v, the walk may only follow the metapath; at each step a neighbor is chosen uniformly at random, for a given number of steps, yielding one walk sequence.
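This sampling rule can be sketched on a toy heterogeneous graph keyed by (node, edge type), mimicking the author-paper-conf schema below (the graph data is made up for illustration):

```python
import random

# Toy heterogeneous graph: neighbors[(node, edge_type)] -> list of neighbors.
neighbors = {
    ("c0", "cp"): ["p0", "p1"],
    ("p0", "pa"): ["a0"],
    ("p1", "pa"): ["a0", "a1"],
    ("a0", "ap"): ["p0", "p1"],
    ("a1", "ap"): ["p1"],
    ("p0", "pc"): ["c0"],
    ("p1", "pc"): ["c0"],
}

def metapath_walk(start, metapath):
    """Follow the metapath's edge types in order; at each step pick one of the
    allowed neighbors uniformly at random (metapath2vec's sampling rule)."""
    walk = [start]
    for etype in metapath:
        nbrs = neighbors.get((walk[-1], etype), [])
        if not nbrs:
            break  # walk terminates early if no valid neighbor exists
        walk.append(random.choice(nbrs))
    return walk

# Repeat the conf -> paper -> author -> paper -> conf pattern twice.
walk = metapath_walk("c0", ["cp", "pa", "ap", "pc"] * 2)
```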

DGL's metapath2vec implementation uses dgl.sampling.random_walk, which accepts a metapath. Each node gets num_walks_per_node walks, and each walk repeats the metapath walk_length times. https://github.com/dmlc/dgl/blob/master/examples/pytorch/metapath2vec/sampler.py#L98

The metapath is a list of strings, where each string is an edge (link) type name of the heterogeneous graph.

# paper_author_src/dst and paper_conf_src/dst are edge index tensors
# loaded from the dataset (defined earlier in the DGL example).
hg = dgl.heterograph(
    {
        ("paper", "pa", "author"): (paper_author_src, paper_author_dst),
        ("author", "ap", "paper"): (paper_author_dst, paper_author_src),
        ("paper", "pc", "conf"): (paper_conf_src, paper_conf_dst),
        ("conf", "cp", "paper"): (paper_conf_dst, paper_conf_src),
    }
)
# Start num_walks_per_node walks from every conference node, each following
# the conf -> paper -> author -> paper -> conf metapath walk_length times.
for conf_idx in tqdm.trange(hg.num_nodes("conf")):
    traces, _ = dgl.sampling.random_walk(
        hg,
        [conf_idx] * num_walks_per_node,
        metapath=["cp", "pa", "ap", "pc"] * walk_length,
    )