Closed by jasperzhong 1 year ago
Heterogeneous network representation learning: regardless of the node type, every node ends up with an embedding of the same length, giving one matrix $\mathbf{X} \in \mathbb{R}^{|V| \times d}$.
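A minimal sketch of what "one shared embedding table across node types" means: assign every typed node a global id and store a single $|V| \times d$ matrix (the ids, names, and sizes here are toy assumptions):

```python
# One embedding table shared across node types: map each (type, type-local id)
# to a global row index; every row has the same dimensionality d.
d = 4
global_id = {("author", 0): 0, ("paper", 0): 1, ("conf", 0): 2}
X = [[0.0] * d for _ in global_id]  # the |V| x d embedding matrix

# Looking up a node of any type returns a vector of the same length d.
conf_vec = X[global_id[("conf", 0)]]
```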
DeepWalk earlier showed how to learn node embeddings on a homogeneous graph: run random walks, then feed the generated walks into a skip-gram model.
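The walk step can be sketched in a few lines; this is a toy version (the dict-of-neighbor-lists adjacency format is an assumption, not DeepWalk's actual implementation):

```python
import random

def random_walk(adj, start, length, seed=0):
    """Uniform random walk of `length` steps on a homogeneous graph.

    adj: dict mapping node -> list of neighbors (toy format).
    """
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        nbrs = adj[walk[-1]]
        if not nbrs:
            break  # dead end: stop early
        walk.append(rng.choice(nbrs))
    return walk

# Toy homogeneous graph: a triangle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
walk = random_walk(adj, 0, 5)
```

Many such walks per start node become the "sentences" that skip-gram trains on.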
metapath2vec is the same idea, except the walk must follow a given metapath. Since many metapaths start and end with the same node type, the metapath can be applied repeatedly, so a walk can actually go quite far.
metapath2vec++ follows word2vec's negative sampling, but restricts negative samples to nodes of the same type as the context node (a heterogeneous softmax).
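A hedged sketch of that type-restricted negative sampling (the function name, the `nodes_by_type` layout, and uniform sampling instead of a degree-based distribution are all simplifying assumptions):

```python
import random

def sample_negatives(nodes_by_type, context_type, positives, k, seed=0):
    """Draw k negatives restricted to the context node's type
    (the metapath2vec++ idea); plain word2vec-style sampling would
    draw from all nodes regardless of type."""
    rng = random.Random(seed)
    candidates = [n for n in nodes_by_type[context_type] if n not in positives]
    return [rng.choice(candidates) for _ in range(k)]

nodes_by_type = {"author": [0, 1, 2, 3], "paper": [10, 11], "conf": [20]}
negs = sample_negatives(nodes_by_type, "author", positives={0}, k=3)
```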
The node embedding visualization in the experiments is quite impressive.
metapath2vec's sampling method is a random walk: starting from a node v, the walk may only move along the metapath, at each step choosing uniformly among the valid neighbors, for a given number of steps, yielding one walk sequence.
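That constrained walk can be sketched without any graph library; the `(node, edge_type) -> neighbors` adjacency format and the node names below are toy assumptions, using the same `cp`/`pa`/`ap`/`pc` edge types as the DGL example further down:

```python
import random

def metapath_walk(adj, start, metapath, repeats, seed=0):
    """At step i, move only along edges of type metapath[i % len(metapath)],
    choosing uniformly among the typed neighbors of the current node."""
    rng = random.Random(seed)
    walk = [start]
    for etype in metapath * repeats:
        nbrs = adj.get((walk[-1], etype), [])
        if not nbrs:
            break  # no neighbor of the required type: stop early
        walk.append(rng.choice(nbrs))
    return walk

# Toy graph: adjacency keyed by (node, edge_type).
adj = {
    ("c0", "cp"): ["p0", "p1"],
    ("p0", "pa"): ["a0"], ("p1", "pa"): ["a0", "a1"],
    ("a0", "ap"): ["p0", "p1"], ("a1", "ap"): ["p1"],
    ("p0", "pc"): ["c0"], ("p1", "pc"): ["c0"],
}
walk = metapath_walk(adj, "c0", ["cp", "pa", "ap", "pc"], repeats=2)
```

Because the metapath starts and ends at `conf`, the walk returns to a conference node every four steps and can be repeated indefinitely.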
DGL's metapath2vec implementation uses dgl.sampling.random_walk, which accepts a metapath. Each node generates num_walks_per_node walks, and each walk is walk_length metapath repetitions long. https://github.com/dmlc/dgl/blob/master/examples/pytorch/metapath2vec/sampler.py#L98
The metapath is a list of str, where each str is an edge-type (link) name of the heterogeneous graph.
```python
import dgl
import tqdm

# Heterogeneous graph with both directions of each relation
# (paper-author and paper-conf edge lists defined elsewhere).
hg = dgl.heterograph(
    {
        ("paper", "pa", "author"): (paper_author_src, paper_author_dst),
        ("author", "ap", "paper"): (paper_author_dst, paper_author_src),
        ("paper", "pc", "conf"): (paper_conf_src, paper_conf_dst),
        ("conf", "cp", "paper"): (paper_conf_dst, paper_conf_src),
    }
)
for conf_idx in tqdm.trange(hg.num_nodes("conf")):
    # Start num_walks_per_node walks from the same "conf" node; the
    # metapath conf->paper->author->paper->conf is repeated walk_length times.
    traces, _ = dgl.sampling.random_walk(
        hg,
        [conf_idx] * num_walks_per_node,
        metapath=["cp", "pa", "ap", "pc"] * walk_length,
    )
```
Paper: https://dl.acm.org/doi/pdf/10.1145/3097983.3098036