shenweichen / GraphEmbedding

Implementation and experiments of graph embedding algorithms.
MIT License
3.63k stars 990 forks

LINE sampling question #51

Open yycg opened 3 years ago

yycg commented 3 years ago

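In line.py, lines 111 to 137: when the node alias table is built, norm_prob sums to 1, and create_alias_table then rescales norm_prob so that its mean is 1. Why does norm_prob have a mean of 1 when the edge alias table is created?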

    def _gen_sampling_table(self):

        # create sampling table for vertex
        power = 0.75
        numNodes = self.node_size
        node_degree = np.zeros(numNodes)  # out degree
        node2idx = self.node2idx

        for edge in self.graph.edges():
            node_degree[node2idx[edge[0]]
                        ] += self.graph[edge[0]][edge[1]].get('weight', 1.0)

        total_sum = sum([math.pow(node_degree[i], power)
                         for i in range(numNodes)])
        norm_prob = [float(math.pow(node_degree[j], power)) /
                     total_sum for j in range(numNodes)]

        self.node_accept, self.node_alias = create_alias_table(norm_prob)

        # create sampling table for edge
        numEdges = self.graph.number_of_edges()
        total_sum = sum([self.graph[edge[0]][edge[1]].get('weight', 1.0)
                         for edge in self.graph.edges()])
        norm_prob = [self.graph[edge[0]][edge[1]].get('weight', 1.0) *
                     numEdges / total_sum for edge in self.graph.edges()]

        self.edge_accept, self.edge_alias = create_alias_table(norm_prob)

andrew-zzz commented 2 years ago

Bug: the edge weight calculation multiplies by an extra numEdges.
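
The effect of the extra factor can be reproduced with a minimal sketch. It assumes, as the question states, that create_alias_table multiplies its input by the table length. The function body below is an illustrative reimplementation of the alias method, not the repository's code, and implied_distribution and the three edge weights are hypothetical.

```python
import numpy as np


def create_alias_table(area_ratio):
    """Build alias-method tables. The input is expected to sum to 1."""
    n = len(area_ratio)
    accept = [0.0] * n
    alias = [0] * n
    small, large = [], []
    # Scale so that the mean becomes 1 (assumption taken from the question).
    scaled = np.array(area_ratio, dtype=float) * n
    for i, p in enumerate(scaled):
        (small if p < 1.0 else large).append(i)
    # Pair each under-full slot with an over-full one.
    while small and large:
        s, l = small.pop(), large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Whatever is left over fills its own slot completely.
    for i in small + large:
        accept[i] = 1.0
    return accept, alias


def implied_distribution(accept, alias):
    """Hypothetical helper: the distribution a sampler would draw from.

    Slot i is picked uniformly, kept with probability accept[i],
    otherwise replaced by alias[i].
    """
    n = len(accept)
    probs = np.zeros(n)
    for i in range(n):
        keep = min(max(accept[i], 0.0), 1.0)
        probs[i] += keep / n
        probs[alias[i]] += (1.0 - keep) / n
    return probs


weights = np.array([1.0, 1.0, 2.0])  # hypothetical edge weights
num_edges = len(weights)
total_sum = weights.sum()

# Input that sums to 1, like the node table in the excerpt.
correct = implied_distribution(*create_alias_table(weights / total_sum))

# Input already multiplied by num_edges, like the edge table in the excerpt.
buggy = implied_distribution(
    *create_alias_table(weights * num_edges / total_sum)
)

print("sum-to-1 input: ", correct)
print("pre-scaled input:", buggy)
```

In this sketch the pre-scaled input pushes every entry to 1 or above, so the table degenerates to uniform sampling over edges. Passing weight / total_sum instead recovers the weighted distribution. When all edge weights are equal, both inputs give essentially the same uniform table, so the difference would only show up on weighted graphs.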