A few questions about the code

perveil commented 3 years ago

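First of all, thank you very much for sharing the SGL code. While reading it today, I realized I don't really understand the sub_mat dict. Could you help annotate it? Thanks a lot; this matters a great deal to a beginner like me.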

    import tensorflow as tf  # TF 1.x graph-mode API (tf.placeholder)

    with tf.name_scope("input_data"):
        # Mini-batch of user ids and their sampled positive / negative item ids
        self.users = tf.placeholder(tf.int32, shape=(None,))
        self.pos_items = tf.placeholder(tf.int32, shape=(None,))
        self.neg_items = tf.placeholder(tf.int32, shape=(None,))

        # sub_mat: values / indices / dense shape of the sparse adjacency
        # matrices of the two augmented views ("sub1" and "sub2")
        self.sub_mat = {}
        if self.aug_type in [0, 1]:
            # 0: Node Dropout; 1: Edge Dropout -> one matrix per view, used by all layers
            self.sub_mat['adj_values_sub1'] = tf.placeholder(tf.float32)
            self.sub_mat['adj_indices_sub1'] = tf.placeholder(tf.int64)
            self.sub_mat['adj_shape_sub1'] = tf.placeholder(tf.int64)

            self.sub_mat['adj_values_sub2'] = tf.placeholder(tf.float32)
            self.sub_mat['adj_indices_sub2'] = tf.placeholder(tf.int64)
            self.sub_mat['adj_shape_sub2'] = tf.placeholder(tf.int64)
        else:
            # 2: Random Walk -> a separate matrix per view for every layer k
            for k in range(1, self.n_layers + 1):
                self.sub_mat['adj_values_sub1%d' % k] = tf.placeholder(tf.float32, name='adj_values_sub1%d' % k)
                self.sub_mat['adj_indices_sub1%d' % k] = tf.placeholder(tf.int64, name='adj_indices_sub1%d' % k)
                self.sub_mat['adj_shape_sub1%d' % k] = tf.placeholder(tf.int64, name='adj_shape_sub1%d' % k)

                self.sub_mat['adj_values_sub2%d' % k] = tf.placeholder(tf.float32, name='adj_values_sub2%d' % k)
                self.sub_mat['adj_indices_sub2%d' % k] = tf.placeholder(tf.int64, name='adj_indices_sub2%d' % k)
                self.sub_mat['adj_shape_sub2%d' % k] = tf.placeholder(tf.int64, name='adj_shape_sub2%d' % k)

wujcan commented 3 years ago

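This is because I generate new subgraphs in every epoch, and I chose to compute each subgraph's adjacency matrix on the CPU first and then feed it into the training graph. sub_mat holds the inputs needed to build those adjacency matrices. Note that each adjacency matrix is a sparse tensor, which is why it is fed as three parts (values, indices and shape); I suggest reading TensorFlow's documentation on tf.sparse.SparseTensor.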
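
To make the "compute on the CPU, then feed" step concrete, here is a minimal sketch in plain Python of producing one view's (indices, values, shape) triplet with edge dropout. The helper name `edge_dropout_coo`, the `keep_ratio` parameter, placing item nodes after user nodes, and the symmetric D^-1/2 A D^-1/2 normalization are all assumptions for illustration, not the repository's actual code (which may well use scipy.sparse). The feed-dict lines at the bottom are comments with a hypothetical `model`; only `tf.SparseTensor(indices, values, dense_shape)` is a real TensorFlow constructor.

```python
import random
from collections import defaultdict


def edge_dropout_coo(interactions, n_users, n_items, keep_ratio, seed=None):
    """Build the COO triplet (indices, values, shape) of one augmented view.

    Hypothetical helper: keeps each (user, item) edge with probability
    keep_ratio, builds the symmetric user-item bipartite adjacency and
    applies D^-1/2 A D^-1/2 normalization (an assumed, LightGCN-style choice).
    """
    rng = random.Random(seed)
    kept = [(u, i) for (u, i) in interactions if rng.random() < keep_ratio]

    # Item nodes are placed after user nodes: item i becomes node n_users + i.
    edges = []
    for u, i in kept:
        edges.append((u, n_users + i))
        edges.append((n_users + i, u))
    edges.sort()  # canonical row-major order, as tf.SparseTensor expects

    # Row degree of every node that still has at least one edge.
    degree = defaultdict(int)
    for row, _ in edges:
        degree[row] += 1

    indices = [[row, col] for row, col in edges]
    values = [1.0 / ((degree[row] * degree[col]) ** 0.5) for row, col in edges]
    shape = [n_users + n_items, n_users + n_items]
    return indices, values, shape


# Hypothetical per-epoch usage (TF 1.x), with `model` and the data assumed:
# idx1, val1, shp1 = edge_dropout_coo(train_pairs, n_users, n_items, 0.9)
# feed = {model.sub_mat['adj_indices_sub1']: idx1,
#         model.sub_mat['adj_values_sub1']: val1,
#         model.sub_mat['adj_shape_sub1']: shp1}
# Inside the graph the three placeholders would be combined as
# tf.SparseTensor(indices, values, dense_shape).
```

Drawing a fresh triplet per epoch and feeding it keeps the TensorFlow graph fixed while the subgraph changes, which matches the motivation given above.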

perveil commented 3 years ago

Hi, does "subgraph" refer to the graph after data augmentation (types 0, 1, 2)?

wujcan commented 3 years ago

Yes.
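
As a rough illustration of how the three augmentation types map onto the sub_mat key suffixes in the question, here is a sketch that only decides which edges survive in each subgraph. The function names are hypothetical; dropping a node together with all of its edges for node dropout, and modeling random walk as independent edge dropout per layer, are assumptions suggested by the per-layer placeholders, not confirmed implementation details.

```python
import random


def node_dropout(pairs, n_users, n_items, drop_ratio, rng):
    """Drop each user/item node with prob. drop_ratio, together with its edges."""
    dropped_users = {u for u in range(n_users) if rng.random() < drop_ratio}
    dropped_items = {i for i in range(n_items) if rng.random() < drop_ratio}
    return [(u, i) for (u, i) in pairs
            if u not in dropped_users and i not in dropped_items]


def edge_dropout(pairs, drop_ratio, rng):
    """Drop each (user, item) edge independently with prob. drop_ratio."""
    return [(u, i) for (u, i) in pairs if rng.random() >= drop_ratio]


def sample_subgraphs(pairs, n_users, n_items, aug_type, n_layers,
                     drop_ratio, seed=None):
    """Return {key_suffix: kept edges} for both views of one epoch.

    Suffixes mirror the sub_mat keys: 'sub1'/'sub2' for aug_type 0/1,
    'sub1k'/'sub2k' (one per layer k) for aug_type 2.
    """
    rng = random.Random(seed)
    subgraphs = {}
    for view in (1, 2):
        if aug_type == 0:
            subgraphs['sub%d' % view] = node_dropout(
                pairs, n_users, n_items, drop_ratio, rng)
        elif aug_type == 1:
            subgraphs['sub%d' % view] = edge_dropout(pairs, drop_ratio, rng)
        else:
            # Random walk: a different subgraph at every layer of each view.
            for k in range(1, n_layers + 1):
                subgraphs['sub%d%d' % (view, k)] = edge_dropout(
                    pairs, drop_ratio, rng)
    return subgraphs
```

Each list of kept edges would then be turned into an adjacency triplet and fed to the matching `adj_*_sub...` placeholders.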

perveil commented 3 years ago

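Thank you very much for your answers. My other questions are: what exactly does splitter=given (ratio, loo) mean? I also noticed the data uses md5; what is that for? Thanks a lot.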

wujcan commented 3 years ago

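The SGL code is adapted from NeuRec, which integrates several recommendation models and datasets. The splitter parameter specifies how the training/test data is produced: ratio randomly splits the data by a given proportion, e.g. 8:2; loo means leave-one-out; given reads data that has already been split. In SGL the data is split in advance and read from disk instead of being split on the fly, so splitter=given.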
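
For readers unfamiliar with the two online strategies, a minimal sketch of each follows. The function names are hypothetical, the ratio split is a single global random shuffle, and the leave-one-out variant assumes each user's interactions are listed in chronological order (holding out the last one) and keeps users with a single interaction entirely in training; NeuRec's own splitters may differ in these details.

```python
import random
from collections import defaultdict


def split_by_ratio(records, train_ratio=0.8, seed=None):
    """Randomly split records into train/test, e.g. 8:2 for train_ratio=0.8."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


def split_leave_one_out(pairs):
    """Hold out each user's last (assumed most recent) item as the test item."""
    by_user = defaultdict(list)
    for u, i in pairs:
        by_user[u].append(i)
    train, test = [], []
    for u, items in by_user.items():
        if len(items) < 2:
            # Too few interactions to hold one out: keep everything for training.
            train.extend((u, i) for i in items)
            continue
        train.extend((u, i) for i in items[:-1])
        test.append((u, items[-1]))
    return train, test
```

With splitter=given neither function runs; the pre-split train and test files are simply loaded.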

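The first time the code runs on a new dataset, it remaps the ids according to the parameters and saves the remapped data; the md5 is used to check whether the files are consistent or have been corrupted.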
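
A minimal standard-library sketch of those two steps: remapping raw ids to contiguous integers and computing an MD5 checksum that can later be compared against a stored value. The names `remap_ids`, `md5_hexdigest`, `file_md5` and `cache_is_valid` are hypothetical, and where NeuRec keeps the expected checksum is not described in this thread, so the comparison is only illustrative.

```python
import hashlib


def remap_ids(raw_pairs):
    """Map raw (user, item) ids to contiguous ints in order of first appearance."""
    user_map, item_map = {}, {}
    mapped = []
    for raw_u, raw_i in raw_pairs:
        u = user_map.setdefault(raw_u, len(user_map))
        i = item_map.setdefault(raw_i, len(item_map))
        mapped.append((u, i))
    return mapped, user_map, item_map


def md5_hexdigest(chunks):
    """MD5 of a stream of byte chunks (same as hashing them concatenated)."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file, read in chunks so it never has to fit in memory at once."""
    with open(path, 'rb') as f:
        return md5_hexdigest(iter(lambda: f.read(chunk_size), b''))


def cache_is_valid(path, expected_md5):
    """Hypothetical check: reuse the saved remapped file only if its MD5 matches."""
    return file_md5(path) == expected_md5
```

MD5 is used here only as a cheap integrity fingerprint, not for security.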

perveil commented 3 years ago

Thanks a lot! All the best with your work.

wujcan / SGL-TensorFlow, issue #8