yule-BUAA / DyGLib_TGB

An Empirical Evaluation of Temporal Graph Benchmark
MIT License

node_raw_features padded twice? #1

Open hsinghuan opened 3 months ago

hsinghuan commented 3 months ago

Thank you for the effort in creating DyGLib and adapting it to TGB! I have a question regarding lines 151 to 161 in utils/DataLoader.py:

    if 'node_feat' not in data.keys():
        node_raw_features = np.zeros((num_nodes + 1, 1))
    else:
        node_raw_features = data['node_feat'].astype(np.float64)
        # deal with node features whose shape has only one dimension
        if len(node_raw_features.shape) == 1:
            node_raw_features = node_raw_features[:, np.newaxis]

    # add feature of padded node and padded edge
    node_raw_features = np.vstack([np.zeros(node_raw_features.shape[1])[np.newaxis, :], node_raw_features])
    edge_raw_features = np.vstack([np.zeros(edge_raw_features.shape[1])[np.newaxis, :], edge_raw_features])

It seems like node_raw_features would be padded twice if 'node_feat' is not in data.keys(). The first padding comes from np.zeros((num_nodes + 1, 1)), and the second from np.vstack([np.zeros(node_raw_features.shape[1])[np.newaxis, :], node_raw_features]). As a result, the length of the 0-th dimension of node_raw_features exceeds the number of unique nodes by 2. I am not sure if this is intended or not. Please let me know if I missed something. Thanks!
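To make the shapes concrete, here is a minimal sketch that copies only the node-feature part of the snippet above into a standalone function. The function name `build_node_features`, the toy `data` dicts, and `num_nodes = 5` are hypothetical and only serve the illustration; the real loader does more than this.

```python
import numpy as np


def build_node_features(data, num_nodes):
    """Hypothetical helper mirroring the quoted node-feature logic."""
    if 'node_feat' not in data.keys():
        # first extra row: num_nodes + 1 rows are allocated here
        node_raw_features = np.zeros((num_nodes + 1, 1))
    else:
        node_raw_features = data['node_feat'].astype(np.float64)
        # deal with node features whose shape has only one dimension
        if len(node_raw_features.shape) == 1:
            node_raw_features = node_raw_features[:, np.newaxis]
    # second extra row: a zero row is prepended for the padded node
    return np.vstack([np.zeros(node_raw_features.shape[1])[np.newaxis, :], node_raw_features])


num_nodes = 5  # toy value, not taken from any dataset
without_feat = build_node_features({}, num_nodes)
with_feat = build_node_features({'node_feat': np.ones((num_nodes, 3))}, num_nodes)

print(without_feat.shape)  # (7, 1): num_nodes + 2 rows
print(with_feat.shape)     # (6, 3): num_nodes + 1 rows
```

Only the branch without 'node_feat' ends up with the extra row; the branch with real features gets exactly one padding row.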

yule-BUAA commented 1 week ago

Thank you for your patience. Sorry for the late response.

Your understanding is correct. When node_feat is not present in data.keys(), our code does pad node_raw_features twice. Normally, we only need to pad one additional node (for sequence or neighbor completion). However, the redundant padding does not affect model performance, because the padded features carry no meaning and are all zeros. Additionally, the node indices used to index node_raw_features will not cause an out-of-bounds issue.

If you still wish to address this issue, you can modify the line node_raw_features = np.zeros((num_nodes + 1, 1)) to node_raw_features = np.zeros((num_nodes, 1)).
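A minimal sketch of that one-line change, isolated from the rest of the loader, might look like the following. The function name `build_node_features_fixed` and the toy value `num_nodes = 5` are hypothetical, and the index check assumes that node IDs lie in the range 0 to num_nodes with 0 reserved for the padded node, which should be verified against the actual data loading code.

```python
import numpy as np


def build_node_features_fixed(data, num_nodes):
    """Hypothetical helper: same logic, but without the extra '+ 1'."""
    if 'node_feat' not in data.keys():
        # allocate exactly num_nodes rows; padding is added once below
        node_raw_features = np.zeros((num_nodes, 1))
    else:
        node_raw_features = data['node_feat'].astype(np.float64)
        if len(node_raw_features.shape) == 1:
            node_raw_features = node_raw_features[:, np.newaxis]
    # single padding row prepended at index 0
    return np.vstack([np.zeros(node_raw_features.shape[1])[np.newaxis, :], node_raw_features])


num_nodes = 5  # toy value for illustration
feats = build_node_features_fixed({}, num_nodes)
print(feats.shape)  # (6, 1): num_nodes + 1 rows

# Assumption: valid IDs are 0..num_nodes, with 0 as the padded node.
node_ids = np.arange(num_nodes + 1)
looked_up = feats[node_ids]  # indexing stays in bounds
print(looked_up.shape)  # (6, 1)
```

With this change both branches produce num_nodes + 1 rows, so the array shape no longer depends on whether the dataset ships node features.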

Thank you for pointing this out!