microsoft / Graphormer

Graphormer is a general-purpose deep learning backbone for molecular modeling.
MIT License

Training Graphormer-Small on dataset PCQM4M and Understanding of Edge Encoder #32

meloncolie closed this issue 2 years ago

meloncolie commented 2 years ago

Hello, I have read your code and have some questions.

  1. How can I train Graphormer-Small on the PCQM4M dataset? Could you provide a sample, e.g. a shell script?
  2. How should I understand the dimension in nn.Embedding(512 * 3 + 1, num_heads, padding_idx=0) of the edge encoder? The shape of edge_input at the end of the function collator in collator.py (before it is packed into a Batch) is [n_graph, n_node, n_node, multi_hop_max_dist, n_edge_features], where n_edge_features=3. So where do the multiplication by 512 and the addition of 1 come from? Thanks a lot!
zhengsx commented 2 years ago
  1. For the shell script, you can refer to this, changing --n_layers=12 to 6 and setting the other hyper-parameters as described in Table 7 of the paper.
  2. The first dimension of nn.Embedding is the vocabulary size. Each edge feature is a categorical feature, so the vocabulary size must be greater than the total number of classes across all categories (all 3 edge feature dimensions). We therefore use 512 as max_num_classes for each feature, which gives 512 * 3. The extra 1 leaves room for index 0, which padding_idx=0 reserves for padding.
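
To make the arithmetic concrete, here is a minimal sketch of how a vocabulary of size 512 * 3 + 1 could be laid out. It assumes each of the 3 edge features is shifted into its own block of 512 ids and that id 0 is kept free for padding. The names (MAX_NUM_CLASSES, to_single_vocab, and so on) are illustrative and not taken from the repository.

```python
# Illustrative sketch only: hypothetical names, not the repository's code.
MAX_NUM_CLASSES = 512   # assumed upper bound on classes per edge feature
N_EDGE_FEATURES = 3     # number of categorical edge features
PADDING_IDX = 0         # id reserved for padding

# One block of 512 ids per feature, plus one id for padding.
VOCAB_SIZE = MAX_NUM_CLASSES * N_EDGE_FEATURES + 1


def to_single_vocab(edge_feature):
    """Map per-feature class ids to ids in one shared vocabulary.

    Feature i with class v becomes 1 + i * MAX_NUM_CLASSES + v, so the
    features never collide with each other or with PADDING_IDX.
    """
    if len(edge_feature) != N_EDGE_FEATURES:
        raise ValueError("expected one class id per edge feature")
    ids = []
    for i, v in enumerate(edge_feature):
        if not 0 <= v < MAX_NUM_CLASSES:
            raise ValueError("class id out of range")
        ids.append(1 + i * MAX_NUM_CLASSES + v)
    return ids


print(VOCAB_SIZE)                        # 1537
print(to_single_vocab([0, 0, 0]))        # [1, 513, 1025]
print(to_single_vocab([511, 511, 511]))  # [512, 1024, 1536]
```

Under these assumptions the largest id is 1536, which fits in an embedding table with 1537 rows, and no real feature ever maps to the padding id 0.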
zhengsx commented 2 years ago

Closing this issue due to inactivity. Feel free to open a new one or reopen this one if you have further questions.