Closed · vthost closed this issue 3 years ago
Although they have similar numbers of nodes, the graphs in the two datasets have different numbers of node types, different edge densities, and even slightly different encoders (DVAE_BN uses a gated sum of X rather than H as the message from predecessors, to respect the d-separation properties of Bayesian networks). This makes their training losses very different, so the optimal batch sizes differ as well.
Since the graphs in the two datasets have very similar sizes, I wonder why you use such different batch sizes for the two tasks?
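For readers unfamiliar with the "gated sum" aggregation mentioned in the answer, here is a minimal plain-Python sketch of the general idea: each predecessor's feature vector is passed through a learned gate (a sigmoid of a linear map) and a linear map, and the elementwise products are summed. This is an illustrative toy, not the actual D-VAE code; the weight matrices `W_gate` and `W_map` and the 2-dimensional example are assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(W, x):
    # W is a list of rows; returns the matrix-vector product W x.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gated_sum(pred_feats, W_gate, W_map):
    """Aggregate predecessor feature vectors by a gated sum:
        msg = sum_j sigmoid(W_gate x_j) * (W_map x_j)   (elementwise product)
    instead of simply summing the raw (or hidden) vectors.
    """
    dim = len(W_map)
    msg = [0.0] * dim
    for x in pred_feats:
        gate = [sigmoid(v) for v in linear(W_gate, x)]  # per-dimension gate in (0, 1)
        mapped = linear(W_map, x)
        msg = [m + g * h for m, g, h in zip(msg, gate, mapped)]
    return msg


# Toy example: two one-hot predecessors, identity weights for clarity.
identity = [[1.0, 0.0], [0.0, 1.0]]
msg = gated_sum([[1.0, 0.0], [0.0, 1.0]], identity, identity)
# Both components equal sigmoid(1) ~= 0.731, since the gate for each
# active dimension is sigmoid(1) and the mapped value is 1.
```

The key design point hinted at in the answer is that the messages are built from the predecessors' one-hot type vectors X rather than their hidden states H, which keeps a node's update independent of non-parent ancestors, matching the conditional-independence (d-separation) structure of a Bayesian network.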