muhanzhang / D-VAE

D-VAE: A Variational Autoencoder for Directed Acyclic Graphs, NeurIPS 2019
MIT License

Batch Sizes #3

Closed vthost closed 3 years ago

vthost commented 3 years ago

Since the graphs in the two datasets have very similar sizes, I wonder why you use such different batch sizes for the two tasks?

muhanzhang commented 3 years ago

Although they have a similar number of nodes, the graphs in the two datasets differ in the number of node types, in edge density, and even slightly in their encoders (DVAE_BN uses a gated sum of X instead of H as the message from predecessors, to respect the d-separation properties of Bayesian networks). This makes their training losses very different, so the optimal batch sizes are also different.
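For context, a minimal NumPy sketch of the gated-sum aggregation mentioned above. This is an illustration, not the repository's actual implementation: the function name `gated_sum` and the single-layer sigmoid gate are assumptions for the example. The only difference between the two encoder variants discussed here is which per-predecessor vectors are passed in (one-hot/type features `X` for DVAE_BN versus hidden states `H` for D-VAE):

```python
import numpy as np

def gated_sum(messages, gate_w, map_w):
    """Aggregate predecessor messages with a gated sum (illustrative sketch).

    messages: (k, d) array, one row per predecessor
              (rows of X for DVAE_BN, rows of H for D-VAE).
    gate_w, map_w: (d, out) weight matrices (hypothetical single-layer
                   gating and mapping networks).
    Returns a single (out,) aggregated message.
    """
    gates = 1.0 / (1.0 + np.exp(-(messages @ gate_w)))  # sigmoid gate per predecessor
    mapped = messages @ map_w                            # linear mapping of each message
    return (gates * mapped).sum(axis=0)                  # gated sum over predecessors

rng = np.random.default_rng(0)
msgs = rng.normal(size=(3, 4))        # 3 predecessors, 4-dim features
gate_w = rng.normal(size=(4, 8))
map_w = rng.normal(size=(4, 8))
out = gated_sum(msgs, gate_w, map_w)  # shape (8,)
```

The gated sum is permutation-invariant over predecessors, which is why the choice of input (X vs. H) rather than the aggregator itself is what distinguishes the two encoders.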