muhanzhang / D-VAE

D-VAE: A Variational Autoencoder for Directed Acyclic Graphs, NeurIPS 2019
MIT License

Introduce the iterative scheme into the proposed architecture #7

Closed lee-man closed 2 years ago

lee-man commented 2 years ago

Hi,

Thanks for your great work! While reading your paper and code, I became curious about the (bidirectional) encoding part. In your paper, the aggregation for the source nodes is described as:

If an empty set is input to A (corresponding to the case for the starting node without any predecessors), we let A output an all-zero vector.
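
For concreteness, this is how I read that convention (my own simplified sketch, not the actual aggregation function in this repo):

```python
import torch

def aggregate(pred_states, hidden_dim=256):  # hidden_dim is an arbitrary example size
    """Toy stand-in for the aggregation function A (not D-VAE's real code)."""
    if len(pred_states) == 0:
        # the convention quoted above: empty predecessor set -> all-zero vector
        return torch.zeros(hidden_dim)
    # otherwise combine the predecessor states (a plain sum here for simplicity)
    return torch.stack(pred_states, dim=0).sum(dim=0)
```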

This case applies to both the forward layer and the backward layer. My point is: after the forward pass, the hidden states of the last (sink) nodes are obtained; could they serve as the initial values of A for the backward layer?

I know this is not consistent with the formulation in Eq. 3, where the incoming information is defined as the aggregated information from the predecessors. But if the computation is done this way, we can actually introduce an iterative scheme into the architecture:

0s for source node -> (forward layer) -> some vectors for sink node -> (backward layer) -> some vectors for source node -> (forward layer) -> ...
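
In code, the loop I have in mind would look roughly like the sketch below; forward_pass and backward_pass are hypothetical placeholders for one full forward/backward propagation of the encoder over the DAG, not functions from this repo:

```python
import torch

hidden_dim = 256  # arbitrary example size

def forward_pass(g, h_source):
    # placeholder: propagate through the DAG in topological order,
    # starting the source node from h_source; return the sink node's state
    return torch.tanh(h_source + 1.0)

def backward_pass(g, h_sink):
    # placeholder: propagate in reverse topological order,
    # starting the sink node from h_sink; return the source node's state
    return torch.tanh(h_sink - 1.0)

def iterative_encode(g, num_iters=3):
    h_source = torch.zeros(1, hidden_dim)    # 0s for the source node (iteration 0)
    h_sink = None
    for _ in range(num_iters):
        h_sink = forward_pass(g, h_source)   # forward layer -> sink state
        h_source = backward_pass(g, h_sink)  # backward layer -> new source state
    return h_source, h_sink                  # hopefully (near-)converged states
```

The final source/sink states could then be fed to the latent mapping in the same way as the current encoder output.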

The benefit could be that after a few iterations, the hidden states of the source and sink nodes converge, which may yield a better encoding space for the later decoding. I think such an iterative scheme is used in the following two papers (both related to the satisfiability problem):

  1. Amizadeh, Saeed, Sergiy Matusevych, and Markus Weimer. "Learning to Solve Circuit-SAT: An Unsupervised Differentiable Approach." International Conference on Learning Representations, 2019.
  2. Selsam, Daniel, Matthew Lamm, Benedikt Bünz, Percy Liang, Leonardo de Moura, and David L. Dill. "Learning a SAT Solver from Single-Bit Supervision." International Conference on Learning Representations, 2019.

Looking forward to your feedback. Thanks!

muhanzhang commented 2 years ago

Hi @lee-man! Thanks for the question. The reason I didn't use the end node's state to initialize the backward propagation is to avoid an overly long message-passing path. Deep learning with backpropagation suffers from vanishing gradients when the message-passing path is too long (unless tricks like skip connections are applied), and such long paths also make training harder. Nevertheless, you are welcome to try the idea and compare the performance.

lee-man commented 2 years ago

@muhanzhang Got it. Thanks for your reply. I will look into it.