mbi2gs / gflownet_tf2

Generative Flow Network demo in Tensorflow2
MIT License

Reproduce the Hyper-grid results in GFlowNet paper #3

Closed by cancan233 2 years ago

cancan233 commented 2 years ago

Hi,

Thanks for sharing such beautiful code! It is really helpful for me in learning how to implement GFlowNet.

Here I have some questions.

  1. In your notebook, the learned policy reproduces the reward grid quite well, with an L2 error as low as 0.2. However, when I simply clone the repository and run the cells, the result is not as close: the L2 error is about 0.7. My result is shown below. Is there anything I have missed?
image
  2. I went through the gfnAgent class. In line 330, I would expect the forward and backward probabilities to be log-transformed before being summed with z0. However, they are added directly. Can you help me understand that part?

https://github.com/mbi2gs/gflownet_tf2/blob/5d786d29788c15bcdbb7c3e4fb4912ca88051df7/gfn.py#L331

Thanks again for such great code!

Best, Cancan

mbi2gs commented 2 years ago

Hi Cancan, thanks for your questions.

  1. The model training is stochastic, so each run will be slightly different. Different initial weights, or training data presented in a different order, can change the final result. So I would recommend running the code several times to get a feel for the range of possible outcomes.
  2. The model outputs probabilities in log space directly ("logits") so there is no need to log transform them afterwards.
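
To illustrate the second point, here is a minimal sketch of a trajectory-balance-style loss computed entirely in log space. It is not the code from gfn.py: the helper names (`log_softmax`, `trajectory_balance_loss`), the toy logits, and the backward probabilities are all hypothetical. It also assumes the raw network outputs are normalized with a log-softmax to obtain log-probabilities, which the thread does not confirm for this repository. The point is only that once the per-step values are already log-probabilities, they are summed with the log partition term directly, with no further `log` call.

```python
import math

def log_softmax(logits):
    """Turn raw scores into normalized log-probabilities (numerically stable)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]

def trajectory_balance_loss(log_z, log_pf_steps, log_pb_steps, reward):
    """Squared residual: (log Z + sum log P_F) - (log R + sum log P_B)."""
    lhs = log_z + sum(log_pf_steps)               # already in log space: just add
    rhs = math.log(reward) + sum(log_pb_steps)    # only the reward needs a log
    return (lhs - rhs) ** 2

# Hypothetical two-step trajectory over three possible actions per step.
fwd_logits = [[2.0, 0.5, -1.0], [0.1, 0.1, 0.1]]
actions = [0, 2]
log_pf = [log_softmax(l)[a] for l, a in zip(fwd_logits, actions)]
log_pb = [math.log(0.5), math.log(1.0)]           # assumed backward probabilities

loss = trajectory_balance_loss(0.0, log_pf, log_pb, reward=2.0)

# Cross-check: the same quantity computed from plain probabilities.
pf = math.prod(math.exp(v) for v in log_pf)
pb = math.prod(math.exp(v) for v in log_pb)
loss_prob_space = (math.log(1.0 * pf) - math.log(2.0 * pb)) ** 2
```

The two losses agree up to floating-point error, which is why a sum of log-space outputs can stand in for the log of a product of probabilities.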

Hope that's helpful!