Closed: kzk2000 closed this issue 2 years ago
I think I found my own answer: during training there's no "actual" sampling, as you simply run the forward/backward pass on the already-"sampled" trajectories. So 'back_sample_trajectory' is really just 'backward_pass_trajectory'.
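Just to write down what I mean by that (a rough sketch with made-up names, not the actual code from gfn.py): the backward "sample" is a deterministic walk, so calling it twice on the same terminal state always gives the same trajectory.

```python
import tensorflow as tf

# Rough sketch, placeholder names only (not the repo's code): every step takes
# the arg-max parent action from the current backward policy, so nothing is
# drawn from a distribution.
def backward_pass_trajectory(terminal_state, backward_logits_fn, parent_fn, is_initial_fn):
    state = terminal_state
    trajectory = [state]
    while not is_initial_fn(state):
        logits = backward_logits_fn(state)           # [1, num_parent_actions]
        action = int(tf.argmax(logits, axis=-1)[0])  # greedy, no tf.random here
        state = parent_fn(state, action)
        trajectory.append(state)
    return trajectory

# Toy usage: states are ints, the only parent of s is s - 1, and 0 is the root.
traj = backward_pass_trajectory(
    terminal_state=3,
    backward_logits_fn=lambda s: tf.zeros([1, 1]),  # single dummy parent action
    parent_fn=lambda s, a: s - 1,
    is_initial_fn=lambda s: s == 0)
print(traj)  # [3, 2, 1, 0] every time -- the walk is fully deterministic
```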
Yeah, and when I chose the word "sample", it was in the context of a backward policy that is being actively learned and may change on the next training batch. So, in some sense, we're "sampling" the most likely trajectory from the current policy.
But you got the idea. Very close reading! Thanks for reaching out.
Though the function is called 'back_sample_trajectory', it doesn't actually sample; it always greedily picks the action with the highest probability, see
https://github.com/mbi2gs/gflownet_tf2/blob/main/gfn.py#L205
I'm really just asking to understand GFlowNet implementations better: shouldn't there be an action sampler here, similar to the one in the forward pass at https://github.com/mbi2gs/gflownet_tf2/blob/aece0a5463dc0df4d1773bebdb136efbb35fe317/gfn.py#L152 ?
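To make concrete what I mean by an "action sampler", here is roughly the contrast, with dummy logits rather than the actual code behind those two links:

```python
import tensorflow as tf

logits = tf.constant([[0.1, 2.0, 0.5]])  # dummy per-action logits, shape [1, 3]

# Forward-pass style: stochastic, a different action can come out on each call.
sampled_action = int(tf.random.categorical(logits, num_samples=1)[0, 0])

# Current backward style: greedy, always picks index 1 for these logits.
greedy_action = int(tf.argmax(logits, axis=-1)[0])
```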