Looking for where value network is used

revalo / tree-diffusion

Diffusion on syntax trees for program synthesis

https://tree-diffusion.github.io

MIT License

386 stars, 19 forks

Looking for where value network is used #8

Closed HekpoMaH closed 3 weeks ago

HekpoMaH commented 4 weeks ago

Hi again,

I'm looking at the eval scripts, and it seems that we always use the value "heuristic" rather than the value network. Is that indeed the case? Where can I find an eval script that uses a value network?

revalo commented 4 weeks ago

The value net is used during search here: https://github.com/revalo/tree-diffusion/blob/main/scripts%2Feval_td_search.py#L339. It is currently commented out, though, because we were experimenting with using the policy log probs directly as the value.

We don't have systematic empirical evidence yet, but anecdotally, using the model log probs as the value seemed to perform as well as the value network. The heuristic value, however, did not work as well in our experiments.
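
To make the three options above concrete, here is a rough sketch of a search step that takes its scoring function as a parameter. This is not the code in `eval_td_search.py`; every name below (`top_k`, `score_with_log_prob`, `make_value_net_scorer`, `make_heuristic_scorer`) is hypothetical, candidates are plain strings, and treating the heuristic as a negated distance to the target is an assumption made only for illustration.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

# Hypothetical representation: a candidate is (program, policy_log_prob).
Candidate = Tuple[str, float]
Scorer = Callable[[str, float], float]


def score_with_log_prob(program: str, log_prob: float) -> float:
    """Use the policy's log probability directly as the value."""
    return log_prob


def make_value_net_scorer(value_net: Callable[[str], float]) -> Scorer:
    """Wrap a learned value function (here, any callable) as a scorer."""
    return lambda program, log_prob: value_net(program)


def make_heuristic_scorer(distance: Callable[[str], float]) -> Scorer:
    """Score by a hand-written distance to the target (assumed: lower is better)."""
    return lambda program, log_prob: -distance(program)


def top_k(candidates: Iterable[Candidate], scorer: Scorer, k: int) -> List[str]:
    """Keep the k highest-scoring programs, as one beam search step might."""
    best = heapq.nlargest(k, candidates, key=lambda c: scorer(c[0], c[1]))
    return [program for program, _ in best]


# Toy candidates with made-up log probs.
candidates = [("a", -0.1), ("bb", -2.0), ("ccc", -0.5)]

by_log_prob = top_k(candidates, score_with_log_prob, 2)
by_value_net = top_k(candidates, make_value_net_scorer(lambda p: float(len(p))), 1)
by_heuristic = top_k(candidates, make_heuristic_scorer(lambda p: abs(len(p) - 2)), 1)
```

With this shape, switching between the value network, the policy log probs, and the heuristic is a one-argument change at the call site, which makes the three variants easy to compare in an eval script.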