HekpoMaH closed 3 weeks ago
The value net is used during search here: https://github.com/revalo/tree-diffusion/blob/main/scripts%2Feval_td_search.py#L339. It is currently commented out because we were experimenting with using the policy log probs directly as the value.
We don't have systematic empirical evidence yet, but anecdotally, using the model log probs as the value seemed to perform as well as the value network. The heuristic value, however, did not work as well in our experiments.
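To make the three options concrete, here is a minimal sketch of a search step with a pluggable scoring function. It is not the code in the repository: `Candidate`, `select_beam`, the three `score_by_*` functions, and all the numbers are hypothetical, and it assumes each candidate already carries its policy log prob, value-net estimate, and heuristic score.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical search node; field names are illustrative only."""
    name: str
    policy_logprob: float  # summed log prob of the edits that produced this node
    value_net: float       # estimate from a learned value network
    heuristic: float       # hand-written score; higher is better in this sketch


# Three interchangeable ways of scoring a candidate during search.
def score_by_policy_logprob(c: Candidate) -> float:
    return c.policy_logprob


def score_by_value_net(c: Candidate) -> float:
    return c.value_net


def score_by_heuristic(c: Candidate) -> float:
    return c.heuristic


def select_beam(
    candidates: List[Candidate],
    score_fn: Callable[[Candidate], float],
    beam_width: int,
) -> List[Candidate]:
    """Keep the beam_width highest-scoring candidates under score_fn."""
    return sorted(candidates, key=score_fn, reverse=True)[:beam_width]


# Made-up numbers, chosen only so the three scorers disagree.
candidates = [
    Candidate("a", policy_logprob=-1.0, value_net=0.2, heuristic=0.9),
    Candidate("b", policy_logprob=-0.5, value_net=0.8, heuristic=0.1),
    Candidate("c", policy_logprob=-2.0, value_net=0.5, heuristic=0.5),
]

beam_logprob = [c.name for c in select_beam(candidates, score_by_policy_logprob, 2)]
beam_value = [c.name for c in select_beam(candidates, score_by_value_net, 2)]
beam_heuristic = [c.name for c in select_beam(candidates, score_by_heuristic, 2)]
```

With this shape, switching between the value network and the policy log probs is a one-argument change to `select_beam`, which is roughly what commenting the value net line in or out amounts to.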
Hi again,
I'm looking at the eval scripts, and it seems that we always use the value "heuristic" rather than the value network. Is that indeed the case? Where can I find an eval script that uses a value network?