I quickly tested the code and have a couple of questions:
[Discrete data]: Could you please clarify the definition of energy in this context? I understand it is the sum of squared differences (some estimate of the true dist.) between predicted and target values. However, is it possible to use something like negative log-likelihood (NLL) for the energy instead, and then work with the `pred_x0` objective? I ask because it is hard to estimate the noise in discrete cases.
[Sampling - inner optimization loop]: The energy values seem to be discarded after a single comparison step. Is this intentional? Am I missing something, or is the energy used solely for determining the direction?
For discrete data, we currently treat it like continuous data and add a continuous amount of noise to it. At higher noise levels, the energy landscape is smooth even for discrete data. Yes, you could definitely train the energy function using NLL (potentially using the discrete diffusion analog of "discrete gradients" as the NLL objective to train the energy function).
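To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not the repository's code: `noisy_one_hot` and `nll_energy` are hypothetical helper names, the one-hot encoding is an assumption about how discrete values might be embedded, and using the cross-entropy of predicted logits as the energy is only one possible reading of "train the energy function using NLL".

```python
import math
import random


def noisy_one_hot(labels, num_classes, sigma, rng):
    """Embed discrete labels as one-hot vectors and add Gaussian noise.

    Assumption: discrete values are relaxed to continuous vectors so the
    usual continuous noising process can be applied to them.
    """
    noisy = []
    for label in labels:
        row = [1.0 if k == label else 0.0 for k in range(num_classes)]
        noisy.append([v + sigma * rng.gauss(0.0, 1.0) for v in row])
    return noisy


def nll_energy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits).

    Hypothetical energy: low when the predicted distribution puts most
    of its mass on the target class, high otherwise.
    """
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


rng = random.Random(0)
x_noisy = noisy_one_hot([2, 0], num_classes=3, sigma=0.5, rng=rng)
e_good = nll_energy([0.0, 0.0, 5.0], target=2)  # confident and correct
e_bad = nll_energy([0.0, 0.0, 5.0], target=0)   # confident and wrong
```

With larger `sigma` the noisy vectors overlap more across classes, which is the sense in which the landscape becomes smoother at higher noise levels.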
The energy function is also used to compute the gradient (inside the network forward function). During sampling, the energy is used to check whether the gradient step size is too large: the sample is rejected if the step raises the energy. Additional uses of the energy, for instance adaptively finding the correct step size, would be interesting.
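The accept/reject logic described above can be sketched as follows. This is a toy illustration under stated assumptions, not the repository's sampler: `energy` is a simple quadratic standing in for the learned energy network, `energy_grad` is its analytic gradient (the real model would obtain it by differentiating the network), and `refine` is a hypothetical name for the inner optimization loop.

```python
def energy(x):
    # Toy quadratic energy; a stand-in for the learned energy network.
    return sum(v * v for v in x)


def energy_grad(x):
    # Analytic gradient of the toy energy above.
    return [2.0 * v for v in x]


def refine(x, step_size, num_steps):
    """Gradient descent on the energy with a reject-if-worse check."""
    x = list(x)
    e = energy(x)
    for _ in range(num_steps):
        grad = energy_grad(x)
        candidate = [v - step_size * g for v, g in zip(x, grad)]
        e_candidate = energy(candidate)
        # Accept the step only if it does not raise the energy;
        # otherwise the step was too large and the old sample is kept.
        if e_candidate <= e:
            x, e = candidate, e_candidate
    return x, e


x_small, e_small = refine([1.0, -2.0], step_size=0.25, num_steps=5)
x_large, e_large = refine([1.0, -2.0], step_size=1.5, num_steps=5)
```

With `step_size=0.25` every step lowers the energy and is accepted. With `step_size=1.5` every candidate overshoots and is rejected, so the sample stays at its starting point. The energy value itself only gates acceptance here, which matches the observation that it is compared once and then discarded.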
Hi @yilundu, @vacancy,
Thank you very much for releasing the code!