[AR-Diffusion] predict_xstart vs predict_x_start

raymond-yuan commented 10 months ago

Are these supposed to be the same variable? They control different things in the code. Just wanted to clarify behavior here.

In the default configs, one is set to True, and the other is set to False which leads to very confusing behavior.

wutong4012 commented 9 months ago

Please give the specific code location

k-chr commented 8 months ago

Please give the specific code location

In the GaussianDiffusion's method "training_losses_s2s" one can notice that in the 1622th LoC (gaussian_diffusion.py) predict_x_start is introduced (and this is the only one occurrence of that hyperparameter in the whole codebase of the AR-Diffusion):

if self.config.predict_x_start:
     x_output = model_out_x_start
else:
     x_output = x_start

The predict_xstart is referenced plenty of times in the codebase and its behaviour is, in fact, the same as setting the model_mean_type to ModelMeanType.START_X - that is, AR-Diffusion outputs $x_0$ instead of noise. However, what is more confusing, I found the comment that suggested the AR-Diffusion usually models a noise added in the forward diffusion process instead (in the DDIM sampling-related code).

        # Usually our model outputs epsilon, but we re-derive it
        # in case we used x_start or x_prev prediction.

wutong4012 commented 8 months ago

This is just an attempt we made. Here we still follow the Diffusion-LM approach, that is, x_output = x_start. That is, the corresponding predict_x_start=False in config.yaml. There is indeed some confusion here with predict_xstart, but you don’t need to worry about predict_x_start. This is code that is convenient for debugging.

microsoft / ProphetNet

[AR-Diffusion] predict_xstart vs predict_x_start #74