korrawe / Diffusion-Noise-Optimization

DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Code and Steps for Training base GMD model #7

Open MKowal2 opened 5 days ago

MKowal2 commented 5 days ago

Hi there, fantastic work! The idea and the results are very impressive :)

I was wondering if you could provide more details on how you trained the baseline GMD model. I see there are some files in the 'train' directory, but running the main script runs into import errors. It would be great if you could provide the steps to train our own GMD base model, and also any comments on how important the base model is to the final results (e.g., why did you train your own instead of using the original?).

Thank you!

korrawe commented 1 day ago

Hi, thank you for your kind words!

For your first question: we generally test our methods without retraining the models. The only exception is MDM, which we retrain with an Exponential Moving Average (EMA) of the weights because it makes training more stable and gives better results. For GMD, MLD, and HuMoR, we use the original models as-is.
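In case it helps, a minimal sketch of the EMA update we mean is below. The names (`update_ema`, `ema_model`) are only illustrative, not the actual training code in this repo:

```python
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.9999):
    # ema_param <- decay * ema_param + (1 - decay) * param
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)

# Usage during training (illustrative):
# ema_model = copy.deepcopy(model)
# for batch in loader:
#     ...forward / backward / optimizer.step()...
#     update_ema(ema_model, model)
```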

For DNO-MLD and DNO-GMD, we need to modify the original code a bit to enable gradient propagation through the denoising chain, which usually just means removing the torch.no_grad() decorator (or context manager) around the sampling loop. We didn't provide the code for this part as it is difficult to integrate cleanly into the original repos. It can be done by running dno.py on top of the denoising code in those models (which you will need to modify); a rough sketch is below.
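To be concrete, the optimization loop looks roughly like this. This is only a sketch under my own naming, not the exact code in dno.py: `denoise_step` stands for one step of the model's (now gradient-enabled) sampler, and `criterion` is whatever task objective you want to optimize (e.g., distance to target keyframes).

```python
import torch

def dno_optimize(denoise_step, criterion, timesteps, shape,
                 num_opt_steps=300, lr=5e-2, device="cuda"):
    # The initial noise z_T is the variable we optimize (a leaf tensor).
    z_T = torch.randn(shape, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([z_T], lr=lr)

    for _ in range(num_opt_steps):
        optimizer.zero_grad()

        # Run the full denoising chain WITHOUT torch.no_grad(), so that
        # gradients flow from the generated motion back to z_T.
        x = z_T
        for t in reversed(timesteps):
            x = denoise_step(x, t)

        loss = criterion(x)  # task objective on the generated motion
        loss.backward()
        optimizer.step()

    return z_T.detach(), x.detach()
```

For MLD the same loop runs in the latent space (the decoder maps the optimized latent back to motion), and for GMD it runs directly on the motion representation.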

There are some GMD training-related files left in the repo because we built DNO on top of the GMD codebase, but I haven't had time to remove them completely.

For the second question about the importance of the base model, my intuitions are:

  1. The performance of the base model matters a lot, both in terms of the performance upper bound and inference speed. We want a model with fast inference but also a good motion space: motions that look similar should be near each other in this space so that it can be easily optimized.

  2. The optimization is generally more difficult with a latent diffusion model because it adds another layer of space transformation to the pipeline. The key problem is that the output we want is now more likely not to be in the latent space at all, because no such motion was seen when training the latent encoder and decoder. How to train the encoder-decoder specifically for easier optimization still needs to be investigated.

  3. We didn't test the optimization on VQ-based latent codes, but we believe it would be more difficult because it becomes a discrete optimization and there may be problems with token flips. Still, I think it would be interesting to investigate.

Choosing the base model is then a question of trade-offs between these factors.

I hope this is useful for you. Cheers!