hbertiche / NeuralClothSim

Motion augmentation explanation #1

Closed: yigals closed this issue 1 year ago

yigals commented 1 year ago

Hi, I'm reading your great paper. Thanks for the explanations and insights, they are very helpful for understanding the design decisions! I was wondering if you could elaborate a bit further on the motion augmentation technique. IIUC, for 20% of the samples in each batch you shuffle the dynamic encodings, i.e., you give the decoder the dynamic encoding from one sequence while the static encoding is taken from a different sequence. I see from the ablation study that this augmentation helps greatly with collisions. I would love to understand your reasoning here: I would expect the decoder to perform worse, as it would be confused by seeing a static encoding that has nothing to do with the given dynamics. (I understand that the encoder isn't updated for the shuffled encodings, due to the stop gradient.) Thanks!

hbertiche commented 1 year ago

The idea of this augmentation is to make the decoder more robust. We define the decoder input 'z' as the sum of the static code 'z_s' and the dynamic code 'z_d', that is, 'z = z_s + z_d'. Because of this, the subspace of all valid encoded garments 'Z = {z}' lies "around" the subspace of all valid static encoded garments 'Z_s = {z_s}'. The augmentation therefore gives the decoder more samples "around" the static subspace 'Z_s' by creating new "fake" encodings 'z'. As explained in the paper, for these samples we can only apply static losses, since the dynamics are most likely inconsistent. This increases the robustness of the decoder, because it will have seen more samples around the static subspace 'Z_s' during training.
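
A minimal sketch of what this could look like in code (PyTorch-style, purely illustrative; the function name, shapes, and the 20% per-batch ratio come from this discussion, and the actual repository implementation may differ):

```python
import torch

def motion_augmentation(z_static, z_dynamic, aug_ratio=0.2):
    """Shuffle dynamic codes for roughly aug_ratio of the batch (hypothetical sketch).

    z_static:  (B, D) static codes z_s
    z_dynamic: (B, D) dynamic codes z_d
    Returns the decoder input z = z_s + z_d and a boolean mask marking the
    augmented samples, for which only static losses should be applied.
    """
    B = z_static.shape[0]
    # Select roughly aug_ratio of the batch for augmentation.
    aug_mask = torch.rand(B) < aug_ratio
    # Pair each selected sample with a dynamic code from another sample, and
    # stop gradients so the encoder is not updated through the shuffled codes.
    perm = torch.randperm(B)
    shuffled = z_dynamic[perm].detach()
    z_dyn = torch.where(aug_mask.unsqueeze(1), shuffled, z_dynamic)
    # Decoder input is the sum of static and (possibly shuffled) dynamic codes.
    z = z_static + z_dyn
    return z, aug_mask
```

Downstream, the dynamic losses would be masked out wherever `aug_mask` is True, since the shuffled dynamics are inconsistent with the paired static code.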

yigals commented 1 year ago

Oh I see, thanks for the explanation! So this is not unlike Self-Supervised Collision Handling, right? Sampling around the distribution of encodings to increase decoder performance.

Out of curiosity,

  1. Did you take a look at decoded garments after this augmentation?
  2. Did you try other strategies for augmenting around z_s, like sampling randomly from some distribution? Or did you perhaps gain any insights about the distribution of z_d?
  3. Unrelated, did you try to train with various body shapes, as opposed to only poses?

Thanks a lot!

hbertiche commented 1 year ago

Let me answer:

  1. Yes, and as the metrics suggest, collisions are significantly reduced.
  2. By shuffling dynamic codes I am already sampling randomly from a distribution (the training distribution). Another strategy would be to make the dynamic subspace 'Z_d = {z_d}' resemble a Gaussian (with a KL-divergence loss) and sample randomly from that Gaussian for motion augmentation (sketched below). Since shuffling already works well, I did not explore other strategies.
  3. No, I have not explored body shape generalization. It should be possible to do so. It should even be possible to consider shape variations for dynamics (although this would significantly increase the required training time).
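
For completeness, the alternative in point 2 could look roughly like this (hypothetical PyTorch-style sketch, not part of the repository): regularize the dynamic codes toward a standard Gaussian with a KL term, then draw augmentation codes directly from that Gaussian instead of shuffling within the batch.

```python
import torch

def kl_to_standard_normal(mu, logvar):
    # Standard VAE-style regularizer KL(N(mu, sigma^2) || N(0, I)),
    # which would pull the dynamic subspace Z_d toward a Gaussian.
    return -0.5 * torch.mean(1.0 + logvar - mu.pow(2) - logvar.exp())

def sample_dynamic_codes(batch_size, dim):
    # With Z_d regularized toward N(0, I), augmentation codes z_d could be
    # drawn directly from the Gaussian rather than shuffled from the batch.
    return torch.randn(batch_size, dim)
```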

I hope these answers are useful, feel free to ask further questions!

yigals commented 1 year ago

Thanks a lot! :D