Reproduce test results on ho3d

zehongs commented 2 years ago

Hi, thanks for the great work! I'm trying to reproduce the HO-3D numbers in Table 1, and I have several questions.

1. For each of the ten objects, how many samples should I use during evaluation? 100000?
2. Am I correct to re-initialize the cVAE decoder weights at the beginning of each sequence (object) and keep optimizing them over the whole sequence?
3. Will the batch size, the order of the samples, and the augmentation of the samples affect the evaluation results?

Thanks!

hwjiang1510 commented 2 years ago

Hi Zehong,

Thanks for your attention to our work.

1. We evaluate all the samples in the dataset for each object. We follow the experimental setting of Grasping Field (Karunratanakul et al.), which is described in Appendix C of their paper.
2. Yes, you are correct. This is the setting for online-TTA.
3. Yes, I believe all three (batch size, sample order, and augmentation) will influence the results.
- First, if the batch size is too small, it may not provide enough information to update the networks.
- Second, the order of the samples also matters. If the frames of a video are shuffled, the online-TTA updates are no longer continuous, because two consecutive inputs to the network are no longer consecutive frames. TTA can make the network overfit to the current input, so for online-TTA we have to feed consecutive frames to adapt the network smoothly.
- Third, the augmentation heavily influences the results. If the augmentation is too weak, it does not provide enough information (almost the same as learning on a single input sample). If the augmentation is too strong, the augmented samples become very different from the current sample we want to adapt the network to, and they vary widely among themselves. Such strong augmentations can even make the model collapse. I tuned the augmentations one by one, and this is the tricky part of TTA.
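To make the online-TTA setting from the answers above concrete, here is a minimal sketch of the outer loop. It is not the GraspTTA code: `run_online_tta`, `toy_adapt_and_predict`, and the scalar "weights" are hypothetical stand-ins, used only to show that the weights are reset at the start of each object sequence and then carried across consecutive, unshuffled frames.

```python
import copy


def run_online_tta(sequences, initial_weights, adapt_and_predict):
    """Run online adaptation over each sequence independently.

    sequences: dict mapping an object name to its frames in temporal order.
    initial_weights: the pretrained weights to restart from for every sequence.
    adapt_and_predict: hypothetical callable (weights, frame) -> (new_weights, prediction).
    """
    outputs = {}
    for name, frames in sequences.items():
        # Re-initialize from the pretrained weights at the start of each sequence.
        weights = copy.deepcopy(initial_weights)
        predictions = []
        # Iterate in the given temporal order; the frames are never shuffled.
        for frame in frames:
            weights, prediction = adapt_and_predict(weights, frame)
            predictions.append(prediction)
        outputs[name] = predictions
    return outputs


def toy_adapt_and_predict(weights, frame, lr=0.5):
    # Toy stand-in for a TTA update: pull a scalar weight toward the current frame.
    new_weights = {"w": weights["w"] + lr * (frame - weights["w"])}
    return new_weights, new_weights["w"]


sequences = {"obj_a": [1.0, 1.0], "obj_b": [3.0]}
results = run_online_tta(sequences, {"w": 0.0}, toy_adapt_and_predict)
```

In this toy run the second frame of `obj_a` starts from the weights adapted on the first frame, while `obj_b` starts again from the initial weights.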

zehongs commented 2 years ago

Hi, thanks for your reply!

I also want to confirm the test-time procedure. For each frame from FPA-C/HO3D, I construct a batch of 32 from that single frame, using random translations in [-5, 5] cm as the augmentation. Then I run online-TTA for 10 iterations and take the predicted MANO of the non-augmented instance as the output. Is that right?

hwjiang1510 commented 2 years ago

Yes, exactly. The loss functions are provided, so you can simply optimize the network parameters using them.
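The per-frame procedure confirmed here could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: `build_tta_batch`, `adapt_on_frame`, `update_step`, and `predict` are hypothetical names, the input is assumed to be a list of 3D points in centimetres, and the batch is assumed to be built once per frame rather than resampled at every iteration (the thread does not say which).

```python
import random


def build_tta_batch(points_cm, batch_size=32, max_shift_cm=5.0, rng=None):
    """Build a batch from a single frame using random translations.

    Index 0 holds the unaugmented sample; every other entry is the same
    point set shifted by one random 3D offset drawn from [-5, 5] cm per axis.
    """
    if rng is None:
        rng = random.Random()
    batch = [[tuple(p) for p in points_cm]]
    for _ in range(batch_size - 1):
        shift = [rng.uniform(-max_shift_cm, max_shift_cm) for _ in range(3)]
        batch.append([tuple(c + s for c, s in zip(p, shift)) for p in points_cm])
    return batch


def adapt_on_frame(weights, points_cm, update_step, predict, n_iters=10, rng=None):
    """Adapt on one frame for n_iters steps, then predict on the unaugmented sample.

    update_step(weights, batch) -> new weights and predict(weights, sample) -> output
    are hypothetical callables standing in for the provided losses and the network.
    """
    batch = build_tta_batch(points_cm, rng=rng)
    for _ in range(n_iters):
        weights = update_step(weights, batch)
    # The reported output comes from the instance that was not augmented.
    return weights, predict(weights, batch[0])
```

For online-TTA, the weights returned by `adapt_on_frame` would be passed on to the next consecutive frame of the same sequence.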

hwjiang1510 / GraspTTA
