Is the released training code correct?

yohanshin / WHAM

MIT License

719 stars, 78 forks

Is the released training code correct? #98

Open ktxu1224 opened 5 months ago

ktxu1224 commented 5 months ago

Hey @yohanshin, thank you for releasing the training code. However, I found some problems with it, and I can't get correct inference results with a model I trained myself. The problems are as follows:

1) In the released training code (shown below), shouldn't the feet_world field in output be updated before trajectory_refiner is called? The root position has already been changed by the reset_root_velocity function, but the released code does not update feet_world. Is that right?

[Image: wham-refiner]
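For readers following point 1), the update being asked about can be sketched as below. This is a minimal illustration, not WHAM's code: the function name shift_feet_with_root, the array shapes, and the assumption that reset_root_velocity changes only the root translation (not the root orientation or body pose) are all hypothetical. Under that assumption the feet move rigidly with the root, so the world-frame feet just need the same per-frame offset.

```python
import numpy as np


def shift_feet_with_root(feet_world, root_old, root_new):
    """Re-express world-frame feet after the root translation is replaced.

    Hypothetical helper. Assumes only the root *translation* changed, so every
    foot moves rigidly with the root in each frame.
      feet_world: (T, F, 3) foot positions in world frame (F feet per frame)
      root_old, root_new: (T, 3) root positions before/after the coarse step
    """
    feet_world = np.asarray(feet_world, dtype=float)
    delta = np.asarray(root_new, dtype=float) - np.asarray(root_old, dtype=float)  # (T, 3)
    # Broadcast the per-frame root offset over all feet in that frame.
    return feet_world + delta[:, None, :]


feet = [[[0, 0, 0], [1, 0, 0]], [[0, 0, 1], [1, 0, 1]]]
root_old = [[0, 0, 0], [0, 0, 1]]
root_new = [[0, 0, 0], [0, 0, 0.5]]
updated = shift_feet_with_root(feet, root_old, root_new)
```

As the maintainer's reply in this thread explains, the released code deliberately keeps the initial feet instead, so this sketch only illustrates the alternative being proposed.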

2) I tried to retrain the model using the code in the main branch, with the same config and amass.pth for stage 1, but I get wrong results, as shown below. So I wonder whether there is a bug in the training code, or whether I should use the code in the train branch to retrain the model.

Looking forward to your reply, thanks a lot. @yohanshin

ktxu1224 commented 5 months ago

@dalgu90 @Arthur151 @yohanshin @RohaanA

yohanshin commented 5 months ago

Hi @ktxu1224 ,

Thank you for pointing this out. I think your point is valid, but I intentionally scripted the refinement process this way. The refinement process first performs a "coarse" refinement (reset_root_velocity) and then refines the result finely through the learned network. The network's objective is to take the coarsely refined trajectory (which only minimizes foot sliding) and return a feasible and smooth human trajectory. To do this, I think the network needs information about the initial trajectory (the initial feet). That said, I am confident that your approach would also achieve comparable performance.
Did you check whether your training loss and validation scores are going down? Your results look far too bad, and I suspect that either 1) your training blows up somewhere, or 2) you did not load your checkpoint for the evaluation.
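The coarse-then-fine scheme described above can be sketched as follows. This is a hedged illustration under stated assumptions, not WHAM's implementation: coarse_reset_root_velocity is a hypothetical stand-in for reset_root_velocity that subtracts the mean world velocity of the contacting feet from the root velocity, and refiner_input is a hypothetical helper showing the learned stage receiving the coarse trajectory together with the unmodified initial feet, as the reply describes. The array shapes and the 0.5 contact threshold are assumptions.

```python
import numpy as np


def coarse_reset_root_velocity(root_vel, feet_vel, contact, thresh=0.5):
    """Hypothetical coarse foot-sliding removal (not WHAM's reset_root_velocity).

      root_vel: (T, 3) world-frame root velocity per frame
      feet_vel: (T, F, 3) world-frame velocity of each foot
      contact:  (T, F) contact probabilities in [0, 1]
    Assuming each foot's motion relative to the root is unchanged, subtracting
    the mean velocity of the contacting feet reduces their sliding (to exactly
    zero when a single foot is in contact).
    """
    root_vel = np.asarray(root_vel, dtype=float).copy()
    feet_vel = np.asarray(feet_vel, dtype=float)
    mask = np.asarray(contact) > thresh                  # (T, F) contact mask
    n_contact = mask.sum(axis=1)                         # (T,) feet in contact
    summed = (feet_vel * mask[..., None]).sum(axis=1)    # (T, 3)
    has_contact = n_contact > 0
    root_vel[has_contact] -= summed[has_contact] / n_contact[has_contact][:, None]
    return root_vel


def refiner_input(coarse_root_vel, initial_feet_world):
    """Concatenate the coarse trajectory with the *initial* (not updated) feet."""
    coarse_root_vel = np.asarray(coarse_root_vel, dtype=float)
    T = coarse_root_vel.shape[0]
    feet_flat = np.asarray(initial_feet_world, dtype=float).reshape(T, -1)
    return np.concatenate([coarse_root_vel, feet_flat], axis=-1)  # (T, 3 + 3F)


root_vel = [[1, 0, 0], [1, 0, 0]]
feet_vel = [[[1, 0, 0], [0, 0, 0]], [[2, 0, 0], [2, 0, 0]]]
contact = [[0.9, 0.1], [0.2, 0.3]]
coarse = coarse_reset_root_velocity(root_vel, feet_vel, contact)
features = refiner_input(coarse, np.zeros((2, 2, 3)))
```

Passing the initial feet rather than recomputed ones lets the learned stage see how far the coarse step moved the trajectory, which is the rationale given in the reply.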

ktxu1224 commented 5 months ago

Hi @yohanshin,

Regarding the training results: I used the data (amass.pth) and the loss weights you provided without any modification, except that (1) I used torch.nn.DataParallel to train the model in parallel, and (2) I processed the data without camera motion. I also evaluated on the 3DPW dataset; the loss and evaluation errors are shown below for your reference.

[Images: loss visualization results]

I think there may be a bug in my code, since the validation score is going up even though the loss seems right. Can you give me some advice? Thanks a lot.
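Given point (1) and the maintainer's suspicion that the checkpoint was not loaded, one common PyTorch pitfall may be worth ruling out (whether it applies here is an assumption): a model wrapped in torch.nn.DataParallel saves its parameters under keys prefixed with `module.`, so loading that state_dict into an unwrapped model with `strict=False` can silently skip every weight. The helpers below are a minimal sketch; their names are hypothetical, and plain dicts stand in for state_dicts so the example runs without PyTorch.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading DataParallel prefix removed."""
    return OrderedDict(
        (k[len(prefix):] if k.startswith(prefix) else k, v)
        for k, v in state_dict.items()
    )


def key_mismatch(model_keys, ckpt_keys):
    """Return (missing, unexpected): keys the model expects but the checkpoint
    lacks, and keys the checkpoint has but the model does not."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


ckpt = {"module.a.weight": 1, "b": 2}
cleaned = strip_module_prefix(ckpt)
missing, unexpected = key_mismatch(["a.weight", "c"], cleaned.keys())
```

With a real model, `model.load_state_dict(sd, strict=False)` returns `missing_keys` and `unexpected_keys`, and both should be empty; saving `model.module.state_dict()` from the wrapped model avoids the prefix in the first place.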

ktxu1224 commented 5 months ago

@yohanshin @dalgu90 @Arthur151 @RohaanA