Closed: lyrgwlr closed this issue 4 years ago.
What exactly does the 'bbox' column in the dataframe you provided mean? In the paper, it's described as: "The bounding boxes were drawn to include the clubhead and golf ball through the full duration of the swing." But it seems only a single 1x4 array is provided.
There is only one bounding box for each video. The bounding box was drawn such that the clubhead and golf ball were visible in all frames. I will be uploading some code soon. I am working on re-implementing the model in TF.
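A minimal sketch of applying one per-video box to every frame. It assumes, hypothetically, that `bbox` stores `[x, y, w, h]` normalized to the frame width and height; check that against the dataframe before relying on it. `bbox_to_pixels` and `crop_clip` are illustrative names, not functions from the repository.

```python
def bbox_to_pixels(bbox, frame_w, frame_h):
    """Convert a box assumed to be normalized [x, y, w, h] into integer pixel values."""
    x, y, w, h = bbox
    return int(x * frame_w), int(y * frame_h), int(w * frame_w), int(h * frame_h)


def crop_clip(frames, bbox, frame_w, frame_h):
    """Crop every frame of one video with that video's single bounding box.

    `frames` is a sequence of frames, each indexable as frame[row][col]
    (nested lists here; a NumPy array of shape (H, W, C) also works).
    """
    x, y, w, h = bbox_to_pixels(bbox, frame_w, frame_h)
    # The same box is reused for all frames; by construction of the label,
    # the clubhead and ball stay inside it for the whole swing.
    return [[row[x:x + w] for row in frame[y:y + h]] for frame in frames]


# Usage: three 4x4 "frames" whose pixel values encode their (row, col).
frames = [[[10 * r + c for c in range(4)] for r in range(4)] for _ in range(3)]
cropped = crop_clip(frames, [0.25, 0.25, 0.5, 0.5], frame_w=4, frame_h=4)
print(cropped[0])  # [[11, 12], [21, 22]]
```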
@wmcnally Hi, I'm a little confused about something in your paper. In Section 6.3 (Baseline SwingNet), you say the batch size is 24 and the sequence length is 64. So the input's shape is (24*64, 3, 160, 160), is that right? I tried loading the tensor onto my NVIDIA 1080 Ti, which has 11 GB of memory, but it ran out of memory. Could you help me? Thanks a lot.
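A quick back-of-the-envelope check of the input tensor's size, assuming float32 storage (4 bytes per element). `tensor_bytes` is a hypothetical helper:

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed to store a dense tensor of `shape` (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


batch_size, seq_len = 24, 64
# Frames from all sequences in the batch are stacked along the first axis for the CNN.
input_shape = (batch_size * seq_len, 3, 160, 160)
size = tensor_bytes(input_shape)
print(input_shape, size, size / 2**20)  # (1536, 3, 160, 160) 471859200 450.0
```

The input itself is only about 450 MiB. During training, most of the GPU memory typically goes to the intermediate activations kept for backpropagation through all 1,536 frames, so the batch size (and how many layers are trainable) matters far more than the raw input size.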
That's the same input shape I used. Recall that in the paper I mentioned freezing the first 10 layers of MobileNetV2. I was also using a Titan X (12 GB of memory), so you will need to lower your batch size. I will upload some PyTorch code this week.
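A sketch of freezing the first 10 layers in PyTorch, assuming the CNN's feature extractor is an `nn.Sequential` of blocks (torchvision's `mobilenet_v2().features` is one). The 19-block stand-in below and the helper names are hypothetical; this is not necessarily how the released code does it.

```python
import torch.nn as nn


def freeze_first_layers(features: nn.Sequential, num_freeze: int) -> None:
    """Turn off gradients for the first `num_freeze` blocks of a Sequential."""
    for block in list(features)[:num_freeze]:
        for param in block.parameters():
            param.requires_grad = False


def count_trainable(module: nn.Module) -> int:
    """Number of parameters that will still receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Hypothetical stand-in for a MobileNetV2 feature extractor: 19 small conv blocks.
features = nn.Sequential(*[nn.Conv2d(8, 8, kernel_size=3, padding=1) for _ in range(19)])
print(count_trainable(features))  # 11096 (19 blocks x 584 parameters)
freeze_first_layers(features, num_freeze=10)
print(count_trainable(features))  # 5256 (9 trainable blocks remain)
```

When building the optimizer, you can pass only the parameters that still require gradients, e.g. `filter(lambda p: p.requires_grad, model.parameters())`. Because the input frames do not require gradients either, autograd does not need to keep activations for the frozen prefix, which is part of why freezing saves memory.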
@wmcnally, something is probably going wrong. I tried freezing the first 10 layers of MobileNetV2 and lowered my batch size all the way to 10, but I still got out-of-memory errors... So I'm looking forward to your code. Thanks~
@lyrgwlr I uploaded some starter PyTorch code and a link to preprocessed videos. Please let me know if there are any issues.
@wmcnally, thanks for your code; the training runs well. I added some code:
My batch size is 22 and the sequence length is 64. I trained on 4 splits, each for 7k iterations. The model is loaded once before training, and I set the learning rate as follows, because the paper says "the learning rate was reduced by an order of magnitude after 5k iterations":
(I also tried setting the learning rate like this: the result was about 5% worse in PCE.)
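The schedule described in the paper (spelled out later in this thread as 0.001 for the first 5k iterations, then 0.0001) can be expressed with PyTorch's `MultiStepLR`. This is a sketch: the Adam optimizer and the single dummy parameter are assumptions standing in for the real model and optimizer.

```python
import torch

# A dummy parameter stands in for the model's trainable parameters;
# Adam is an assumption, use whichever optimizer your training script uses.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=1e-3)
# Multiply the learning rate by 0.1 once 5,000 scheduler steps have been taken.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5000], gamma=0.1)

lrs = []
for iteration in range(7000):
    # A real loop would compute the loss and call loss.backward() here.
    optimizer.step()
    scheduler.step()  # step once per iteration (not per epoch) for this schedule
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs[4998], lrs[4999])  # approximately 0.001 and 0.0001
```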
The PCE on the 4 splits was as follows:
This is clearly far from the results in the paper. Is there something I missed?
@lyrgwlr It looks like you trained a single model for 28k iterations. This is incorrect. Are you familiar with k-fold cross-validation? You should do a 4-fold cross-validation, where you train 4 separate models (1 for each split). In the paper I used a learning rate of 0.001 for the first 5k iterations and 0.0001 for the remaining 2k iterations. Save checkpoints frequently, as overfitting may be a reason for low PCE. Depending on your implementation of random affine transformations, you may get different results than in the paper. The paper also used a batch size of 24.
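A minimal sketch of the 4-fold protocol described above, where each split gets its own freshly initialized model instead of one model trained for 28k iterations. `build_model`, `train`, and `evaluate` are hypothetical callables you would supply (for example, 7k iterations of training, then PCE on the held-out split); they are not the repository's API.

```python
from typing import Callable, Dict, Iterable


def cross_validate(
    build_model: Callable[[], object],
    train: Callable[[object, int], None],
    evaluate: Callable[[object, int], float],
    splits: Iterable[int] = (1, 2, 3, 4),
) -> Dict[int, float]:
    """k-fold protocol: one independently trained model per split."""
    results = {}
    for split in splits:
        model = build_model()  # fresh weights; never continue from the previous split's model
        train(model, split)    # train on all videos outside `split` (e.g. 7k iterations)
        results[split] = evaluate(model, split)  # score (e.g. PCE) on the held-out split
    return results


# Toy check: each "model" counts how many training runs it has seen.
results = cross_validate(
    build_model=lambda: {"runs": 0},
    train=lambda m, split: m.update(runs=m["runs"] + 1),
    evaluate=lambda m, split: float(m["runs"]),
)
print(results)  # {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
```

If one model object were reused across splits, the toy counter would read 1, 2, 3, 4, which mirrors training a single model for 28k iterations. The four per-split scores are then typically summarized by their average.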
@wmcnally, sorry to bother you. :( Is this the right way to do the 4-fold cross-validation:
I trained this way today but still got poor results (about 42% PCE).
@lyrgwlr There could be something wrong with your data augmentation. My guess is you are overfitting. What is the PCE for the same model after 2k iterations?
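For comparing checkpoints, a minimal sketch of a PCE (percentage of correct events) computation: the percentage of events whose predicted frame lies within a frame tolerance of the ground-truth frame. The exact tolerance rule from the paper is not reproduced here; it is passed in as a parameter, and `pce` plus the toy checkpoint data are hypothetical.

```python
def pce(pred_frames, true_frames, tolerance):
    """Percentage of events predicted within `tolerance` frames of the label."""
    if len(pred_frames) != len(true_frames) or not true_frames:
        raise ValueError("need one prediction per labelled event")
    correct = sum(abs(p - t) <= tolerance for p, t in zip(pred_frames, true_frames))
    return 100.0 * correct / len(true_frames)


# Toy predictions from two hypothetical checkpoints for one video with three events.
labels = [10, 22, 30]
checkpoints = {"ckpt_a": [10, 20, 31], "ckpt_b": [10, 22, 30]}
scores = {name: pce(preds, labels, tolerance=1) for name, preds in checkpoints.items()}
best = max(scores, key=scores.get)
print(scores, best)
```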
@wmcnally The PCE for the same model after 2k iterations is a little lower than after 7k iterations, so I don't think overfitting is the issue. Also, here are some images after my data augmentation. I applied a horizontal flip with 50% probability and a random affine transformation with ±5° rotation and ±5° shear.
Please let me know if something is wrong with my data augmentation. Today I will train without the horizontal flip and random affine and report the results to you. Thanks~
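One thing worth checking is whether the random flip and affine parameters are sampled once per sequence and applied identically to all 64 frames, rather than re-sampled per frame. A sketch of the per-clip version in plain PyTorch, assuming a float clip of shape (T, C, H, W) with square frames; `augment_clip` is a hypothetical name, and building the affine step from `affine_grid`/`grid_sample` is one option, not necessarily what the repository does.

```python
import math
import random

import torch
import torch.nn.functional as F


def augment_clip(clip, max_rot_deg=5.0, max_shear_deg=5.0, p_flip=0.5, rng=random):
    """Apply one randomly sampled flip + rotation + shear to every frame of a clip.

    clip: float tensor of shape (T, C, H, W); square frames are assumed so that
    normalized coordinates do not distort the rotation.
    """
    # Sample the parameters ONCE per clip, not once per frame.
    angle = math.radians(rng.uniform(-max_rot_deg, max_rot_deg))
    shear = math.radians(rng.uniform(-max_shear_deg, max_shear_deg))
    if rng.random() < p_flip:
        clip = torch.flip(clip, dims=[-1])  # horizontal flip (width axis)

    rotation = torch.tensor([[math.cos(angle), -math.sin(angle)],
                             [math.sin(angle), math.cos(angle)]])
    shear_m = torch.tensor([[1.0, math.tan(shear)], [0.0, 1.0]])
    theta = torch.zeros(1, 2, 3)
    theta[0, :, :2] = rotation @ shear_m  # translation column stays zero
    # Identical matrix for every frame, matched to the clip's dtype and device.
    theta = theta.repeat(clip.shape[0], 1, 1).to(device=clip.device, dtype=clip.dtype)

    grid = F.affine_grid(theta, list(clip.shape), align_corners=False)
    return F.grid_sample(clip, grid, mode="bilinear", padding_mode="zeros", align_corners=False)


# Usage: a 64-frame clip of 160x160 RGB frames.
clip = torch.rand(64, 3, 160, 160)
out = augment_clip(clip, rng=random.Random(0))
print(out.shape)  # torch.Size([64, 3, 160, 160])
```

Because `affine_grid` maps output coordinates back to input coordinates, the visual rotation direction may be opposite to the sign of the sampled angle; with symmetric ranges this does not matter.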
@lyrgwlr The training code I provided gives a PCE of 71.5 after ~1.8k iterations so there could be something wrong with your code. If you implement data augmentation correctly you should be able to improve the PCE. I'm closing this issue as there is nothing wrong with the code I have provided. Good luck!
@wmcnally, I figured it out. It's a frame-rate problem. Not all of the videos I downloaded from YouTube are 30 FPS; some are 60 FPS. So I made some changes in "preprocess_videos.py", but I forgot to change the code in "dataloader.py". events *= round(fps/30) is needed if a video is not 30 FPS; otherwise the labels won't match the actual video frames. I get normal PCE results now. I hope this helps anyone who runs into a similar problem.
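The fix above, written as a small helper under the assumption that the event labels index frames of a 30 FPS video. `rescale_events` is a hypothetical name; in a NumPy-based dataloader the one-liner `events *= round(fps/30)` does the same thing in place.

```python
def rescale_events(events, fps, annotated_fps=30):
    """Map event frame indices labelled at `annotated_fps` onto a video at `fps`.

    Mirrors `events *= round(fps/30)`: only valid when `fps` is (close to) an
    integer multiple of the annotation frame rate.
    """
    factor = round(fps / annotated_fps)
    return [e * factor for e in events]


print(rescale_events([0, 12, 40], fps=60))  # [0, 24, 80]
```

Rounding the factor works for 60 or 59.94 FPS, but a 25 FPS video would get a factor of 1 and stay misaligned, so other frame rates need their own handling.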
@lyrgwlr I didn’t realize you downloaded and preprocessed your own videos. Nicely done!
@wmcnally Was the weights file you provided trained with a batch size of 22 or 24? I trained a model with your original training settings but got only 68.5% PCE on split 1.
I'm interested in your paper, GolfDB: A Video Database for Golf Swing Sequencing, and would like to request your SwingNet code for research purposes. Thanks a lot.