wmcnally / golfdb

GolfDB is a video database for Golf Swing Sequencing, which involves detecting 8 golf swing events in trimmed golf swing videos. This repo demos the baseline model, SwingNet.

Do you intend to open your code? #2

Closed: lyrgwlr closed this issue 4 years ago.

lyrgwlr commented 4 years ago

I'm interested in your paper, "GolfDB: A Video Database for Golf Swing Sequencing", so I would like to request your SwingNet code for research purposes. Thanks a lot.

lyrgwlr commented 4 years ago

What exactly does the 'bbox' column in the dataframe you provided mean? In the paper, it's described as: "The bounding boxes were drawn to include the clubhead and golf ball through the full duration of the swing." But only a single 1x4 array seems to be provided.

wmcnally commented 4 years ago

There is only one bounding box for each video. The bounding box was drawn such that the clubhead and golf ball were visible in all frames. I will be uploading some code soon. I am working on re-implementing the model in TF.
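Because there is only one bounding box per video, every frame of a clip can be cropped with the same box. A minimal sketch, assuming the 1x4 'bbox' entry is [x, y, w, h] expressed as fractions of the frame width and height (this layout is an assumption, not confirmed in this thread) and that a clip is a NumPy array of shape (T, H, W, C). `crop_clip` is a hypothetical helper.

```python
import numpy as np


def crop_clip(frames: np.ndarray, bbox) -> np.ndarray:
    """Crop every frame of a clip with one shared bounding box.

    ASSUMPTION: bbox = [x, y, w, h] as fractions of frame width/height.
    frames: array of shape (T, H, W, C).
    """
    _, height, width, _ = frames.shape
    x, y, w, h = bbox
    # Convert the normalized box to pixel coordinates once for the whole clip.
    x0, y0 = int(round(x * width)), int(round(y * height))
    x1, y1 = int(round((x + w) * width)), int(round((y + h) * height))
    # Clamp so the box never extends past the frame.
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, width), min(y1, height)
    # The same slice is applied to every frame, so the clubhead and ball stay in view.
    return frames[:, y0:y1, x0:x1, :]


clip = np.zeros((64, 360, 640, 3), dtype=np.uint8)  # 64 dummy frames
cropped = crop_clip(clip, [0.25, 0.1, 0.5, 0.8])
print(cropped.shape)  # (64, 288, 320, 3)
```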

lyrgwlr commented 4 years ago

@wmcnally Hi, I'm a little confused about your paper. In Section 6.3 (Baseline SwingNet), you said the batch size is 24 and the sequence length is 64. So the input's shape is (24*64, 3, 160, 160), is that right? I tried loading the tensor onto my NVIDIA 1080 Ti, which has 11 GB of memory, but it ran out of memory. Could you help me? Thanks a lot.

wmcnally commented 4 years ago

That's the same input shape I used. Recall that in the paper I mentioned freezing the first 10 layers of MobileNetV2. I was also using a Titan X (12 GB of memory), so you will need to lower your batch size. I will upload some PyTorch code this week.
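To see where the memory goes, here is the arithmetic for the input tensor alone, plus the reshape that folds the sequence dimension into the batch dimension so that a 2D CNN such as MobileNetV2 sees (24*64, 3, 160, 160). A minimal NumPy sketch, assuming float32 inputs; `input_bytes` is a hypothetical helper. The input itself is only about 450 MiB, so most of the GPU memory most likely goes to the intermediate activations stored for backpropagation through all 1,536 frames, which is why freezing early layers and lowering the batch size both help.

```python
import numpy as np


def input_bytes(batch, seq_len, channels=3, height=160, width=160, itemsize=4):
    """Bytes needed for the stacked input tensor alone (float32 by default)."""
    return batch * seq_len * channels * height * width * itemsize


total = input_bytes(24, 64)
print(total, total / 2**20)  # 471859200 450.0

# Folding time into the batch dimension: (B, T, C, H, W) -> (B*T, C, H, W).
# Small dummy sizes are used here so the example runs instantly.
b, t = 2, 4
clips = np.zeros((b, t, 3, 160, 160), dtype=np.float32)
frames = clips.reshape(b * t, 3, 160, 160)
print(frames.shape)  # (8, 3, 160, 160)
```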

lyrgwlr commented 4 years ago

@wmcnally, something is probably going wrong: I froze the first 10 layers of MobileNetV2 and lowered my batch size to 10, but the out-of-memory errors still occurred. So I'm looking forward to your code. Thanks~

wmcnally commented 4 years ago

@lyrgwlr I uploaded some starter PyTorch code and a link to preprocessed videos. Please let me know if there are any issues.

lyrgwlr commented 4 years ago

@wmcnally, thanks for your code; the training runs well. I added some code for:

  1. Random affine transformations and horizontal flips, as described in the paper.
  2. Xavier normal/uniform initialization for the fc layer.

My batch size is 22 and the sequence length is 64. I trained on the 4 splits for 7k iterations each. The model is loaded once before training, and I set the learning rate as follows, because the paper said "the learning rate was reduced by an order of magnitude after 5k iterations": [image: learning-rate schedule]

(I also tried setting the learning rate like this: [image: alternative learning-rate schedule]. The result was about 5% PCE worse.)

The resulting PCE on the 4 splits: [image: PCE per split]

This is clearly far from the paper's results. Is there something I missed?

wmcnally commented 4 years ago

@lyrgwlr It looks like you trained a single model for 28k iterations. This is incorrect. Are you familiar with k-fold cross-validation? You should do 4-fold cross-validation, where you train 4 separate models (one for each split). In the paper I used a learning rate of 0.001 for the first 5k iterations and 0.0001 for the remaining 2k iterations. Save checkpoints frequently, as overfitting may be a reason for low PCE. Depending on your implementation of random affine transformations, you may get different results than in the paper. The paper also used a batch size of 24.
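The schedule described here (0.001 for the first 5k iterations, then 0.0001 for the remaining 2k) is a step function of the iteration count. A minimal plain-Python sketch; `learning_rate` is a hypothetical helper. In PyTorch the same step could likely be expressed with `torch.optim.lr_scheduler.MultiStepLR` (milestones=[5000], gamma=0.1) stepped once per iteration, though that is not necessarily how the repository implements it.

```python
BASE_LR = 1e-3      # used for the first 5k iterations
DROP_AT = 5000      # iteration at which the rate drops
TOTAL_ITERS = 7000  # 5k + 2k iterations per split


def learning_rate(iteration, base_lr=BASE_LR, drop_at=DROP_AT):
    """Step schedule: base_lr before drop_at, one order of magnitude lower after."""
    return base_lr if iteration < drop_at else base_lr / 10


for it in (0, 4999, 5000, TOTAL_ITERS - 1):
    print(it, learning_rate(it))
```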

lyrgwlr commented 4 years ago

@wmcnally, sorry to bother you. :( Is this the right way to do 4-fold cross-validation?

  1. For each split, create a new model and train it for 7k iterations.
  2. Then run eval.py on each split using the checkpoint of the corresponding model.

I trained this way today but still got bad results (about 42% PCE).
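The two-step procedure above can be sketched as a loop that builds a fresh model for every split, so no weights carry over between splits, and then averages the per-split PCE. A minimal sketch; `build_model`, `train`, and `evaluate` are hypothetical stand-ins for the repository's training script and eval.py, stubbed out here so that only the control flow runs.

```python
from statistics import mean

NUM_SPLITS = 4
ITERATIONS = 7000  # per split, not 4 x 7000 on one model


def build_model():
    """Hypothetical stand-in: return a freshly initialized model."""
    return {"split": None, "iterations": 0}


def train(model, split, iterations):
    """Hypothetical stand-in: train only on the training portion of `split`."""
    model["split"] = split
    model["iterations"] += iterations
    return model


def evaluate(model, split):
    """Hypothetical stand-in for eval.py: PCE on the held-out part of `split`."""
    if model["split"] != split:
        raise ValueError("evaluate each split with the model trained on it")
    return 0.0  # placeholder; a real version would run inference


def cross_validate(num_splits=NUM_SPLITS, iterations=ITERATIONS):
    models, scores = [], []
    for split in range(1, num_splits + 1):
        model = build_model()  # new model for every split: no weights carried over
        train(model, split, iterations)
        models.append(model)
        scores.append(evaluate(model, split))
    return models, scores


models, scores = cross_validate()
average_pce = mean(scores)  # average across the 4 splits
```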

wmcnally commented 4 years ago

@lyrgwlr There could be something wrong with your data augmentation. My guess is you are overfitting. What is the PCE for the same model after 2k iterations?
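Checking for overfitting this way means computing PCE for each saved checkpoint and comparing them. A minimal sketch, assuming PCE (percentage of correct events) counts an event as correct when its predicted frame index lies within `tol` frames of the ground truth. Here `tol` is a fixed, hypothetical parameter (the paper's tolerance rule may differ, so check eval.py), and the frame indices are illustrative only, not real results.

```python
import numpy as np


def pce(pred_frames, true_frames, tol):
    """Percentage of events predicted within `tol` frames of the ground truth.

    pred_frames, true_frames: arrays of shape (num_videos, 8) with frame indices.
    """
    pred = np.asarray(pred_frames)
    true = np.asarray(true_frames)
    correct = np.abs(pred - true) <= tol
    return 100.0 * correct.mean()


def best_checkpoint(results, true_frames, tol):
    """Return the checkpoint (keyed by iteration) with the highest PCE, plus all scores."""
    scores = {it: pce(pred, true_frames, tol) for it, pred in results.items()}
    return max(scores, key=scores.get), scores


# Illustrative frame indices for one video, not real results.
truth = [[10, 20, 30, 40, 50, 60, 70, 80]]
results = {
    2000: [[10, 22, 35, 40, 50, 61, 75, 80]],
    7000: [[11, 20, 30, 41, 50, 60, 70, 79]],
}
best, scores = best_checkpoint(results, truth, tol=1)
print(best, scores)  # 7000 {2000: 62.5, 7000: 100.0}
```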

lyrgwlr commented 4 years ago

@wmcnally The PCE for the same model after 2k iterations is a little lower than after 7k iterations, so I don't think it's an overfitting issue. Here are some frames after my data augmentation: I apply a horizontal flip with 50% probability and a random affine transformation with ±5° rotation and ±5° shear. [images: augmented sample frames]

Please let me know if something is wrong with my data augmentation. Today I will train without the horizontal flip and random affine transformation and report the results to you. Thanks~
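For video clips, one common choice is to sample the augmentation parameters once per clip and apply them to every frame, so the motion between frames is not distorted. A minimal NumPy sketch under that assumption (whether the paper sampled per clip or per frame is not stated in this thread): it flips the whole clip with probability 0.5 and samples a rotation and a shear in [-5°, 5°], returning the 2x3 affine matrix that a warping routine would then apply to each frame. `augment_clip` and `affine_matrix` are hypothetical helpers.

```python
import numpy as np


def affine_matrix(angle_deg, shear_deg, center):
    """2x3 forward affine matrix: rotation composed with x-shear, about `center` (x, y)."""
    a, s = np.deg2rad(angle_deg), np.deg2rad(shear_deg)
    rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    shear = np.array([[1.0, np.tan(s)], [0.0, 1.0]])
    linear = rot @ shear
    c = np.asarray(center, dtype=float)
    # Choose the offset so that `center` maps to itself: x' = linear @ (x - c) + c.
    offset = c - linear @ c
    return np.hstack([linear, offset[:, None]])


def augment_clip(clip, rng, max_deg=5.0, flip_prob=0.5):
    """Sample one set of parameters per clip (T, H, W, C) and share it across frames."""
    flipped = rng.random() < flip_prob
    if flipped:
        # Flip the width axis of every frame together; copy to keep memory contiguous.
        clip = clip[:, :, ::-1, :].copy()
    angle = rng.uniform(-max_deg, max_deg)
    shear = rng.uniform(-max_deg, max_deg)
    h, w = clip.shape[1:3]
    matrix = affine_matrix(angle, shear, center=((w - 1) / 2, (h - 1) / 2))
    return clip, matrix, flipped


rng = np.random.default_rng(0)
clip = np.arange(2 * 4 * 6 * 3, dtype=np.uint8).reshape(2, 4, 6, 3)
out, matrix, flipped = augment_clip(clip, rng)
```

The matrix could then be passed to a routine such as OpenCV's `cv2.warpAffine(frame, matrix, (w, h))` for each frame of the clip. A horizontal flip does not change event timing, so the event labels stay as they are.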

wmcnally commented 4 years ago

@lyrgwlr The training code I provided gives a PCE of 71.5 after ~1.8k iterations so there could be something wrong with your code. If you implement data augmentation correctly you should be able to improve the PCE. I'm closing this issue as there is nothing wrong with the code I have provided. Good luck!

lyrgwlr commented 4 years ago

@wmcnally, I figured it out: it was a frame-rate problem. Not all of the videos I downloaded from YouTube are 30 FPS; some are 60 FPS. I made some changes in "preprocess_videos.py" to handle this, but forgot to change the code in "dataloader.py". If the videos are not 30 FPS, "events *= round(fps/30)" is needed; otherwise the labels won't match the actual video files. I get normal PCE results now. I hope this helps anyone who runs into a similar problem.
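The fix amounts to rescaling the event frame indices, which are annotated at 30 FPS, by the rounded ratio of the video's actual frame rate to 30. A minimal sketch of that one line in isolation; `scale_events` is a hypothetical helper, and reading `fps` from the video file is left out.

```python
import numpy as np


def scale_events(events, fps, label_fps=30):
    """Map event frame indices annotated at `label_fps` onto a video at `fps`."""
    events = np.asarray(events, dtype=np.int64).copy()
    # Same rule as in the comment above: multiply by the rounded frame-rate ratio.
    events *= round(fps / label_fps)
    return events


print(scale_events([0, 12, 25, 40], fps=30))     # [ 0 12 25 40]
print(scale_events([0, 12, 25, 40], fps=59.94))  # [ 0 24 50 80]
```

Note that this rule only works for frame rates close to a whole multiple of 30: for a 25 or 50 FPS video, round(fps/30) is 1 or 2, so scaling each index by fps/30 and rounding it individually would match the frames more closely.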

wmcnally commented 4 years ago

@lyrgwlr I didn’t realize you downloaded and preprocessed your own videos. Nicely done!

lyrgwlr commented 4 years ago

@wmcnally Was the weights file you provided trained with a batch size of 22 or 24? I trained a model with your original training settings but got only 68.5% PCE on split 1.