jeffreyyihuang / two-stream-action-recognition

Using two stream architecture to implement a classic action recognition method on UCF101 dataset
MIT License

Pre-trained model #4

Closed cwzat closed 6 years ago

cwzat commented 6 years ago

Could you publish your pre-trained models? Thank you!

jeffreyyihuang commented 6 years ago

Sure, but I am currently busy with some projects. I will probably update this repo around Dec 20.

Jeffrey

cwzat commented 6 years ago

Thank you very much! I am very glad to hear that!

cwzat commented 6 years ago

I am reading your code, and I am not familiar with torch. Could you tell me how you modify the weights of the first conv layer, pre-trained on ImageNet, from (64, 3, 7, 7) to (64, 20, 7, 7)? Thank you!

jeffreyyihuang commented 6 years ago

Hi, this method is called cross-modality pretraining, which was proposed in "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition". The procedure is to average the weight values across the RGB channels and replicate this average once for each input channel of the motion stream (20 in this case).

Jeffrey
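
A minimal NumPy sketch of the averaging-and-replicating step described above. The function name `cross_modality_init` and the random stand-in weights are assumptions made for illustration; this is not the repository's actual code, and real ImageNet weights would be loaded from a pretrained model instead.

```python
import numpy as np

def cross_modality_init(rgb_weight, in_channels=20):
    """Average conv weights over the RGB axis and repeat the mean.

    rgb_weight: array of shape (out_channels, 3, kH, kW).
    Returns an array of shape (out_channels, in_channels, kH, kW).
    (Hypothetical helper, named here only for illustration.)
    """
    # Mean over the 3 RGB input channels, keeping that axis with size 1.
    mean = rgb_weight.mean(axis=1, keepdims=True)
    # Copy the averaged kernel once per optical flow input channel.
    return np.repeat(mean, in_channels, axis=1)

# Random stand-in for the ImageNet-pretrained first conv layer.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((64, 3, 7, 7)).astype(np.float32)

flow = cross_modality_init(rgb, in_channels=20)
print(flow.shape)  # (64, 20, 7, 7)
```

Every one of the 20 input channels starts from the same averaged kernel, so the layer keeps the spatial filters learned on ImageNet while accepting the stacked flow input.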

cwzat commented 6 years ago

Thank you, I got it. Now I am trying this method with Keras and have run into some trouble. Are you familiar with it? If so, I would appreciate your help.

jeffreyyihuang commented 6 years ago

Hi, I have tried the two-stream network in Keras before, but I am not very familiar with it. Could you post your issue? There might be something I can do to help.

cwzat commented 6 years ago

I use model.get_config() to do the cross-modality pretraining, with the Inception-ResNet-v2 model. The optimizer is Adam (default parameters) or SGD(lr=1e-2, 0.9), and the optical flow frames are stacked as 10 x-frames and 10 y-frames, but the accuracy is very low (65%). I would like to know more details about your model, or could you give me some advice?

cwzat commented 6 years ago

If you have a Keras pretrained optical flow model, could you publish it? Thank you!

jeffreyyihuang commented 6 years ago

Sorry, I don't have a Keras pretrained optical flow model.

I think the low accuracy might be caused by the sampling method in your training stage, since I have had similar experiences with the PyTorch framework. Could you provide some details about how you sample your training data for each batch?

Jeffrey

roystonrodrigues commented 6 years ago

Can you please share your pretrained models? This would be helpful for running your code in the testing phase.

cwzat commented 6 years ago

@jeffreyhuang1 There are 8631 video samples in the training set. For each batch, I randomly choose 32 video samples from it, and for each video I randomly choose 10 x-frames and 10 y-frames. Then I stack them, and the result has shape (32, 229, 229, 20). On the third axis (index 3, the channel axis), the first ten channels are the 10 x-frames and the last ten are the 10 y-frames. All the frames are consecutive.

jeffreyyihuang commented 6 years ago

@roystonrodrigues Hi, I just shared the new version of my pretrained model and code today. You can test it, and feel free to correct my mistakes.

Jeffrey

jeffreyyihuang commented 6 years ago

@cwzat According to the two-stream paper, I remember that the input of the motion stream is a stack of 10 consecutive optical flow frames. In my opinion, your problem may be in the sampling stage: you randomly choose 10 x-frames and 10 y-frames rather than choosing consecutive x, y optical flow frames.

Jeffrey
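
A small sketch of sampling a window of consecutive flow frames, as opposed to picking 10 frames independently. The helper name `sample_flow_window` and the frame count of 50 are hypothetical, chosen only to illustrate the idea; neither appears in the thread.

```python
import random

def sample_flow_window(num_flow_frames, stack_len=10, rng=random):
    """Return indices of `stack_len` consecutive flow frames.

    The start index is drawn at random; the remaining indices follow it
    without gaps. (Hypothetical helper for illustration only.)
    """
    if num_flow_frames < stack_len:
        raise ValueError("video has fewer flow frames than the stack length")
    # Last valid start is num_flow_frames - stack_len (inclusive).
    start = rng.randrange(num_flow_frames - stack_len + 1)
    return list(range(start, start + stack_len))

# Example: a video with 50 flow frames (made-up number).
indices = sample_flow_window(50, stack_len=10)
print(indices)
```

The same indices would be used for both the x and y flow images, so that each x-frame is paired with the y-frame from the same time step.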

cwzat commented 6 years ago

@jeffreyhuang1 I already choose them consecutively and the accuracy is still low. Could you give me some other advice?

jeffreyyihuang commented 6 years ago

@cwzat

Oops, sorry, my bad. I missed some information in your message. I checked the implementation in the two-stream paper and found this:

[screenshot of the relevant passage from the two-stream paper; image not preserved]

Therefore, on your third axis, the order of your data should be [x0, y0, x1, y1, x2, y2, ...]. Maybe you can try this one!

Sorry again for misreading your message.

Jeffrey
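
One way to go from the blocked order [x0..x9, y0..y9] to the interleaved order [x0, y0, x1, y1, ...] is a channel permutation. The sketch below is illustrative only; `interleave_order` is a hypothetical helper, not code from the repository.

```python
def interleave_order(stack_len=10):
    """Permutation that maps [x0..x9, y0..y9] to [x0, y0, x1, y1, ...].

    Even output positions take the next x channel; odd positions take the
    matching y channel, which sits `stack_len` places later in the input.
    """
    return [i // 2 + (i % 2) * stack_len for i in range(2 * stack_len)]

# Channel labels in the blocked order described earlier in the thread.
labels = [f"x{i}" for i in range(10)] + [f"y{i}" for i in range(10)]

perm = interleave_order(10)
reordered = [labels[p] for p in perm]
print(reordered[:6])  # ['x0', 'y0', 'x1', 'y1', 'x2', 'y2']
```

Assuming the batch is a NumPy array of shape (32, 229, 229, 20), indexing it as `batch[..., perm]` would apply the same reordering to the channel axis.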

cwzat commented 6 years ago

@jeffreyhuang1 It is okay! Thank you very much for your help! I am very glad to be able to solve the problem with your help! I will try it right now!

jeffreyyihuang commented 6 years ago

@cwzat, I look forward to hearing your good news soon XD

Jeffrey

cwzat commented 6 years ago

@jeffreyhuang1 I have another question: how do you choose the optimizer and its parameters?

jeffreyyihuang commented 6 years ago

@cwzat, basically, I just follow the settings in the paper, which uses SGD as the optimizer. For the batch size and learning rate, I increase the learning rate according to the difference between my batch size and the batch size given in the paper. Beyond that, you can tune some parameters to boost the model performance.

Jeffrey
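
The thread does not say which rule was used to adjust the learning rate. A common choice is linear scaling with batch size, sketched below as an assumption. The function name and the numbers (base learning rate 0.01, reference batch size 256, own batch size 64) are placeholders for illustration, not values stated in this thread.

```python
def scale_learning_rate(base_lr, base_batch_size, batch_size):
    """Linear scaling rule (an assumption, not confirmed by the author):
    the learning rate changes in proportion to the batch size.
    """
    return base_lr * batch_size / base_batch_size

# Placeholder values: reference setting vs. a smaller local batch size.
lr = scale_learning_rate(base_lr=0.01, base_batch_size=256, batch_size=64)
print(lr)
```

A larger batch than the reference gives a larger learning rate under this rule, and a smaller batch gives a smaller one; the result is only a starting point for further tuning.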

cwzat commented 6 years ago

@jeffreyhuang1 Is your method for choosing test samples the same as for the training set? And are you training only the top layers or all the layers?

jeffreyyihuang commented 6 years ago

@cwzat Yeah, the stacked optical flow method is the same, and I am training all of the layers in resnet101.

Jeffrey