r1c7 / SlowFastNetworks

PyTorch implementation of "SlowFast Networks for Video Recognition".
339 stars · 80 forks

Non-degenerate temporal convolutions in slow path #4

Closed alicranck closed 5 years ago

alicranck commented 5 years ago

Hi,

Great work with implementing the paper here. I'm trying to replicate the results on Kinetics-400 and so far it looks really promising!

I wanted to ask about the temporal convolutions in the slow path of the model. In the paper they apply non-degenerate temporal convolutions in residual stages 4 and 5 of the slow path (kernel size > 1 in the temporal dimension). Is this something you chose not to apply here? Did it hurt performance somehow? I just want to know whether it's worth attempting.
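For anyone reading along, the distinction can be sketched in a few lines of PyTorch (the channel counts and input shape here are illustrative, not the repo's actual configuration):

```python
import torch
import torch.nn as nn

# Degenerate temporal convolution: kernel size 1 along time (a 1x3x3 kernel),
# as the paper uses in the early stages of the slow path.
degenerate = nn.Conv3d(64, 64, kernel_size=(1, 3, 3), padding=(0, 1, 1), bias=False)

# Non-degenerate temporal convolution: kernel size 3 along time (a 3x1x1 kernel),
# which the paper enables only in res4 and res5 of the slow path.
non_degenerate = nn.Conv3d(64, 64, kernel_size=(3, 1, 1), padding=(1, 0, 0), bias=False)

x = torch.randn(2, 64, 4, 56, 56)  # (batch, channels, time, height, width)
print(degenerate(x).shape)      # padding preserves the temporal extent
print(non_degenerate(x).shape)  # same shape: torch.Size([2, 64, 4, 56, 56])
```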

Thanks!

anjingxing commented 5 years ago

@alicranck @RI-CH Hello, as a beginner, can you tell me how to arrange my dataset? Thanks!

alicranck commented 5 years ago

Hi @anjingxing, I'll be happy to explain. Could you open a new issue for this?

r1c7 commented 5 years ago

@alicranck You are right. Thanks for pointing out this mistake; please check the latest commit. Can you list the accuracy you get on both the train and validation datasets?

r1c7 commented 5 years ago

@anjingxing Please follow the readme. The subfolders in the train/val folders are the action categories. You also need to modify the dataset path in config.py.

alicranck commented 5 years ago

@RI-CH Thanks! I'll try with the latest updates. Currently I'm getting ~70% top-5 and ~40% top-1 accuracy after 20 epochs (validation is about 5% lower for both). While that's not quite the performance reported in the paper, training is progressing well. I'll update here if I can indeed replicate the results.

AaronMaYue commented 5 years ago

@alicranck What learning rate schedule do you use? The paper uses lr = η · 0.5[cos(n / n_max · π) + 1].

alicranck commented 5 years ago

@AaronMaYue I kept the StepLR schedule in the code, but increased the decay factor to 0.6 instead of 0.1 and the number of epochs per step to 20, since Kinetics is much larger than UCF101.
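A minimal sketch of that setup, with a stand-in model (the decay factor and step size are the values described here, not the repo's defaults):

```python
import torch.nn as nn
from torch import optim

model = nn.Conv3d(3, 8, kernel_size=1)  # stand-in for the SlowFast model
optimizer = optim.SGD(model.parameters(), lr=0.01)
# Decay the lr by a factor of 0.6 every 20 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.6)

for epoch in range(60):
    # ... train one epoch here ...
    optimizer.step()   # placeholder optimizer step
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # 0.01 * 0.6**3 after 60 epochs
```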

AaronMaYue commented 5 years ago

@alicranck Here is my lr schedule for my own datasets:

```python
# Cosine schedule via LambdaLR. The lambda returns a *factor* that PyTorch
# multiplies by the optimizer's initial lr, so the base lr must not appear
# in the lambda itself (otherwise the lr is scaled twice).
lambda_1 = lambda step: 0.5 * (np.cos(step / max_step * np.pi) + 1)
scheduler1 = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda_1])
```

I get high accuracy for training but validation is terrible; maybe it's a problem with my datasets. I don't have the Kinetics-400 dataset because it's too big. You can try it and tell us the news.
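For what it's worth, PyTorch's built-in `CosineAnnealingLR` computes the same cosine schedule (with `eta_min=0`), so the factor doesn't have to be hand-written; the model below is a stand-in:

```python
import torch.nn as nn
from torch import optim

model = nn.Linear(10, 2)  # stand-in model
base_lr, max_step = 0.01, 100
optimizer = optim.SGD(model.parameters(), lr=base_lr)
# lr = eta_min + (base_lr - eta_min) * 0.5 * (1 + cos(pi * step / T_max)),
# identical to lr = eta * 0.5 * [cos(n / n_max * pi) + 1] when eta_min = 0.
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_step, eta_min=0.0)

for step in range(max_step):
    optimizer.step()  # placeholder optimizer step
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # annealed to ~0 at step == T_max
```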

AaronMaYue commented 5 years ago

@alicranck Hi alicranck, I used the lr = η · 0.5[cos(n / n_max · π) + 1] learning rate schedule for UCF101 (1080Ti, crop_size=112, epoch_num=50, batch_size=64, clip_len=32, init_lr=0.01) and got 96% accuracy / 0.12 loss for training, and 92% accuracy / 0.33 loss for validation. Maybe clip_len=64 is better than clip_len=32. max_step = (len(train_videos) * num_epoch) // batch_size
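As a concrete check of that max_step formula (9537 is the UCF101 split-1 train-list size, assumed here; substitute your own counts):

```python
# Total optimizer steps: every video is seen once per epoch,
# and each step consumes batch_size videos.
num_train_videos = 9537  # size of UCF101 trainlist01 (assumed)
num_epoch = 50
batch_size = 64
max_step = (num_train_videos * num_epoch) // batch_size
print(max_step)  # 7450
```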

alicranck commented 5 years ago

@AaronMaYue Thanks! I will let you know. It takes a while to train since Kinetics is so big...

dannyhung1128 commented 5 years ago

Hi, I'd like to get into video recognition. May I ask about the typical GPU cost of such a task? What's the minimum number of GPUs needed to train a SlowFast network? Thanks a lot!

alicranck commented 5 years ago

Hi @dannyhung1128, I guess it really depends on the size of the dataset you want to work with. At a minimum I would say you need one strong GPU with at least 8 GB of memory to run the model with reasonable batch sizes. This may still take days or even weeks to train, again depending on the size of the dataset.

dannyhung1128 commented 5 years ago

Thanks for the quick reply @alicranck. My dataset is for a 9-class classification task with ~3400 clips, each around 64 frames. I'll start working with SlowFastNetwork soon. Thanks a lot!

dannyhung1128 commented 5 years ago

Hi @AaronMaYue, @RI-CH and @alicranck, it's nice to see you running experiments on UCF-101. Previously in https://github.com/RI-CH/SlowFastNetworks/issues/1 @RI-CH stated that this network can only reach ~42% accuracy on validation, but in @AaronMaYue's case it's ~92%. Is this difference due to how you split your train/val sets? Or did some debugging happen in between? Thanks in advance

AaronMaYue commented 5 years ago

@dannyhung1128 My input shape = (3, 32, 112, 112); change short_stride=[120, 150] in dataset.py. Create a training_list.txt and test_list.txt: randomly select 80% of the data in each label folder, and the rest is test data. That gives 10619 training videos, so max_step = (10619 * num_epoch) // batch_size. Try the learning rate schedule mentioned above; my learning rate is 0.01.

dannyhung1128 commented 5 years ago

@AaronMaYue Thanks for the reply. I followed the same recipe as yours except that short_stride wasn't changed. For the dataset split, I followed UCF-101's official train/test split (trainlist01.txt and testlist01.txt). My batch_size was 32. I'll change the short_side and update once I have results. Thanks

r1c7 commented 5 years ago

@AaronMaYue I know why you get 92% accuracy on the validation dataset: your validation set is randomly selected, while I use UCF101 split 1 as the validation set. I suggest using split 1 to divide the train and validation sets, as most papers do, such as [1, 2, 3]. Some video clips in UCF101 are captured by the same camera. @dannyhung1128

[1] Two-Stream Convolutional Networks for Action Recognition in Videos
[2] ConvNet Architecture Search for Spatiotemporal Feature Learning
[3] Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification

AaronMaYue commented 5 years ago

@RI-CH @dannyhung1128 It does have the same-camera problem; I will follow your training list later. What is your train_acc? My train_acc is 96%, which really surprised me.

r1c7 commented 5 years ago

@AaronMaYue The video clips in UCF101 are named v_Category_gXX_cYY.avi, for example v_ApplyLipstick_g08_c01.avi, v_ApplyLipstick_g08_c02.avi, v_ApplyLipstick_g08_c03.avi... Clips sharing the prefix 'v_ApplyLipstick_g08' are captured by the same camera and are quite similar. That is why a randomly selected validation set scores so high; it is not appropriate or convincing.
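A group-aware split that keeps every gXX group on one side of the split can be sketched as follows (the file names below are just the examples above):

```python
import random
from collections import defaultdict

def split_by_group(filenames, train_ratio=0.8, seed=0):
    # Group clips by their camera prefix, e.g. v_ApplyLipstick_g08,
    # so clips from the same camera never straddle the train/val split.
    groups = defaultdict(list)
    for name in filenames:
        prefix = "_".join(name.split("_")[:3])
        groups[prefix].append(name)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    cut = int(len(keys) * train_ratio)
    train = [n for k in keys[:cut] for n in groups[k]]
    val = [n for k in keys[cut:] for n in groups[k]]
    return train, val

clips = ["v_ApplyLipstick_g08_c01.avi", "v_ApplyLipstick_g08_c02.avi",
         "v_ApplyLipstick_g09_c01.avi", "v_ApplyLipstick_g09_c02.avi"]
train, val = split_by_group(clips, train_ratio=0.5)
```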

95xueqian commented 5 years ago

I have a question: are the lateral connections implemented? @r1ch88

r1c7 commented 5 years ago

@95xueqian Yes, lateral connections are added in this repo.

julycetc commented 5 years ago

@r1ch88 Do you have a model pretrained on the Kinetics dataset? Thanks very much!

JJBOY commented 5 years ago

@dannyhung1128 @AaronMaYue @alicranck Hi, I am happy to find you are testing SlowFast on UCF101. So, how are your latest results? I followed the official split 1 for training and testing, but the results are not so good: train acc@1: 96.5%, acc@10: 99.7%; validation acc@1: 55.4%, acc@10: 79.0%. It seems the model is overfitting the dataset.

For more details, you can see my implementation code at https://github.com/JJBOY/SlowFast-Network.

ilovekj commented 5 years ago

> @AaronMaYue Thanks for the reply. I followed the same recipe as yours except that short_stride wasn't changed. For the dataset split, I followed UCF-101's official train/test split (trainlist01.txt and testlist01.txt). My batch_size was 32. I'll change the short_side and update once I have results. Thanks

Hello, did you solve the problem? My result is @1: 0.72, @5: 0.92.

r1c7 commented 5 years ago

@JJBOY @ilovekj Yes, it easily overfits on UCF101. #1

lxgyChen commented 5 years ago

My result on split 1: validation acc@1: 61%, acc@5: 80.3%. Rethinking pre-training...

gnefihs commented 4 years ago

Hi @alicranck This isn't related to the topic, but what does "non-degenerate" mean in this context?

Any links/resources you can point me to would be great. Thanks :)

A1014280203 commented 3 years ago

> Hi @alicranck This isn't related to the topic, but what does "non-degenerate" mean in this context?
>
> Any links/resources you can point me to would be great. Thanks :)

It means a temporal kernel size > 1, I think.

deepNet-Chirag commented 3 years ago

> Hi @alicranck This isn't related to the topic, but what does "non-degenerate" mean in this context?
>
> Any links/resources you can point me to would be great. Thanks :)

Same question