Closed: alicranck closed this issue 5 years ago
@alicranck @RI-CH Hello! As a beginner, can you tell me how to place my dataset? Thanks!
Hi @anjingxing, I'll be happy to explain. Could you open a new issue for this?
@alicranck You are right. Thanks for pointing out this mistake; please check the latest commit. Can you list the accuracy you get on both the train and validation datasets?
@anjingxing Please follow the readme. The subfolders in the train/val folders are the action categories. You also need to modify the dataset path in config.py.
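As a sketch, the expected layout looks like the following (the root path is whatever you set in config.py; the category and file names here are just examples):

```text
data_root/
├── train/
│   ├── ApplyLipstick/      # one subfolder per action category
│   │   ├── clip_0001.avi
│   │   └── ...
│   └── Archery/
│       └── ...
└── val/
    ├── ApplyLipstick/
    └── Archery/
```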
@RI-CH Thanks! I'll try with the latest updates. Currently I'm getting ~70% top-5 and ~40% top-1 accuracy after 20 epochs (validation is about 5% lower for both). While it's not quite the performance in the paper, the training progresses well. I'll update in the future if I can indeed replicate the results.
@alicranck What is your learning rate schedule? The paper uses lr = η · 0.5 · [cos(n / n_max · π) + 1].
@AaronMaYue I kept the StepLR schedule in the code, but increased the decay factor to 0.6 instead of 0.1, and increased the number of epochs per step to 20, since Kinetics is much larger than UCF101.
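A minimal sketch of that step decay in pure Python (the 0.01 initial rate is an illustrative assumption; in the repo this corresponds to PyTorch's `optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.6)`):

```python
def step_lr(init_lr, gamma, step_size, epoch):
    """Step decay: multiply the learning rate by `gamma` every `step_size` epochs."""
    return init_lr * gamma ** (epoch // step_size)

# With gamma=0.6 and step_size=20, the rate drops to 60% every 20 epochs.
for epoch in (0, 19, 20, 40):
    print(epoch, step_lr(0.01, 0.6, 20, epoch))
```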
@alicranck Here is my lr schedule for my own datasets:
```python
import numpy as np
import torch.optim as optim

# Cosine annealing. Note: LambdaLR multiplies the optimizer's base lr by the
# lambda's return value, so the lambda should return only the 0.5 * (cos + 1)
# factor; multiplying by params['learning_rate'] inside the lambda as well
# would apply the initial rate twice.
lambda_1 = lambda step: 0.5 * (np.cos(step / max_step * np.pi) + 1)
scheduler1 = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=[lambda_1])
```
and I get high accuracy for training, but validation is terrible; maybe it's a problem with my datasets. I don't have the Kinetics-400 dataset because it's too big. You can try it and tell us the news.
@alicranck Hi alicranck, I used the lr = η · 0.5 · [cos(n / n_max · π) + 1] learning rate schedule for UCF (1080Ti, crop_size=112, epoch_num=50, batch_size=64, clip_len=32, init_lr=0.01) and got 96% accuracy, 0.12 loss for training, and 92% accuracy, 0.33 loss for validation. Maybe clip_len=64 is better than clip_len=32.
```python
# Total number of optimizer steps over training (note: multiply, not add):
max_step = (len(train_videos) * num_epoch) // batch_size
```
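As a sanity check of the cosine formula's endpoints (pure Python; the 10619-video / 50-epoch / batch-size-64 numbers just echo the figures quoted in this thread):

```python
import math

def cosine_lr(init_lr, step, max_step):
    """The paper's schedule: lr = eta * 0.5 * (cos(n / n_max * pi) + 1)."""
    return init_lr * 0.5 * (math.cos(step / max_step * math.pi) + 1)

max_step = (10619 * 50) // 64  # e.g. 10619 training videos, 50 epochs, batch size 64
print(cosine_lr(0.01, 0, max_step))             # starts at the initial rate
print(cosine_lr(0.01, max_step // 2, max_step)) # roughly half at the midpoint
print(cosine_lr(0.01, max_step, max_step))      # anneals to 0
```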
@AaronMaYue Thanks! I will let you know. It takes a while to train since Kinetics is so big...
Hi, I'd like to jump into the video recognition problem. May I ask about the typical GPU cost of such a task? For example, what's the minimum number of GPUs needed to train a SlowFast network? Thanks a lot!
Hi @dannyhung1128, I guess it really depends on the size of the dataset you want to work with. At minimum I would say you need one strong GPU with at least 8 GB of memory to run the model with reasonable batch sizes. This may still take days or even weeks to train, again depending on the size of the dataset.
Thanks for the quick reply @alicranck. My dataset is for a 9-category classification task with ~3400 clips, each having around 64 frames. I'll start working with SlowFastNetwork soon. Thanks a lot!
Hi @AaronMaYue, @RI-CH and @alicranck, it's nice to see you running experiments on UCF-101. Previously in https://github.com/RI-CH/SlowFastNetworks/issues/1 @RI-CH stated that this network can only reach ~42% accuracy on validation, but in @AaronMaYue's case it's ~92%. Is this difference due to how you split your train/val sets? Or did some debugging happen in between? Thanks in advance.
@dannyhung1128 My input shape is (3, 32, 112, 112), with short_side=[120, 150] changed in dataset.py. Create a training_list.txt and test_list.txt: randomly select 80% of the data in each label folder for training, and the rest is testing data. That means there are 10619 training videos, so max_step = (10619 * num_epoch) // batch_size. Try the learning rate schedule mentioned above; my learning rate is 0.01.
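A minimal sketch of that per-category 80/20 split (pure Python; the folder scan is replaced here by an in-memory dict, and writing training_list.txt / test_list.txt is left out):

```python
import random

def split_per_category(videos_by_label, train_frac=0.8, seed=0):
    """Randomly assign `train_frac` of each category's clips to the train list
    and the rest to the test list (the training_list.txt / test_list.txt idea)."""
    rng = random.Random(seed)
    train, test = [], []
    for label, clips in videos_by_label.items():
        clips = sorted(clips)
        rng.shuffle(clips)
        cut = int(len(clips) * train_frac)
        train += [(c, label) for c in clips[:cut]]
        test += [(c, label) for c in clips[cut:]]
    return train, test

# Toy example with two categories of 10 clips each:
data = {lbl: [f"{lbl}_{i:02d}.avi" for i in range(10)]
        for lbl in ("ApplyLipstick", "Archery")}
train, test = split_per_category(data)
print(len(train), len(test))  # 16 4
```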
@AaronMaYue Thanks for the reply. I followed the same recipe as yours, except that short_side wasn't changed. As for the dataset split, I followed UCF-101's official train/test split (trainlist01.txt and testlist01.txt). My batch_size was 32. I'll change short_side and update once I have the result. Thanks!
@AaronMaYue I know why you get 92% accuracy on the validation dataset: your validation dataset is randomly selected, while I use the official UCF101 split 1 as the validation dataset. I suggest using split 1 to divide the train and validation datasets, as most papers do, such as [1, 2, 3]. Some video clips in UCF101 are captured by the same camera. @dannyhung1128
[1] Two-Stream Convolutional Networks for Action Recognition in Videos
[2] ConvNet Architecture Search for Spatiotemporal Feature Learning
[3] Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification
@RI-CH @dannyhung1128 It does have the same-camera problem; I will follow your training list later. What is your train accuracy? Mine is 96%, which really surprised me.
@AaronMaYue The video clip names in UCF101 are organized as v_Category_gXX_cYY.avi, for example v_ApplyLipstick_g08_c01.avi, v_ApplyLipstick_g08_c02.avi, v_ApplyLipstick_g08_c03.avi... Clips sharing the same prefix 'v_ApplyLipstick_g08' are captured by the same camera and are quite similar. That is why a randomly selected validation set scores so high. It is not appropriate or convincing.
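To avoid that leakage, a split can be made group-aware by keying on the v_Category_gXX prefix, so all clips from one camera land entirely in train or entirely in val. A sketch (the regex assumes the naming convention described above):

```python
import re

def group_id(filename):
    """Extract the camera-group prefix from a UCF101 clip name, e.g.
    'v_ApplyLipstick_g08_c01.avi' -> 'v_ApplyLipstick_g08'."""
    m = re.match(r"(v_[A-Za-z]+_g\d+)_c\d+\.avi$", filename)
    if m is None:
        raise ValueError(f"unexpected clip name: {filename}")
    return m.group(1)

clips = ["v_ApplyLipstick_g08_c01.avi", "v_ApplyLipstick_g08_c02.avi",
         "v_Archery_g01_c01.avi"]
# Clips from the same camera collapse into one group; split by group, not by clip.
print({group_id(c) for c in clips})
```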
I have a question: are lateral connections implemented? @r1ch88
@95xueqian Yes, lateral connections are added in this repo.
@r1ch88 Do you have a model pretrained on the Kinetics dataset? Thanks very much!
@dannyhung1128 @AaronMaYue @alicranck Hi, I am happy to find you are testing SlowFast on UCF101. So, what are your latest results? I followed the official split, using split 1 to train and test, but the results are always not so good: train acc@1: 96.5%, acc@10: 99.7%; validation acc@1: 55.4%, acc@10: 79.0%. It seems the model is overfitting the dataset.
For more details, you can see my implementation code at https://github.com/JJBOY/SlowFast-Network.
Hello, did you solve the problem? My result is @1: 0.72, @5: 0.92.
@JJBOY @ilovekj Yes, it overfits easily on UCF101. #1
My result on split 1: validation acc@1: 61%, acc@5: 80.3%. Rethinking pre-training...
Hi @alicranck This isn't related to the topic, but what does "non-degenerate" mean in this context?
Any links/resources you can point me to would be great. Thanks :)
> Hi @alicranck This isn't related to the topic, but what does "non-degenerate" mean in this context?
> Any links/resources you can point me to would be great. Thanks :)

stride=1, I think.
> Hi @alicranck This isn't related to the topic, but what does "non-degenerate" mean in this context?
> Any links/resources you can point me to would be great. Thanks :)

Same question.
Hi,
Great work with implementing the paper here. I'm trying to replicate the results on Kinetics-400 and so far it looks really promising!
I wanted to ask about the temporal convolutions in the slow path of the model: in the paper they apply non-degenerate temporal convolutions in residual layers 4 and 5 of the slow path (kernel size > 1 in the depth/temporal dimension). Is that something you chose not to apply here? Did it hurt performance somehow? I just want to know whether it's worth attempting.
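For anyone confused by the term: a degenerate temporal convolution has kernel size 1 in time, so each output frame sees exactly one input frame, while a non-degenerate one (temporal kernel size > 1) mixes information across frames. A toy 1D illustration in NumPy (the kernels here are illustrative, not the paper's learned weights):

```python
import numpy as np

# One feature value per frame; an impulse at frame 2.
frames = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# Degenerate temporal conv: kernel size 1 in time, no mixing across frames.
degenerate = np.convolve(frames, np.array([1.0]), mode="same")

# Non-degenerate temporal conv: kernel size 3 in time, neighboring frames
# interact (this is what res4/res5 of the slow path use in the paper).
non_degenerate = np.convolve(frames, np.array([1.0, 1.0, 1.0]), mode="same")

print(degenerate)      # the impulse stays confined to frame 2
print(non_degenerate)  # the impulse spreads to frames 1-3
```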
Thanks!