lucidrains / TimeSformer-pytorch

Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
MIT License

TimeSformer in TF, thanks #13

Open junyongyou opened 3 years ago

junyongyou commented 3 years ago

Is anybody willing to implement TimeSformer in TensorFlow 2.x? I am trying to do that, but am struggling ...

lucidrains commented 3 years ago

@junyongyou hmm, i'm never doing tensorflow

junyongyou commented 3 years ago

> @junyongyou hmm, i'm never doing tensorflow

Aha, I know that. So I am just seeing there might be somebody else :).

slimaneaymen commented 3 years ago

Hi everybody !!!
I am trying to implement TimeSFormer for VideoClassification using as input the feature maps of a CNN, my data have the shape (4,50,1,1,256) where: mini_batch=4 / frames=50 / channels=1 / H=1 / W= 256 The parameters of the TimeSformer are : TimeSformer( dim = 128, image_size = 256, patch_size = 16, num_frames = 50, num_classes = 2, depth = 12, heads = 8, dim_head = 32, attn_dropout = 0., ff_dropout = 0. ) In order to check if my network is working, I have tried to make it overfit by using only 6 training data and 2 validation data of the same shape as before (4,50,1,1,256). But the accuracy I'm getting is in oscillation and never reaches a value >80% and my training loss is not decreasing it's always around 0.6900 - 06900 My training function and parameters are: Capture1 Capture2 Capture3 image

I have also tried to train the model on frames of images instead of feature-map data, with input of shape (4, 50, 3, 224, 224), where mini_batch=4, frames=50, channels=3, H=224, W=224. Unfortunately, I am getting the same results.

I would appreciate any suggestions. Thank you!
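One likely culprit in the setup above is the patch arithmetic: the model splits each frame into patch_size × patch_size patches, so an input with H=1 cannot be divided into 16×16 patches at all. A minimal sanity check, assuming standard ViT-style square patching (the helper name is illustrative, not from the library):

```python
# Sanity-check whether a frame can be tiled by the configured patch grid.
# Hypothetical helper for illustration; the real library validates this internally.
def patch_grid(height, width, patch_size):
    if height % patch_size or width % patch_size:
        raise ValueError(
            f"{height}x{width} frame cannot be tiled by {patch_size}x{patch_size} patches"
        )
    return (height // patch_size) * (width // patch_size)

# Feature-map input above: H=1, W=256 -> no valid 16x16 patching
try:
    patch_grid(1, 256, 16)
except ValueError as e:
    print("feature maps:", e)

# Frame input: H=W=224 with patch_size=16 -> 14 * 14 = 196 tokens per frame
print("frames:", patch_grid(224, 224, 16))  # 196
```

If this check fails for the feature-map input, that data would need to be reshaped (or the patch size changed) before it can pass through the model at all.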

junyongyou commented 3 years ago

Hi, I haven't thought through your question carefully, but I have a feeling that either your input shape (H=1) and/or such a small number of training/validation samples might be the problem.

slimaneaymen commented 3 years ago

Hi @junyongyou, concerning H, I have even tried with H=224 / W=224. Concerning the number of training/validation samples, I have also tried with larger numbers (420), but it still gives the same results.

junyongyou commented 3 years ago

> Hi @junyongyou, concerning H, I have even tried with H=224 / W=224. Concerning the number of training/validation samples, I have also tried with larger numbers (420), but it still gives the same results.

Sorry, I don't know the reason. Maybe you need to check your data first. From the screenshot, your training loss didn't decrease at all. I have tried the model in my own experiment (not image recognition); it didn't give me very good performance, but it does do something.
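Worth noting: for a two-class problem, a cross-entropy loss stuck around 0.6900 is exactly chance level. A model predicting a uniform 50/50 split incurs a loss of ln 2 ≈ 0.693, so a plateau there means the network is learning nothing. A quick arithmetic check:

```python
import math

# Cross-entropy of a uniform prediction over k classes is -log(1/k) = log(k).
# For num_classes = 2 this is ln(2), matching the reported plateau around 0.69.
chance_loss = math.log(2)
print(f"{chance_loss:.4f}")  # 0.6931
```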

slimaneaymen commented 3 years ago

Please, could you explain more what you mean by checking the data? Regarding your experiment, could you tell me your training hyperparameters (loss function, learning rate, ...)? Also, do you think my calculation of the accuracy (in the first figure) was right? Thank you!
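For comparison, a standard accuracy computation for a classifier takes the argmax over the class logits per sample and compares it to the label. A minimal pure-Python sketch (the data here is made up for illustration, not taken from the screenshots):

```python
def accuracy(logits, labels):
    """Fraction of samples where the argmax of the logits equals the label."""
    correct = 0
    for row, label in zip(logits, labels):
        pred = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += (pred == label)
    return correct / len(labels)

# 4 samples, 2 classes; the last sample is misclassified
logits = [[0.2, 1.3], [2.0, -1.0], [0.1, 0.4], [0.9, 0.8]]
labels = [1, 0, 1, 1]
print(accuracy(logits, labels))  # 3 of 4 correct -> 0.75
```

If the accuracy in the screenshots oscillates around 0.5 while the loss stays near 0.69, both metrics agree that the model is predicting at chance rather than the accuracy computation being wrong.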