xjgaocs / Trans-SVNet


Training time #7

Open nisargshah1999 opened 2 years ago

nisargshah1999 commented 2 years ago

Hi, thanks for helping me with my previous question. I was able to run the code up through generate_lfb.py. Then, with a sequence length of 10, each epoch of tecno.py and the Transformer code takes only about 20 seconds on a single GPU. I am not sure whether that is expected, and my accuracy also drops compared to using the ResNet code alone. I would be obliged if you could advise on this. Thanks

nisargshah1999 commented 2 years ago

The reason I am wondering is that, on my dataset, I get around 88% test phase accuracy with the ResNet, whereas with TeCNO (and Trans-SVNet too) the accuracy falls to 35%. It would be great if you could advise me on this. Thanks

xjgaocs commented 2 years ago

You should first re-implement TeCNO (or another temporal method) and get good results, so that you have feasible temporal embeddings for the Transformer.

nisargshah1999 commented 2 years ago

OK, cool, thank you very much for your help. Could you mention the training time of your TeCNO code? It would also be great if you could state the training input shape for the TeCNO part; for me it is (batch_size, 2048, number_of_frames_videos). I am not sure how the sequence parameter used in generate_lfb.py comes into play here.
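For reference, the shape question above can be sketched in a few lines. This is not the repo's actual code; the frame count and the use of numpy are assumptions, purely to illustrate how per-frame 2048-d spatial embeddings are stacked into the (batch, channels, time) layout a Conv1d-style temporal model expects:

```python
import numpy as np

# Hypothetical numbers for illustration: one video of T frames, each frame
# already reduced to a 2048-d spatial embedding by the ResNet backbone.
T = 300            # number of frames in one video (assumption)
feat_dim = 2048    # embedding size from the feature extractor

# Per-frame embeddings as produced by the backbone: shape (T, 2048)
frame_embeddings = np.random.randn(T, feat_dim).astype(np.float32)

# A temporal conv model puts channels on axis 1, so one whole video
# becomes a single batch item of shape (1, 2048, T).
video_batch = frame_embeddings.T[np.newaxis, ...]
assert video_batch.shape == (1, feat_dim, T)
```

So with whole-video training, batch_size is effectively 1 video and the time axis is the full video length.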

xjgaocs commented 2 years ago

Training both TeCNO and Trans-SVNet is very fast given the spatial embeddings, a few minutes maybe. The input to TeCNO is a whole video during training, for convenience, since it does not use future information. I only used generate_lfb.py to generate spatial embeddings, although it could also generate spatio-temporal embeddings.
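The "does not use future information" point above is what makes whole-video input safe. A minimal sketch of a causal 1-D convolution (left-padded only, so output at time t never sees frames after t) is below; this is a toy numpy illustration of the causality property, not the TeCNO implementation itself, and all names and sizes are made up:

```python
import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """Dilated causal 1-D convolution over a (channels, T) sequence.

    The output at time t depends only on inputs at times <= t,
    because the sequence is padded on the left side only.
    """
    c, T = x.shape
    k = kernel.shape[-1]
    pad = (k - 1) * dilation
    xp = np.pad(x, ((0, 0), (pad, 0)))  # left-pad: no future leakage
    out = np.zeros(T)
    for t in range(T):
        # the k dilated taps ending at time t (in padded coordinates)
        taps = xp[:, t + pad - dilation * (k - 1): t + pad + 1: dilation]
        out[t] = np.sum(taps * kernel)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 10))    # 2 channels, 10 frames
w = rng.standard_normal((2, 3))     # kernel size 3

y1 = causal_conv1d(x, w)
x2 = x.copy()
x2[:, 7] += 5.0                     # perturb a "future" frame
y2 = causal_conv1d(x2, w)

# Outputs before t=7 are unchanged: the model never looks ahead,
# so feeding a whole video during training leaks nothing.
assert np.allclose(y1[:7], y2[:7])
assert not np.allclose(y1[7:], y2[7:])
```

Because of this property, training on full videos gives the same per-frame outputs as online, frame-by-frame inference would.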