tobyperrett / few-shot-action-recognition

Implementations of some few-shot action recognition methods.
MIT License

Questions about reimplementation of TRX #4

Open bedman367 opened 2 years ago

bedman367 commented 2 years ago

Hello, when I tried to reproduce the TRX results on the SSv2 and UCF datasets using the code in this repository, I found that my test accuracy was more than 30 percentage points lower than the SOTA results reported in the paper.

5-way 5-shot results (all datasets were constructed by running extract_xxx.py and shrink_xxx.py as described in README.md):

| Dataset split | My result | SOTA result reported in paper |
| --- | --- | --- |
| ssv2_OTAM | 30.2 (training for 150000 tasks) | 64.6 (training for 75000 tasks) |
| ucf_ARN | 64.1 (training for 30000 tasks) | 96.1 (training for 10000 tasks) |

Key options: { learning_rate=0.001, tasks_per_batch=16, resume_from_checkpoint=False, way=5, shot=5, query_per_class=4, query_per_class_test=5, num_val_tasks=1000, num_test_tasks=10000, seq_len=8, num_workers=8, backbone='resnet50', opt='sgd', img_size=224, num_gpus=4, method='trx', pretrained_backbone=None, val_on_test=False, trans_linear_in_dim=2048 }

The only modification I made was setting 'query_per_class' to 4 so that resnet50 fits on 4x 2080 Ti GPUs without GPU memory errors (as you suggested in an issue on the TRX repository). However, the model converges much more slowly and reaches much lower accuracy than described in the paper, which confuses me a lot.
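To make the memory trade-off concrete, here is a rough sketch of how many videos and frames a single training task contains under the options listed above. The `EpisodeConfig` class and `episode_size` function are hypothetical helpers written for this illustration, not part of the repository, and the comparison value of 5 is simply the `query_per_class_test` setting reused as a reference point.

```python
from dataclasses import dataclass


@dataclass
class EpisodeConfig:
    # Values copied from the options listed in this issue.
    way: int = 5
    shot: int = 5
    query_per_class: int = 4
    seq_len: int = 8


def episode_size(cfg: EpisodeConfig) -> dict:
    """Count the videos and frames that make up one few-shot task."""
    support_videos = cfg.way * cfg.shot
    query_videos = cfg.way * cfg.query_per_class
    total_frames = (support_videos + query_videos) * cfg.seq_len
    return {
        "support_videos": support_videos,
        "query_videos": query_videos,
        "total_frames": total_frames,
    }


if __name__ == "__main__":
    reduced = episode_size(EpisodeConfig(query_per_class=4))
    reference = episode_size(EpisodeConfig(query_per_class=5))
    print("query_per_class=4:", reduced)
    print("query_per_class=5:", reference)
```

Under these assumptions, lowering `query_per_class` from 5 to 4 removes 5 query videos (40 frames) from every task, which only modestly reduces what the backbone has to process. A change of that size would not by itself be expected to explain a gap of more than 30 points.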

I'm a newbie in deep learning and this is my first reimplementation, so at present I have no idea how to close such a considerable gap from the SOTA results.

I would be very grateful if you could reply!

bedman367 commented 2 years ago

Update: I later found that the problem was with my dataset. The ffmpeg tools on my server didn't work properly, so the frames extracted from the videos were severely distorted. Lesson learned: be sure to check that the dataset is processed correctly!!! QAQ
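For anyone hitting the same problem, a cheap first check on the extracted frames might look like the sketch below. It assumes each video ends up as a directory of `.jpg` frames somewhere under a root folder; that layout, the function name `find_suspect_videos`, and the `min_frames` threshold (set to match `seq_len=8`) are assumptions for illustration, not the repository's actual structure. It only catches missing, empty, or non-JPEG frames. Visually distorted frames still decode fine, so opening a few frames by eye is still necessary.

```python
from collections import defaultdict
from pathlib import Path

# Every JPEG file starts with the two-byte start-of-image marker.
JPEG_MAGIC = b"\xff\xd8"


def _looks_like_jpeg(path: Path) -> bool:
    """Return True if the file starts with the JPEG start-of-image marker."""
    with open(path, "rb") as fh:
        return fh.read(2) == JPEG_MAGIC


def find_suspect_videos(root, min_frames=8):
    """Map each suspicious frame directory to a list of detected issues.

    Directories containing no .jpg files at all are not visited, so a
    video whose extraction produced nothing will not show up here.
    """
    frames_by_dir = defaultdict(list)
    for path in Path(root).rglob("*.jpg"):
        frames_by_dir[path.parent].append(path)

    problems = {}
    for video_dir, frames in sorted(frames_by_dir.items()):
        issues = []
        if len(frames) < min_frames:
            issues.append(f"only {len(frames)} frames (need {min_frames})")
        bad = [f for f in frames if not _looks_like_jpeg(f)]
        if bad:
            issues.append(f"{len(bad)} empty or non-JPEG files")
        if issues:
            problems[str(video_dir)] = issues
    return problems
```

Calling `find_suspect_videos("path/to/frames")` returns an empty dict when nothing looks wrong, so it can be run once after extraction and before starting a long training job.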

ycwfs commented 12 months ago

nt