Open bedman367 opened 2 years ago
Update: I later found out that the problem was in my dataset. The ffmpeg tools on my server did not work properly, so the frames extracted from the videos were severely distorted. Lesson learned: be sure to check that the dataset has been processed correctly!!! QAQ
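For anyone who runs into the same thing, here is a rough sketch of the kind of sanity check that can be run on the extracted frames before training. It is not part of the TRX repository. The directory layout (one leaf folder of `.jpg` frames per video) and the helper names `check_frame_dir` and `check_dataset` are assumptions for illustration, and `min_frames=8` simply mirrors `seq_len=8` from my options. It only catches missing or truncated frames; visually distorted frames like mine can still pass, so opening a handful of images by hand is still necessary.

```python
from pathlib import Path


def check_frame_dir(video_dir, min_frames=8):
    """Return a list of problems found in one folder of extracted .jpg frames."""
    problems = []
    frames = sorted(Path(video_dir).glob("*.jpg"))
    if len(frames) < min_frames:
        problems.append(f"only {len(frames)} frames (need at least {min_frames})")
    for frame in frames:
        data = frame.read_bytes()
        # A complete JPEG starts with the bytes FF D8 and ends with FF D9.
        if not data.startswith(b"\xff\xd8"):
            problems.append(f"{frame.name}: missing JPEG start marker")
        elif not data.endswith(b"\xff\xd9"):
            problems.append(f"{frame.name}: truncated (no JPEG end marker)")
    return problems


def check_dataset(root, min_frames=8):
    """Check every leaf folder under root; return {folder: [problems]} for bad ones."""
    report = {}
    for folder in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        if any(child.is_dir() for child in folder.iterdir()):
            continue  # has subfolders, so it is assumed not to be a single video
        problems = check_frame_dir(folder, min_frames)
        if problems:
            report[str(folder)] = problems
    return report
```

Calling `check_dataset` on the dataset root (whatever path the extract and shrink scripts wrote to) returns an empty dict when every video folder has enough structurally complete frames.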
Hello, when I tried to reproduce the TRX results on the ssv2 and ucf datasets using the code in this repository, I found that the test accuracy was more than 30 percentage points lower than the SOTA results reported in the paper:
5-way 5-shot results (my results on the left, the paper's results on the right; all datasets were constructed by running extract_xxx.py and shrink_xxx.py as described in README.md):
| dataset split | my result | SOTA result reported in paper |
|---|---|---|
| ssv2_OTAM | 30.2 (training for 150000 tasks) | 64.6 (training for 75000 tasks) |
| ucf_ARN | 64.1 (training for 30000 tasks) | 96.1 (training for 10000 tasks) |
Key options were as follows: { learning_rate=0.001, tasks_per_batch=16, resume_from_checkpoint=False, way=5, shot=5, query_per_class=4, query_per_class_test=5, num_val_tasks=1000, num_test_tasks=10000, seq_len=8, num_workers=8, backbone='resnet50', opt='sgd', img_size=224, num_gpus=4, method='trx', pretrained_backbone=None, val_on_test=False, trans_linear_in_dim=2048 }
The only modification I made was setting 'query_per_class' to 4 so that resnet50 fits on 4x 2080 Ti without GPU memory errors (as you suggested in an issue in the TRX repository). However, the model converges far more slowly and reaches much lower accuracy than described in the paper, which confuses me a lot.
I'm a newbie in deep learning and this is my first reimplementation work, so at present I have no idea how to close such a considerable gap from the SOTA results.
I would be very grateful if you could reply to me!