gsig / actor-observer

ActorObserverNet code in PyTorch from "Actor and Observer: Joint Modeling of First and Third-Person Videos", CVPR 2018
GNU General Public License v3.0

How to get the fine-tune baseline results? #5

Open Yu-Wu opened 5 years ago

Yu-Wu commented 5 years ago

Hi, how to fine-tune the model from the third-view dataset on the first-view dataset?

I tried to directly fine-tune the trained model from https://github.com/gsig/charades-algorithms by using this script https://github.com/gsig/actor-observer/blob/master/exp/baseline_resnet152imagenet.py.

However, the result is only 22, the same as the model fine-tuned from ImageNet, which suggests that the third-view pre-training has no effect on the first-view performance. Could you please explain how to correctly fine-tune the model trained on the third-view dataset?

Thanks.

gsig commented 5 years ago

The script you linked is the baseline that uses an ImageNet (or Charades, in your case) model for the first-to-third-person tasks; originally it was only used directly in the evaluation phase. Changing it to do supervised classification might be nontrivial, because the dataloader might not have that information.

The easiest way to fine-tune a third-person model on the first-person dataset would be to use the https://github.com/gsig/charades-algorithms codebase directly (or the new https://github.com/gsig/PyVideoResearch codebase) and simply replace the Charadesv1{train,test}.csv files with CharadesEgov1{train,test}.csv (removing all third-person videos, though). PyVideoResearch also has a dedicated dataloader that does the removal for you: https://github.com/gsig/PyVideoResearch/blob/master/datasets/charades_ego_only_first.py
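For reference, the "remove all third-person videos" step could look like the sketch below. It assumes the CharadesEgo annotation CSVs have an `id` column and that first-person video IDs carry an `EGO` suffix (as in the public CharadesEgo annotations); the helper name and the sample rows are made up for illustration.

```python
import csv
import io

def keep_first_person(csv_text):
    """Filter a CharadesEgo-style annotation CSV, keeping only rows for
    first-person (egocentric) videos.

    Assumption: first-person video IDs end with the 'EGO' suffix, so any
    row whose 'id' lacks that suffix is treated as third-person and dropped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    first_person_rows = [row for row in reader if row["id"].endswith("EGO")]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(first_person_rows)
    return out.getvalue()

# Hypothetical example: '005BUEGO' is a first-person clip,
# '005BU' its third-person counterpart.
sample = "id,actions\n005BUEGO,c092 11.9 21.2\n005BU,c092 11.9 21.2\n"
filtered = keep_first_person(sample)
```

Running the filtered CSV through the existing charades-algorithms dataloader should then train on first-person clips only, which is essentially what `charades_ego_only_first.py` automates.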

Hope that helps!