gsig / actor-observer

ActorObserverNet code in PyTorch from "Actor and Observer: Joint Modeling of First and Third-Person Videos", CVPR 2018
GNU General Public License v3.0

What's the version of dataset? #7

Open · vana77 opened this issue 5 years ago

vana77 commented 5 years ago

Hi, I noticed that you use version 0 under the folder datasets/labels, but when I download the CharadesEgo dataset I get the version 1 labels. Which version did you use to get the results in the paper? Thanks.

gsig commented 5 years ago

Good question.

I looked into it, and there does seem to be a mistake in which version of the dataset was used where. The mistake stems from the egocentric test data being a separate parameter in the code: when the new dataset (v1) was ready and we reran all the methods for the camera-ready, evaluation was likely still running on the v0 version. This affects the "transfer learning" results (Table 3 in the ActorObserver paper) and the "egocentric baselines" results (Table 2 in the CharadesEgo paper).
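For illustration, here is a minimal sketch of how that can happen (not the actual training code; the flag names and file paths are hypothetical):

```python
import argparse

parser = argparse.ArgumentParser()
# The training labels were updated when the v1 dataset was released...
parser.add_argument('--train-data',
                    default='datasets/labels/CharadesEgo_v1_train.csv')
# ...but the egocentric test data is a separate parameter, and its
# default was never bumped, so evaluation silently stays on v0.
parser.add_argument('--egocentric-test-data',
                    default='datasets/labels/CharadesEgo_v0_test.csv')
args = parser.parse_args()

print('training on:  ', args.train_data)
print('evaluating on:', args.egocentric_test_data)
```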

I'll try to outline below what I've discovered and how it will be clarified. However, Charades_v1 (the one on the website) should be used everywhere from now on, and any discrepancy with prior work should be noted where applicable.

Analysis:

Actor and Observer: Joint Modeling of First and Third-Person Videos

It looks like the numbers in Table 3 were run on Charades_v0, so I reran these experiments on every combination of training and evaluation versions (v0/v1 train crossed with v0/v1 test).
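As a sketch of what that rerun grid looks like (train_model and evaluate_model are placeholders for the repo's actual training and evaluation entry points, and the file names are assumptions):

```python
import itertools

def train_model(train_labels):
    # Placeholder for the repo's actual training entry point.
    return {'train_labels': train_labels}

def evaluate_model(model, test_labels):
    # Placeholder for the repo's actual evaluation entry point;
    # the real code would return a video-level mAP.
    return 0.0

# Train and evaluate on every (train_version, eval_version) pair.
for train_v, eval_v in itertools.product(['v0', 'v1'], repeat=2):
    model = train_model(f'datasets/labels/CharadesEgo_{train_v}_train.csv')
    score = evaluate_model(model, f'datasets/labels/CharadesEgo_{eval_v}_test.csv')
    print(f'train={train_v} eval={eval_v} mAP={score:.1f}')
```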

What this means: if you are comparing with the ActorObserver paper on CharadesEgo_v1, the 25.9% number in Table 3 is invalid, because it was evaluated on Charades_v0.

How it will be fixed: I'll recalculate the columns of Table 3 for Charades_v1 and release an Errata on the project webpage https://github.com/gsig/actor-observer

Charades-Ego: A Large-Scale Dataset of Paired Third and First Person Videos

In Table 2, some of the baselines use Charades_v0.

How it will be fixed: I'll release a new version of the arXiv paper with Table 2 fixed.

Difference between Charades_v1 and Charades_v0

The most puzzling/unexpected thing here is the performance difference between training and testing on different versions of the dataset. I did some preliminary analysis of the two datasets to try to explain the gap, but nothing I checked accounts for it.
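For reference, the kind of check I mean looks roughly like this (a sketch assuming the labels ship as CSVs with 'id' and 'actions' columns, as in the Charades annotation format; the file names are assumptions):

```python
import pandas as pd

v0 = pd.read_csv('datasets/labels/CharadesEgo_v0_test.csv')
v1 = pd.read_csv('datasets/labels/CharadesEgo_v1_test.csv')

# Which test videos appear in only one version?
ids0, ids1 = set(v0['id']), set(v1['id'])
print('only in v0:', len(ids0 - ids1), '| only in v1:', len(ids1 - ids0))

# Rough per-video annotation counts (the 'actions' field is ';'-separated).
for name, df in [('v0', v0), ('v1', v1)]:
    counts = df['actions'].fillna('').apply(
        lambda s: len(s.split(';')) if s else 0)
    print(name, 'mean actions per video: %.2f' % counts.mean())
```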

So in conclusion, Charades_v0 seems to have had a particularly "easy" train/test split, and it is not clear to me why, other than random chance.

Moving forward, it should be sufficient to explain any discrepancy between your work and prior work by referring to the Errata in this repository or the updated arXiv paper.

I'll keep posting updates about the process of clarifying this. Let me know if there is anything I can do to help, or if you have any questions. Also, if you (or anyone else) have any observations or insight into this, definitely let us know.

Best, Gunnar

vana77 commented 5 years ago

Thank you very much for your detailed reply.

lyttonhao commented 4 years ago

I am wondering if you already have the updated results for the corresponding tables somewhere? Thanks!