jperezrua / mfas

Implementation of CVPR 2019 paper "Mfas: Multimodal fusion architecture search"

Skeleton Net Low Accuracy #1

haamoon closed this issue 4 years ago

haamoon commented 4 years ago

I am trying to reproduce the unimodal and multimodal results reported in the paper. I got the following accuracies by running the scripts provided in this repo:

best_3_1_1_1_3_0_1_1_1_3_3_00.9134.checkpoint: 90.03%
conf[[3_00][1_30][1_11][3_3_0]]_both_0.896888457572633.checkpoint: 88.64%

As you can see, the results are reasonable (still about 1% lower than the numbers you got), which implies that I have set up the dataset correctly.

On the other hand, I get very different results from the unimodal nets, the Skeleton net in particular. I used the provided pre-trained checkpoints for each modality and loaded them into the models.central.Visual and models.central.Skeleton modules. I wrote a simple script that runs a forward pass and computes the accuracy of these modules. The results (especially for the skeleton net) are very different from the paper:

skeleton_32frames_85.24.checkpoint: 48.02%
rgb_8frames_83.91.checkpoint: 85.23%
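For reference, the accuracy step of such an evaluation script could look roughly like the sketch below. It is not the script used above and not code from this repo: `top1_accuracy` and the toy batches are hypothetical, and it assumes the model outputs have already been collected as arrays of per-class scores with integer labels.

```python
import numpy as np


def top1_accuracy(batches):
    """Return top-1 accuracy (in %) over an iterable of (scores, labels) pairs.

    scores: array of shape (batch, num_classes), e.g. the network outputs.
    labels: array of shape (batch,) holding integer class indices.
    """
    correct = 0
    total = 0
    for scores, labels in batches:
        scores = np.asarray(scores)
        labels = np.asarray(labels)
        preds = scores.argmax(axis=1)  # predicted class = highest score
        correct += int((preds == labels).sum())
        total += labels.shape[0]
    if total == 0:
        raise ValueError("no samples were evaluated")
    return 100.0 * correct / total


# Toy data (hypothetical): two batches of two samples, two classes each.
toy_batches = [
    (np.array([[0.1, 0.9], [0.8, 0.2]]), np.array([1, 0])),
    (np.array([[0.3, 0.7], [0.6, 0.4]]), np.array([0, 0])),
]
toy_acc = top1_accuracy(toy_batches)
```

An off-by-one in the label indexing, or leaving the model in training mode during the forward pass, would also lower the number this function reports, so both are worth ruling out alongside the data pipeline.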

Do you have any idea what I am doing wrong here? I would appreciate your comment.

jperezrua commented 4 years ago

Hi @haamoon, in my opinion there is no reason for the score difference. I suppose you wrote your own dataloader? Can you double-check that the data is being preprocessed the same way as in our own dataloader?

I was chatting with Valentin, my co-author, and he thinks you should be especially careful with the skeleton data normalization, e.g. here
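One way to check for a normalization mismatch is to compare simple statistics of the skeleton tensors produced by the two data pipelines. The sketch below is only an illustration: `summarize` and `compare_preprocessing` are hypothetical helpers, the array shape (samples, frames, joints, coordinates) is an assumption, and the min-max scaling stands in for whatever normalization the repo's dataloader actually applies.

```python
import numpy as np


def summarize(samples):
    """Global min/max/mean/std of a stack of skeleton arrays (any shape)."""
    x = np.asarray(samples, dtype=np.float64)
    return {
        "min": float(x.min()),
        "max": float(x.max()),
        "mean": float(x.mean()),
        "std": float(x.std()),
    }


def compare_preprocessing(reference, candidate, rtol=1e-3, atol=1e-6):
    """Return the statistics that differ between two batches of inputs.

    The result maps each differing statistic to (reference, candidate);
    an empty dict means the batches look consistent at this coarse level.
    """
    ref, cand = summarize(reference), summarize(candidate)
    return {
        key: (ref[key], cand[key])
        for key in ref
        if not np.isclose(ref[key], cand[key], rtol=rtol, atol=atol)
    }


# Synthetic stand-in data; the shape is an assumption for illustration:
# (samples, frames, joints, xyz).
rng = np.random.default_rng(0)
raw = rng.uniform(-2.0, 2.0, size=(4, 32, 25, 3))
# Placeholder normalization (min-max scaling), not the repo's actual one.
normalized = (raw - raw.min()) / (raw.max() - raw.min())

mismatch = compare_preprocessing(normalized, raw)
```

Feeding a batch from the repo's dataloader as `reference` and a batch from a custom one as `candidate` would show quickly whether the value ranges agree. Matching statistics do not prove the preprocessing is identical, but a clear difference in range or scale points directly at the normalization step.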