Bug fix in the evaluation of batches

We regret to report a bug in the part of our code that computes the Mean Per Joint Error on the test set (lines 272-274 in temporal_3d.py). Because we indexed the output incorrectly, only a small portion of the test set was actually evaluated. During evaluation, each output tensor has dimensions batch_size × seqlen × dim_of_3d. Since we use a sliding window, we should have picked the last sequence of each batch, except for the very first batch of the first set. Instead, we mistakenly picked one particular batch for each set. This biased our results, because the majority of the evaluated data came from the initial part of a video.

We would like to thank Lin Jiahao (jiahao.lin@u.nus.edu) for finding the bug and letting us know about it. We sincerely apologize for the mistake in the evaluation.

The actual results on Human3.6M should be around 58.5 mm instead of 51.9 mm on protocol 1, and around 44 mm instead of 42.0 mm on protocol 2. We are still repeating the experiments and will post the corrected results on arXiv.
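The indexing problem described above can be illustrated with a small NumPy sketch. This is not the code from temporal_3d.py; the toy data, the variable names, and the stride-1 windowing are all assumptions made for illustration. It also assumes that "the last sequence of each batch" means the last time step of each sliding window, and it ignores the special handling of the very first batch.

```python
import numpy as np

# Hypothetical toy sizes, not the values used in the repository.
seqlen, dim_of_3d = 5, 3
num_frames = 12

# Toy "video": frame t stores the value t in every coordinate,
# so the frame index can be read back from any selected row.
video = np.repeat(np.arange(num_frames, dtype=float)[:, None], dim_of_3d, axis=1)

# Sliding windows with stride 1 (an assumption),
# giving shape (batch_size, seqlen, dim_of_3d).
batch_size = num_frames - seqlen + 1
output = np.stack([video[i:i + seqlen] for i in range(batch_size)])

# Intended selection: the last time step of every window.
last_steps = output[:, -1, :]   # shape (batch_size, dim_of_3d)

# The kind of mistake described: a single fixed window is taken instead,
# so only frames from one part of the video are evaluated.
one_window = output[0, :, :]    # shape (seqlen, dim_of_3d)

covered_ok = last_steps[:, 0].astype(int).tolist()
covered_bad = one_window[:, 0].astype(int).tolist()

print("frames evaluated (intended):", covered_ok)
print("frames evaluated (buggy):   ", covered_bad)
```

In this toy setup the intended indexing covers every frame from index seqlen - 1 onward, while the faulty indexing covers only the first seqlen frames, which mirrors the bias toward the beginning of a video.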

