**Loss: inf | Acc: 0.0000** during evaluation with mpii.py

hellboywyh commented 5 years ago

checkpoint/mpii_hg_s2_b1_mobile_fpd/model_best.pth.tar is the model I obtained after knowledge distillation (KD). To evaluate it, I first ran mpii_kd.py with the following command:

python mpii_kd.py -a hg --stacks 2 --blocks 1 --checkpoint checkpoint/mpii_hg_s2_b1_mobile_fpd/ --resume checkpoint/mpii_hg_s2_b1_mobile_fpd/model_best.pth.tar -e

but got this error:

mpii_kd.py: error: argument --teacher_checkpoint is required

Then I tried mpii.py instead, with this command:

python mpii.py -a hg --stacks 2 --blocks 1 --checkpoint checkpoint/mpii_hg_s2_b1_mobile_fpd/ --resume checkpoint/mpii_hg_s2_b1_mobile_fpd/model_best.pth.tar -e

This time there was no error, but throughout the evaluation the output stayed at Loss: inf | Acc: 0.0000, as shown here:

Processing |################################| (493/493) Data: 0.268434s | Batch: 0.888s | Total: 0:07:17 | ETA: 0:00:01 | Loss: inf | Acc: 0.0000

I then computed PCKh on the evaluation results by running:

python tools/eval_PCKh.py --matfile checkpoint/mpii_hg_s2_b1_mobile_fpd/preds_valid.mat

The result was:

Model   Head   Shoulder   Elbow   Wrist   Hip    Knee   Ankle   Mean
hg      0.00   0.00       0.00    0.00    0.00   0.00   0.02    0.00

I suspect all three steps fail for the same underlying reason, but I can't tell what it is.

yuanyuanli85 commented 5 years ago

To evaluate the student network, --mobile needs to be set to True. Otherwise, the network that gets created does not match the weights in checkpoint/mpii_hg_s2_b1_mobile_fpd/model_best.pth.tar that you passed.

Everything works on my side. Please check the log below.

$ python example/mpii.py -a hg --stacks 2 --blocks 1 --resume checkpoint/mpii_hg_s2_b1_mobile_fpd/model_best.pth.tar -e --mobile True --checkpoint checkpoint/mpii_hg_s2_b1_mobile_fpd/
==> creating model 'hg', stacks=2, blocks=1
=> loading checkpoint 'checkpoint/mpii_hg_s2_b1_mobile_fpd/model_best.pth.tar'
=> loaded checkpoint 'checkpoint/mpii_hg_s2_b1_mobile_fpd/model_best.pth.tar' (epoch 90)
    Total params: 2.31M
    Mean: 0.4404, 0.4440, 0.4327
    Std:  0.2458, 0.2410, 0.2468

Evaluation only
example/mpii.py:229: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_var = torch.autograd.Variable(inputs.cuda(), volatile=True)
example/mpii.py:230: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  target_var = torch.autograd.Variable(target, volatile=True)
Processing |################################| (493/493) Data: 0.618189s | Batch: 0.685s | Total: 0:05:37 | ETA: 0:00:01 | Loss: 0.0008 | Acc:  0.8228
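
The architecture/weights mismatch described above can be made visible before evaluation starts. The following is a minimal sketch, not code from this repository: `diff_state_dicts` and `is_compatible` are hypothetical helper names, and the inputs are plain mappings from parameter name to shape so the example runs without PyTorch.

```python
from typing import Dict, List, Mapping, Tuple

Shape = Tuple[int, ...]


def diff_state_dicts(
    model_shapes: Mapping[str, Shape],
    ckpt_shapes: Mapping[str, Shape],
) -> Dict[str, List[str]]:
    """Compare parameter names and shapes of a model and a checkpoint.

    Hypothetical helper: both arguments map parameter name -> shape.
    """
    model_keys = set(model_shapes)
    ckpt_keys = set(ckpt_shapes)
    shared = model_keys & ckpt_keys
    return {
        # Parameters the model expects but the checkpoint does not provide.
        "missing_in_checkpoint": sorted(model_keys - ckpt_keys),
        # Parameters stored in the checkpoint that the model has no slot for.
        "unexpected_in_checkpoint": sorted(ckpt_keys - model_keys),
        # Same name on both sides, but different tensor shapes.
        "shape_mismatch": sorted(
            k for k in shared if tuple(model_shapes[k]) != tuple(ckpt_shapes[k])
        ),
    }


def is_compatible(report: Mapping[str, List[str]]) -> bool:
    """True only when every category in the report is empty."""
    return not any(report.values())
```

With PyTorch, the shape mappings could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`, and likewise from the loaded checkpoint's state dict. The dictionary key under which the checkpoint stores that state dict is an assumption to verify against the actual file.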

hellboywyh commented 5 years ago

Your suggestion solved the problem. Thank you! I had missed the --mobile flag.

hellboywyh commented 5 years ago

I ran the evaluation again and got the output below. The accuracy is very low and obviously incorrect.

python mpii.py -a hg --stacks 2 --blocks 1 --checkpoint checkpoint/hg_s2_b1_mobile_fpd/ --resume checkpoint/hg_s2_b1_mobile_fpd/model_best.pth.tar -e --mobile True
==> creating model 'hg', stacks=2, blocks=1
=> no checkpoint found at 'checkpoint/hg_s2_b1_mobile_fpd/model_best.pth.tar'
    Total params: 2.31M
    Mean: 0.4404, 0.4440, 0.4327
    Std:  0.2458, 0.2410, 0.2468

Evaluation only
mpii.py:232: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_var = torch.autograd.Variable(inputs.cuda(), volatile=True)
mpii.py:233: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  target_var = torch.autograd.Variable(target, volatile=True)
Processing |################################| (493/493) Data: 0.248658s | Batch: 1.010s | Total: 0:08:17 | ETA: 0:00:02 | Loss: 0.0305 | Acc:  0.0060

Then I checked the number of labels in data/mpii/mpii_annotations.json and found it is 25204. However, the number in mpii_human_pose_v1_u12_1.mat is 24987. Did you add your own data to the dataset, and is that the reason for the accuracy error?

Looking forward to your help.
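
A count like the one quoted above can be reproduced with a few lines of Python. This is only a sketch: `count_annotations` is a hypothetical helper, and it assumes the annotation file is a single top-level JSON array with one entry per label, which should be checked against the real file.

```python
import json


def count_annotations(json_path: str) -> int:
    """Return the number of entries in a JSON annotation file.

    Assumption: the file holds one top-level JSON array.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of annotations")
    return len(data)
```

For example, `count_annotations("data/mpii/mpii_annotations.json")` would return the number of array entries in that file.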

yuanyuanli85 commented 5 years ago

No, I didn't add any new data to MPII. From your log, it looks like the checkpoint was not loaded correctly:

=> no checkpoint found at 'checkpoint/hg_s2_b1_mobile_fpd/model_best.pth.tar'
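
In the log above, the run continued after the "no checkpoint found" message, so the evaluation did not use the saved weights. Note also that the path in that command (checkpoint/hg_s2_b1_mobile_fpd/) lacks the mpii_ prefix used in the earlier working command. One way to catch this early is to fail fast when the file is missing. The sketch below is an assumption about how such a guard could look, not code from this repository; `require_checkpoint` is a hypothetical name.

```python
import os


def require_checkpoint(path: str) -> str:
    """Return `path` unchanged if it is an existing file, else raise.

    Hypothetical guard: stops an evaluation run instead of letting it
    continue without the intended weights.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no checkpoint found at '{path}'")
    return path
```

Calling `require_checkpoint(...)` on the value passed to --resume before building the data loader would turn the silent low-accuracy run into an immediate error.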

hellboywyh commented 5 years ago

Yes, you are right! Thank you so much! I am still confused about the number of images, but it doesn't matter now.

yuanyuanli85 / Fast_Human_Pose_Estimation_Pytorch

Loss: inf | Acc: 0.0000 during evaluation with mpii.py #11
