stevenyangyj / deep-head-pose-lite

A lite-version hopenet for head pose estimation with PyTorch
Apache License 2.0
184 stars 41 forks

Accuracy in dim-lit conditions #13

Open Sricharan2402 opened 4 years ago

Sricharan2402 commented 4 years ago

Hello @OverEuro, Thank you for this implementation. I have a few doubts that I wanted to clarify.

  1. Does this model support head pose estimation under dim-lit conditions?

  2. I also fail to understand the need to transform the input image to 224 x 224, as done in the test code: `transformations = transforms.Compose([transforms.Scale(224), transforms.CenterCrop(224), transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])`. Is this done because the model was trained on images whose dimensions were in that range, or is there some other reason? Also, are these transformations required when using `shuff_epoch_120.pkl`? If yes, are the normalization values (mean and standard deviation) the same for `shuff_epoch_120.pkl`?

Thank you.

stevenyangyj commented 4 years ago

Hi,

1) This model can estimate head pose in dim lighting, BUT you need a dataset that contains such examples (training data captured in the dark) and you need to re-train the model. The original author's paper shows results in dark environments; please refer to it.

2) The input is 224 x 224 because of the structure of the neural network, such as the filter sizes and the pooling operators. The structure determines the shape of the input image. The mean and std are the same as in the original project.
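
To make the point about filter and pooling sizes concrete, here is a rough sketch of how the spatial resolution shrinks through a stack of downsampling layers. The layer list is an assumption chosen for illustration (five 3x3, stride-2, padding-1 stages, a pattern common in lightweight backbones); it is not read from this repository's model definition.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of a conv/pool layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stack: (kernel, stride, padding) for each downsampling stage.
ASSUMED_STAGES = [(3, 2, 1)] * 5


def feature_map_sizes(input_size, stages=ASSUMED_STAGES):
    """Return the side length after each stage, starting with the input."""
    sizes = [input_size]
    for kernel, stride, padding in stages:
        sizes.append(conv_out_size(sizes[-1], kernel, stride, padding))
    return sizes


print(feature_map_sizes(224))  # [224, 112, 56, 28, 14, 7]
print(feature_map_sizes(64))   # [64, 32, 16, 8, 4, 2]
```

Under these assumptions a 224 x 224 input leaves a 7 x 7 feature map at the end, while a 64 x 64 input leaves only 2 x 2, so a smaller input may still run through the network while carrying far less spatial detail.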

Sricharan2402 commented 4 years ago

@OverEuro, Thanks.

So if I were to use an image with a resolution other than 224 x 224, is it absolutely necessary for me to resize it to 224 x 224 before passing it to the model? Because I tried passing 64 x 64 images as inputs and the model still worked but the accuracy was poor as expected.

If it is necessary for the input image to be of size 224 x 224, should I just resize the face data from the image to 224 x 224 or should I include background info by loosely cropping around the face part of my image?

For example, if this is the actual image: [image: face]

(here I've taken the boundaries returned by a face detector, cropped out only the face, and then resized it to 224 x 224) [image: f]

(here I've loosely cropped around the face boundaries and then resized it to 224 x 224) [image: fb]

I tried both ways and found the results ( in terms of accuracy ) to be similar.
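
The two cropping strategies described above can be sketched as one helper that grows a detector box by a margin. The function name `expand_box`, the `(x_min, y_min, x_max, y_max)` box layout, and the 25% margin are all assumptions made for illustration; they do not come from this repository.

```python
def expand_box(box, image_size, margin=0.0):
    """Grow a face box by `margin` (a fraction of its width and height)
    on every side, clamped to the image bounds."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    dx = margin * (x_max - x_min)
    dy = margin * (y_max - y_min)
    return (
        max(0, int(round(x_min - dx))),
        max(0, int(round(y_min - dy))),
        min(img_w, int(round(x_max + dx))),
        min(img_h, int(round(y_max + dy))),
    )


detected = (100, 80, 200, 200)  # hypothetical face detector output
tight = expand_box(detected, (640, 480))               # face only
loose = expand_box(detected, (640, 480), margin=0.25)  # face plus context
print(tight)  # (100, 80, 200, 200)
print(loose)  # (75, 50, 225, 230)
```

Either box could then be passed to something like Pillow's `Image.crop` before resizing the result to 224 x 224.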

stevenyangyj commented 4 years ago

Hi, this is the result I get, so I guess you did not run the model correctly. Sorry, I cannot tell you what's wrong; please refer to the original project's inference code, which is linked from the README page. You do need to resize a cropped face image to 224x224. [image]
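
As a rough illustration of what the test-time transforms quoted earlier in this thread do to a cropped face, the sketch below reproduces their arithmetic in plain Python. It assumes the usual torchvision behaviour: `transforms.Scale` (renamed `transforms.Resize` in newer releases) matches the shorter side to 224 while keeping the aspect ratio, `CenterCrop` takes the central 224 x 224 window, `ToTensor` scales 8-bit values to [0, 1], and `Normalize` subtracts the mean and divides by the std per channel. The helper names are hypothetical, and rounding may differ from the library by a pixel.

```python
MEAN = (0.485, 0.456, 0.406)  # values quoted in the thread
STD = (0.229, 0.224, 0.225)


def resize_shorter_side(width, height, target=224):
    """New (width, height) after matching the shorter side to `target`."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, crop=224):
    """(left, top, right, bottom) of the central crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize per channel."""
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, mean, std))


w, h = resize_shorter_side(640, 480)
print((w, h))                 # (298, 224)
print(center_crop_box(w, h))  # (37, 0, 261, 224)
print(normalize_pixel((255, 255, 255)))
```

For a face crop that is already square, the resize alone produces 224 x 224 and the center crop leaves it unchanged.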