wbenbihi / hourglasstensorflow

Tensorflow implementation of Stacked Hourglass Networks for Human Pose Estimation
MIT License

About loss. #19

Closed: danache closed this issue 2 years ago

danache commented 6 years ago

Hi, thanks for your awesome work! I am trying to implement the Hourglass model in TensorFlow, and I noticed the loss you used is `tf.nn.sigmoid_cross_entropy_with_logits`, while in the paper the authors use MSE instead. I tried to replace it with `tf.losses.mean_squared_error`, but the net does not converge as fast as the original one and the results become awful. Have you ever tried the MSE loss? And could you tell me why you use the CE loss instead? Thank you!

wbenbihi commented 6 years ago

Hi, thank you for the comment. I hoped somebody would notice it, haha.

I started this project with the exact same configuration as in the paper, i.e. the Mean Squared Error loss function. My first results were just awful: an incredible rate of false positives, then overfitting, then underfitting... Changing the loss function gave miraculous improvements; it was not the only modification, but it had its importance.

I think I might know why the MSE loss was not a good fit for my model. Given one person in a scene (a 256x256 RGB image), the output tensor is a 1x64x64x16 array (65536-d). In the ground truth array, the huge majority of neurons are set to 0 (obviously, because they do not represent a body part). We can assume that 99% of the neurons in the ground truth output tensor are 0; the other neurons lie in (0, 1]. This part was just for context.

So now we use an MSE loss, which is supposed, given an optimizer, to reduce the L2 distance between the predicted output and the ground truth output (basically, to diminish the reconstruction error). If you had to guess the weights manually, a good heuristic would be to set them all to 0. Why? Because statistically the prediction would always be a zero array, so your maximum error would be around 1%, corresponding to the non-zero neurons in the ground truth. To rephrase: the MSE loss gives every neuron equal importance in the prediction. That may be a good fit when you are doing basic classification or regression, but here the dimensionality is too high.
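To make that concrete, here is a toy sketch (hypothetical shapes and Gaussian size, matching the 64x64x16 output described above) showing how cheap the degenerate all-zero prediction is under MSE:

```python
import numpy as np

# Hypothetical ground truth matching the description above: 16 joint heatmaps of
# 64x64, each a small Gaussian blob around the joint location, zeros elsewhere.
rng = np.random.default_rng(0)
gt = np.zeros((64, 64, 16), dtype=np.float32)
y, x = np.mgrid[0:64, 0:64]
for j in range(16):
    cy, cx = rng.integers(8, 56, size=2)   # random joint location
    gt[:, :, j] = np.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * 1.0 ** 2))

pred_zero = np.zeros_like(gt)              # the "set every weight to 0" guess
mse = np.mean((pred_zero - gt) ** 2)
print(f"MSE of the all-zero prediction: {mse:.2e}")   # ~8e-4: near-perfect to MSE
```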

This brings us to the CE loss:

$$\mathcal{L}_{CE}(p, q) = -\sum_i \big[\, p_i \log(q_i) + (1 - p_i)\log(1 - q_i) \,\big]$$

It is not exactly the same function as the one in the code, but the principle is the same.

With this type of function, the ground truth puts the emphasis on the areas to consider for backpropagation (p: ground truth; q: prediction). E.g., if the ground truth of a neuron is zero, the ground truth term p that weights log(q) cancels the backpropagation of this neuron through that term (the gradient is set to zero there).

By doing so, the convolution kernels are only trained to find body joints; they do not care about classifying neurons as 'not a body part'.
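A quick toy check of that cancellation claim (a minimal sketch, not the project's code; it looks only at the p-weighted log(q) term described above, which is where the cancellation happens):

```python
import tensorflow as tf

q = tf.Variable([0.3, 0.3])      # predicted activations for two neurons
p = tf.constant([0.0, 1.0])      # ground truth: background vs. body joint

with tf.GradientTape() as tape:
    loss = -tf.reduce_sum(p * tf.math.log(q))   # the p-weighted log(q) term only
grad = tape.gradient(loss, q)
print(grad.numpy())              # [ 0.  -3.33]: the background neuron gets no gradient
```

Note that the full sigmoid cross-entropy also carries a (1 - p)log(1 - q) term, which does push background logits down; this sketch only illustrates the weighting effect the comment describes.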

Well, this is an explanation I worked on for some time, but I don't have any way to justify it mathematically (in terms of convergence, I mean). So far, it is the best answer you'll get from me ;)

Crazod commented 6 years ago

Hi, I have used both the L2 loss and the CE loss; both of them can get similar results (~90%). However, I noticed that with the L2 loss, most of the dark regions (no-response regions) are close to 0, while with the sigmoid CE loss they are close to -10. If you need my pretrained model, I can send it to you. @danache @wbenbihi
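(A likely reading of the -10, sketched as an assumption: with sigmoid cross-entropy the raw network output is a logit, and the visible heatmap is its sigmoid, so a logit of -10 is effectively the same background as a raw 0 under L2. A one-line check:)

```python
import tensorflow as tf

# sigmoid(-10) is almost exactly 0, so a -10 logit matches a 0 heatmap value
print(tf.sigmoid(-10.0).numpy())   # ~4.54e-05
```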

dongzhuoyao commented 6 years ago

@wbenbihi, I think that's just coincidence; your explanation does not stand up mathematically.

By the way, could you release your evaluation code? 😃

Crazod commented 6 years ago

Hi, recently I found a paper: https://arxiv.org/abs/1703.00862, Binarized Convolutional Landmark Localizers for Human Pose Estimation and Face Alignment with Limited Resources. It also uses the CE loss and presents some explanation.

xiaoxin05 commented 3 years ago

Hi, I met the same problem. Can you share the related code for the MSE loss? Thank you @dongzhuoyao @Crazod

xiaoxin05 commented 3 years ago

I tried to use the sigmoid cross-entropy set up by this project on the open MPII dataset, but I found some problems: I find it incredible that the loss can drop to 0.01 within 1 epoch. I guess this loss function needs a correction?
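(One plausible explanation, sketched here as an assumption rather than a diagnosis: with ~99% of target pixels at 0, as noted earlier in this thread, the mean per-pixel cross-entropy becomes tiny as soon as the network pushes all logits negative, even before any joint is localized. A back-of-the-envelope check:)

```python
import numpy as np

def softplus(z):
    # numerically stable log(1 + exp(z))
    return np.log1p(np.exp(-abs(z))) + max(z, 0.0)

# Hypothetical early-training state: logits are -5 everywhere, 99% of pixels are background.
bg = softplus(-5.0)            # per-pixel BCE for label 0, logit -5: ~0.0067
fg = softplus(5.0)             # per-pixel BCE for label 1, logit -5: ~5.0067
print(0.99 * bg + 0.01 * fg)   # ~0.057: mean loss looks tiny although no joint is found
```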

Crazod commented 3 years ago

> Hi, I met the same problem. Can you share the related code for the MSE loss? Thank you @dongzhuoyao @Crazod

Sorry, this project is too old for me to dig it up. The loss may need to be normalized by W*H when you use it.
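(My reading of that normalization remark, as a hypothetical sketch with assumed per-stage shapes [N, 64, 64, 16]: if the loss is summed rather than averaged, divide by the heatmap area so its scale does not depend on resolution.)

```python
import tensorflow as tf

def mse_per_stage(pred, gt, h=64, w=64):
    # sum of squared errors, normalized by the heatmap area W*H
    return tf.reduce_sum(tf.square(pred - gt)) / (w * h)
```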

xiaoxin05 commented 3 years ago

Hi, I wrote a version myself; can you check if it's correct?

```python
def JointMSELoss(self, out_single, groudtruth_heatmap, target):
    # target has shape [batch, joints]; batch size is 16 and there are 16 keypoints.
    # The per-stage tensors are 4-D [batch, H, W, joints], so the mask is reshaped
    # to [16, 1, 1, 16] to broadcast over the spatial dimensions.
    target_mask = tf.reshape(target, [16, 1, 1, 16])
    loss_mse = tf.reduce_mean(tf.square(out_single - groudtruth_heatmap) * target_mask)
    return loss_mse

loss = 0
for n in range(3):  # only for a 3-stage stacked hourglass; change the "3" for other stack counts
    groudtruth_heatmap = self.gtMaps[:, n, :, :, :]  # ground-truth heatmaps of stage n
    out_single = self.output[:, n, :, :, :]          # predicted heatmaps of stage n
    target = self.weights
    # compute the per-stage loss below
    print("target:", target, np.array(target).shape)
    loss_compute = self.JointMSELoss(out_single, groudtruth_heatmap, target)
    loss += loss_compute
self.loss = loss
```

Excuse me, there is another problem. I used the sigmoid loss of the original project:

```python
self.loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(logits=self.output, labels=self.gtMaps),
    name='cross_entropy_loss')
```

or the above MSE loss. In both cases the loss drops abnormally fast: I printed out the loss of each batch during epoch 1 and found that it can drop to 0.01. I think there is something wrong with this.

Thanks.