una-dinosauria / 3d-pose-baseline

A simple baseline for 3d human pose estimation in tensorflow. Presented at ICCV 17.
MIT License
1.42k stars, 354 forks

hi, I have a question about your method's normalization #35

bsdj2 closed 6 years ago

bsdj2 commented 6 years ago

I have read your code and paper carefully, but I am still puzzled by your data normalization.

My understanding of your normalization is this: first compute the mean and stdev for each joint separately, then subtract each joint's mean and divide by its own stdev. This operation pulls all the joints close to the coordinate origin. I think this will lose the relative topological information about the joints. But the results are good, so I can't understand why it works, and after normalizing each joint separately, what does the resulting data mean?

Could you clear this up for me? Best wishes.

una-dinosauria commented 6 years ago

Hi @bsdj2,

My understanding of your normalization is this: first compute the mean and stdev for each joint separately, then subtract each joint's mean and divide by its own stdev. This operation pulls all the joints close to the coordinate origin.

Subtracting the mean and dividing by the standard deviation is a standard procedure in most machine learning applications. It ensures that your data is zero-centred and that its standard deviation is about 1.

This is useful for many reasons. For one, most initialization schemes sample small weights centred on zero and work best when the inputs have a comparable scale, so with normalized data these initializations can be used regardless of what your data originally looks like. Another reason is that, in some datasets, some dimensions of your input might have very large values while others are tiny, but you probably want your model to treat them as equally important.
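To make this concrete, here is a minimal sketch of per-dimension normalization. It is not the repository's actual code: the function names, the flattened pose layout, and the toy numbers are assumptions for illustration, and it uses plain Python rather than NumPy or TensorFlow to stay dependency-free.

```python
import statistics


def normalization_stats(poses):
    """Return per-dimension means and standard deviations.

    `poses` is a list of flattened poses, each a list of floats
    (hypothetical layout: [x0, y0, z0, x1, y1, z1, ...]).
    """
    columns = list(zip(*poses))  # one tuple per coordinate dimension
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return means, stdevs


def normalize(pose, means, stdevs):
    """Subtract the mean and divide by the stdev, dimension by dimension.

    Assumes every dimension varies across the dataset (no zero stdev).
    """
    return [(v - m) / s for v, m, s in zip(pose, means, stdevs)]


# Toy data: two dimensions with very different scales.
poses = [
    [1000.0, 0.1],
    [2000.0, 0.2],
    [3000.0, 0.3],
]
means, stdevs = normalization_stats(poses)
normalized = [normalize(p, means, stdevs) for p in poses]
```

After this step both dimensions have zero mean and unit standard deviation, even though one started in the thousands and the other in the tenths.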

I think this will lose the relative topological information about the joints. But the results are good, so I can't understand why it works, and after normalizing each joint separately

After the model predicts a 3d pose, we simply denormalize the prediction (multiply by the stdev and add the mean), which brings it back to the original coordinates of the human body. That's why the results look good anyway.
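Put differently, normalization is an invertible per-dimension transform, so nothing about the relative joint positions is lost. A small round-trip sketch, assuming made-up statistics and a made-up two-joint pose (none of these numbers or names come from the repository):

```python
import math


def normalize(pose, means, stdevs):
    """Subtract the mean and divide by the stdev, dimension by dimension."""
    return [(v - m) / s for v, m, s in zip(pose, means, stdevs)]


def denormalize(pose, means, stdevs):
    """Invert normalize: multiply by the stdev, then add the mean."""
    return [v * s + m for v, m, s in zip(pose, means, stdevs)]


def bone_length(pose):
    """Euclidean distance between the two joints of a flattened pose."""
    return math.dist(pose[0:3], pose[3:6])


# Hypothetical statistics for two 3d joints (6 dimensions).
means = [0.0, -250.0, 50.0, 10.0, -700.0, 60.0]
stdevs = [120.0, 90.0, 150.0, 200.0, 110.0, 180.0]

# One pose: joint A at (30, -240, 80), joint B at (-40, -690, 20).
pose = [30.0, -240.0, 80.0, -40.0, -690.0, 20.0]

# Round trip: normalized space and back to the original coordinates.
recovered = denormalize(normalize(pose, means, stdevs), means, stdevs)
```

Up to floating-point error, `recovered` matches `pose`, so the distance between the two joints is the same before and after the round trip.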

Hope that helps!

bsdj2 commented 6 years ago

Thanks for your patient reply.