michalfaber / keras_Realtime_Multi-Person_Pose_Estimation

Keras version of Realtime Multi-Person Pose Estimation project

Critical bug in generate_hdf5 #41

Open anatolix opened 6 years ago

anatolix commented 6 years ago

I've found a critical bug in generate_hdf5.

The idea is the following: when an image contains several persons, genLMDB writes one copy of the picture for each main person, and augmentation then centers and rescales the picture on that person. generate_hdf5 only writes the first main person. That is the reason for the difference in the number of pictures: we have ~50k, while the LMDB has ~120k pictures.
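To illustrate, a minimal sketch with made-up data structures (not the actual genLMDB or generate_hdf5 code):

```python
from typing import Iterator, List, NamedTuple, Tuple

class Person(NamedTuple):
    center: Tuple[float, float]   # person center in image coordinates
    scale: float                  # scale used for cropping/resizing
    is_main: bool                 # enough visible joints, not too close to another main person

class Sample(NamedTuple):
    image_id: int
    center: Tuple[float, float]
    scale: float

def samples_lmdb_style(image_id: int, persons: List[Person]) -> Iterator[Sample]:
    """Original genLMDB: one training sample per main person, each centered on that person."""
    for p in persons:
        if p.is_main:
            yield Sample(image_id, p.center, p.scale)

def samples_hdf5_style(image_id: int, persons: List[Person]) -> Iterator[Sample]:
    """generate_hdf5 before the fix: only the first main person produces a sample."""
    for p in persons:
        if p.is_main:
            yield Sample(image_id, p.center, p.scale)
            return  # the remaining main persons are silently dropped

# An image with three main persons yields 3 samples LMDB-style but only 1 HDF5-style,
# which is roughly where the 120k vs 50k gap comes from.
persons = [Person((100, 80), 0.8, True), Person((300, 90), 0.7, True), Person((500, 60), 0.9, True)]
assert len(list(samples_lmdb_style(1, persons))) == 3
assert len(list(samples_hdf5_style(1, persons))) == 1
```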

This is fixed in my fork of the project. Training results will be available in 2-3 days.

anatolix commented 6 years ago

After fixing this I got a perfect ski.jpg in the same number of training epochs as the original model (python_new_aug 036). NB: the loss became significantly larger, but this is due to the overall change of the training data.

ildoonet commented 6 years ago

@anatolix So, what did you change? Could you explain it simply, so that I can understand?

anatolix commented 6 years ago

The idea is the following: our database had ~50k images, while the original work had 120k+ (this can be calculated from the original work's training log). After noticing the difference I downloaded the original LMDB database and saw that a lot of images are repeated there.

The difference is caused by the following: an image used for augmentation can contain several persons. Some of them qualify as a "main person" and some don't (too close to another main person, or too few joints visible).

In the original work's augmentation, all "main persons" are used: the image is centered on each of them, generating several training images. This fork only uses the first one to generate an image. This is change #1.

Change #2 is the following: this work only writes joints for "main persons" (i.e. the first main person and the could-be-main persons), and all persons that can't be a main person are filtered out completely. The original work keeps all joints, i.e. if some person can't be a main person the picture will not be centered on them, but all of their joints are still fed to the NN.

I think change #2 is actually more important than #1, because with the current project's training the NN never sees persons whose joints are too close together (i.e. in the picture above the guy behind the girl is filtered out, which should confuse the network).
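A minimal sketch of change #2, again with made-up data structures rather than the real generate_hdf5 internals:

```python
from typing import List, NamedTuple, Tuple

Joint = Tuple[float, float, int]   # (x, y, visibility)

class Person(NamedTuple):
    joints: List[Joint]
    is_main: bool   # False if too close to another main person or too few joints visible

def gt_joints_original(persons: List[Person]) -> List[List[Joint]]:
    """Original work: the crop is centered on one main person, but the ground truth
    is built from the joints of *every* annotated person in the image."""
    return [p.joints for p in persons]

def gt_joints_before_fix(persons: List[Person]) -> List[List[Joint]]:
    """Behaviour described above: non-main persons are dropped completely,
    so the network never sees their joints (e.g. the guy standing behind the girl)."""
    return [p.joints for p in persons if p.is_main]
```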

In my fork I have code that can read the LMDB file - you can look there for proof; another way is to read the Matlab LMDB-generation code from the original project.

kevinlin311tw commented 6 years ago

@anatolix Thank you for this awesome finding. I followed your suggestions and trained the model on the COCO train2014 dataset. I strictly followed the COCO eval protocol and successfully reproduced performance similar to the original paper on the COCO val2014-1k dataset!

kevinlin311tw commented 6 years ago

From my understanding, center-cropping each person removes the background noise, some of the occlusions, and helps the network concentrate more on learning the keypoints for each person. Please correct me if I am wrong.

mingloo commented 6 years ago

@anatolix @kevinlin311tw How many iterations did you train on your side to eventually obtain results comparable with the original paper?

anatolix commented 6 years ago

In my fork I changed the iteration size to match the original paper, scaled the learning rate appropriately, and trained exactly the same number of iterations. On this fork I was unable to get good results even after 100 iterations; see this issue: https://github.com/michalfaber/keras_Realtime_Multi-Person_Pose_Estimation/issues/39
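For reference, matching the original schedule in Keras comes down to setting steps_per_epoch from the larger sample count and applying the same step decay to the learning rate. A minimal sketch, with placeholder numbers rather than the actual values from the paper or my fork:

```python
from keras.callbacks import LearningRateScheduler

# Placeholder numbers, just to show the mechanics.
samples_per_epoch = 120000          # augmented sample count after the fix
batch_size = 10
steps_per_epoch = samples_per_epoch // batch_size

base_lr = 4e-5                      # placeholder base learning rate
gamma = 0.333                       # placeholder decay factor
decay_every_epochs = 17             # placeholder decay interval

def step_decay(epoch):
    # Step decay: multiply the base LR by gamma every `decay_every_epochs` epochs.
    return base_lr * (gamma ** (epoch // decay_every_epochs))

lr_callback = LearningRateScheduler(step_decay)
# model.fit_generator(train_gen, steps_per_epoch=steps_per_epoch,
#                     epochs=n_epochs, callbacks=[lr_callback])
```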