anatolix opened 6 years ago
After fixing this I got a perfect ski.jpg in the same number of training epochs as the original model. NB: the loss became significantly larger, but this is due to the overall change of training data.
@anatolix so, what did you change? Could you explain it in simple terms, so that I can understand?
The idea is the following: our database has 50k images, while the original work had 120k+ images (this can be calculated from the original work's training log). I noticed the difference after downloading the original LMDB database: a lot of images are repeated there.
The difference is caused by the following: during augmentation an image can contain several persons. Some of them can be "main persons" and some cannot (too close to another main person, or too few joints visible).
In the original work's augmentation, all "main persons" are used: the image is centered on each of them in turn, generating several images. This fork only uses the first one to generate an image. This is change #1.
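Change #1 can be sketched roughly as follows. This is an illustrative, hypothetical snippet: the helper names `is_main_person` and `center_crop` are assumptions for clarity, not the project's actual API.

```python
# Sketch of change #1: the original work emits one centered training sample
# per "main person" in the image; the buggy fork stopped after the first one.
# `is_main_person` and `center_crop` are hypothetical placeholders.

def generate_samples(image, persons, is_main_person, center_crop):
    """Yield one centered crop per main person (original-work behaviour)."""
    samples = []
    for person in persons:
        if is_main_person(person):
            # Fixed behaviour: keep iterating over ALL main persons
            # instead of returning after the first match.
            samples.append(center_crop(image, person))
    return samples
```

With three persons of which two are main, this produces two training images from one source photo, which explains how 50k source entries can grow to 120k+ samples.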
Change #2 is the following: this fork only keeps joints for "main persons" (i.e. the first main person and the could-be-main persons), while all persons who cannot be a main person are filtered out completely. The original work keeps all joints, i.e. if some person cannot be a main person, the picture will not be centered around them, but all of their joints will still be fed to the NN.
I think change #2 is actually more important than #1, because in the current project the training NN never sees a person whose joints are too close to another person (i.e. in the picture above the guy behind the girl would be filtered out), and this should confuse the network.
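The difference described as change #2 can be sketched like this. The field names (`joints`, `is_main`) are illustrative assumptions, not the project's actual label schema.

```python
# Sketch of change #2: which persons' joints end up in the training label.
# Field names are hypothetical placeholders.

def build_joint_labels(persons, keep_all=True):
    """Return the per-person joint lists fed to the network for one image."""
    if keep_all:
        # Original-work behaviour: every annotated person contributes
        # joints, even if the image is never centered on them.
        return [p["joints"] for p in persons]
    # Pre-fix fork behaviour: non-main persons are dropped entirely, so the
    # network never sees e.g. someone standing right behind another person.
    return [p["joints"] for p in persons if p["is_main"]]
```

Dropping those joints means the network is trained as if crowded background persons did not exist, then is surprised by them at test time.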
In my fork I have code which can read the LMDB file - you can look there for proof; another method is to read the matlab code GenerateMLDB from the original project.
@anatolix Thank you for this awesome finding. I followed your suggestions and trained the model on the COCO train2014 dataset. I strictly followed the COCO eval protocol and successfully reproduced similar performance on the COCO val2014-1k dataset compared to the original paper!
From my understanding, center-cropping each person removes the background noise, some of the occlusions, and helps the network concentrate more on learning the keypoints for each person. Please correct me if I am wrong.
@anatolix @kevinlin311tw How many iterations did you train on your side to eventually obtain results comparable with the original paper?
In my fork I changed the iteration size to match the original paper, scaled the learning rate appropriately, and trained exactly the same number of iterations. On this fork I was unable to get good results even after 100 iterations, see this issue: https://github.com/michalfaber/keras_Realtime_Multi-Person_Pose_Estimation/issues/39
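"Scale the learning rate appropriately" here can be read as the common linear-scaling heuristic: if the effective batch (iteration) size changes by some factor, scale the base learning rate by the same factor. A minimal sketch, with placeholder numbers rather than the project's actual hyperparameters:

```python
# Linear learning-rate scaling heuristic: keep lr / batch_size constant
# when changing the effective batch size. Values are placeholders.

def scaled_learning_rate(base_lr, base_batch_size, new_batch_size):
    return base_lr * new_batch_size / base_batch_size
```

This keeps the per-sample gradient contribution roughly comparable to the original schedule when the batch size is changed.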
I've found a critical bug in generate_hdf5.
The idea is the following: when there are several persons, genLMDB puts one picture into the database for each person, and augmentation centers and resizes the picture on that person. generate_hdf5 puts only one main person. That is the reason for the difference in the number of pictures: we have 50k, while the LMDB has 120k pictures.
This is fixed in my fork of the project. Training results will be available in 2-3 days.