kevinlin311tw / keras-openpose-reproduce

Keras implementation of Realtime Multi-Person Pose Estimation
110 stars · 43 forks

about training data #14

Closed H-Liu1997 closed 2 years ago

H-Liu1997 commented 5 years ago

Hi Kevin, I found that in your code there are about 110K training samples and about 2K validation samples. The official train2014 set has 80K images and val2014 has 40K, so it seems you are using a mixed dataset that contains at least 95% of the val2014 data. For the accuracy you report, both mini-val2014 and val2014 are therefore at least 95% contained in your training data, so I don't think the result is very meaningful. Have you ever tested the result on test-dev2014 or test-dev2017?

I have read another Keras reproduction linked in your README, which reports 56.7% accuracy using only the train2014 data. Some other reproductions also use 110K samples for training and 2K for validation. Is that because the official OpenPose uses the dataset this way? I read the official OpenPose code but didn't see that; of course I may have misunderstood some part. Overall, I am still confused about whether there is some overfitting in this result. Sorry to bother you, and looking forward to your reply.

kevinlin311tw commented 5 years ago

Thanks for your interest in our repo. I believe there are several misunderstandings.

First of all, people use the COCO2014 Keypoint annotations rather than the entire COCO2014 dataset for training human pose estimation. Although there are 80K training images in the COCO2014 dataset, only some of them have keypoint annotations.

For COCO2014 Keypoint, only ~45K images have keypoint annotations, and these annotations are split into train/val sets.
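As a rough illustration, counting the images that actually carry keypoint annotations can be done by collecting distinct `image_id`s from a COCO-style annotation dict. (The data below is a toy example, not the real COCO files.)

```python
# Toy COCO-style keypoint annotations (hypothetical data, for illustration only).
coco_kpt = {
    "annotations": [
        {"image_id": 1, "num_keypoints": 12},
        {"image_id": 1, "num_keypoints": 5},   # second person in the same image
        {"image_id": 2, "num_keypoints": 0},   # annotated box but no visible keypoints
        {"image_id": 3, "num_keypoints": 17},
    ]
}

# Count distinct images that have at least one person with visible keypoints.
annotated_images = {
    ann["image_id"]
    for ann in coco_kpt["annotations"]
    if ann["num_keypoints"] > 0
}
print(len(annotated_images))  # 2
```

On the real COCO2014 keypoint annotations, the same kind of count gives the ~45K figure mentioned above.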

On the other hand, please also note that each image may contain more than one person. Given such an image, for each person we apply a geometric transformation that translates the image so that the person is located at the center, and thus generate multiple training samples from one image. That is how there are a total of 110K training samples. This strategy makes the learning process easier, and it is commonly used by many researchers, including OpenPose ;)
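The person-centering step can be sketched as a simple translation applied once per person (pure NumPy, hypothetical coordinates; the actual preprocessing also applies rotation/scale augmentation):

```python
import numpy as np

def center_on_person(keypoints, person_center, out_size):
    """Translate keypoints so that person_center lands at the middle of an
    out_size x out_size training crop. One augmented sample is produced
    per person in the image."""
    offset = np.array([out_size / 2.0, out_size / 2.0]) - np.asarray(person_center)
    return np.asarray(keypoints, dtype=float) + offset

# Two keypoints of one person; centering them produces one training sample.
kpts = np.array([[100.0, 120.0], [110.0, 130.0]])
sample_a = center_on_person(kpts, person_center=[105.0, 125.0], out_size=368)
print(sample_a)  # keypoints now centered around (184, 184)
```

Running the same function once per person in the image is what multiplies ~45K annotated images into ~110K training samples.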

We also tested the model on the COCO Challenge test set, but we think it is difficult to get more insight from those results. This is mainly because a lot of hacking takes place in the challenges: people tend to hack the leaderboard with many different approaches such as model ensembles, so the results may not really represent what a paper reports. On the other hand, the CMU team uses OpenPose + person detection + CPM (a single-person pose estimator for refinement). Though effective, I think reproducing such a model combination is already out of scope for this repo.

kevinlin311tw commented 5 years ago

Please also take a special look at our data preprocessing code here

We did not mix the training set and validation set at all.

H-Liu1997 commented 5 years ago

Thank you for your reply and sorry for my misunderstanding.

I read the data preprocessing code of your repo and of the other Keras repo linked in your README. The difference in the number of training samples comes down to whether one image is repeated (once per person) or not.

In their repo, the 80K annotations yield only 39K meta-data entries for training, versus 110K in your repo. Thank you for sharing! Now I understand the two different ways to generate training data.

A natural question follows: the training data is about 3x larger with the second method, which means each epoch takes more time. From this point of view, training on the 39K data seems like the better choice.

However, I think there must be some mAP difference, and training on the 110K data should give higher mAP. Is this the reason for the mAP difference (56.7% for their repo vs. 58.3% for your repo)?

Thank you for your reply again!!!

H-Liu1997 commented 5 years ago

Sorry, I made a mistake: it seems the results of using 39K images and using 110K images are almost the same. You reported 58.9 for the first and 59.0 for the second, so it seems it doesn't matter which of the two dataloader methods is chosen.

kevinlin311tw commented 5 years ago

It should be similar. Our one-epoch training is just like their 3-epoch training.
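The rough arithmetic behind that equivalence, using the sample counts from earlier in the thread:

```python
per_person_samples = 110_000  # one sample per person (this repo)
per_image_samples = 39_000    # one sample per image (the other repo)

# One epoch over the per-person samples covers each image roughly 3 times.
passes_over_images = per_person_samples / per_image_samples
print(round(passes_over_images, 1))  # 2.8
```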

H-Liu1997 commented 5 years ago

Thanks for sharing! All the best for your projects!


H-Liu1997 commented 5 years ago

Hi Kevin, sorry to bother you again. It seems the code may still use both train and val data for training.

Your data preparation code, at line 206, is `tr_total_write_count = len(data) - val_total_write_count`.

The length of `data` is the length of `joint_all`, which equals the number of samples generated from the train and val data together.

I have read several other OpenPose reproduction repos and the official OpenPose code. Their data preparation code, at lines 42-45, is `if(validation == 1): totalWriteCount = isValidationArray.count(0.0); else: totalWriteCount = len(data)`.

The difference between your repo and OpenPose is whether 120K or 117K samples are used for training, i.e. whether the 2645 val samples are included or not.

There may still be some misunderstanding on my side. But unlike the first time, I now think this training method may be reasonable: the training samples are not exactly the same as the original images, because the geometric transformation depends on the selected person, etc. Still, I worry about the test-dev performance of a model trained this way.

In fact, I also tried training the model this way, and the results are 59.9 on mini-val2014 and 58.7 on val2017 (5K images), but only 53.3 on test-dev2017.

I also tried michal's repo, which doesn't use any val data for training; it uses 55K samples from train2017 and 36K from train2014. However, he doesn't report a final mAP result. I rewrote his dataloader to fit my PyTorch version of the code, and the result only reaches 53.7 on val2017 (5K).

Using more data for training seems to lead to higher accuracy, but I am not sure whether that higher accuracy is "real" or not. I'm looking forward to your comments or opinions on this problem, thanks!!!

kevinlin311tw commented 5 years ago

We split the train and val data into different HDF5 files by this

Again, we do not mix the training set and validation set at all.
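(A minimal sketch of that split logic, assuming each sample carries an `isValidation` flag — the field name here is hypothetical, and the real code writes each group to its own HDF5 file:)

```python
def split_by_validation_flag(samples):
    """Partition samples into train/val groups by their isValidation flag."""
    train = [s for s in samples if s["isValidation"] == 0.0]
    val = [s for s in samples if s["isValidation"] == 1.0]
    return train, val

samples = [
    {"img": "a.jpg", "isValidation": 0.0},
    {"img": "b.jpg", "isValidation": 1.0},
    {"img": "c.jpg", "isValidation": 0.0},
]
train, val = split_by_validation_flag(samples)
print(len(train), len(val))  # 2 1
```

Note this only guarantees a clean split if the flags themselves are set correctly upstream.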

H-Liu1997 commented 5 years ago

Hi Kevin, I have already read the code you attached.

I would say we now agree that 117K samples have no isval mark and 2.6K samples have the isval mark. The code you attached just separates them by the isval mark. However, the 117K samples without the isval mark still contain both train2014 and val2014 data.

The reason is that the `process` function only checks whether each val2014 image is among the first 2.6K images, but there is no `continue` in the else branch. Code here.

If an image is, say, No. 3K in val2014, then after `process()` it is still part of `joint_all`, and its isval mark is 0. So the 117K samples may be a mix of train2014 and val2014. There are also some JSON files available online that were generated by a function like `process()`; I checked one, and it contains val-set images with isval mark = 0.
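The leak I'm describing can be reproduced in miniature: if only the first `val_limit` val images get `isValidation = 1` and the rest fall through without a `continue`, the later val images end up in the training pool. (All names here are hypothetical, mirroring my description above.)

```python
def build_joint_all(train_imgs, val_imgs, val_limit):
    """Buggy variant: only the first val_limit val images are marked as
    validation; later val images fall through and are kept with isval=0,
    so they leak into the training split."""
    joint_all = []
    for img in train_imgs:
        joint_all.append({"img": img, "isValidation": 0.0})
    for i, img in enumerate(val_imgs):
        if i < val_limit:
            joint_all.append({"img": img, "isValidation": 1.0})
        else:
            # Missing `continue`/skip here: the image is still appended.
            joint_all.append({"img": img, "isValidation": 0.0})
    return joint_all

joint_all = build_joint_all(
    train_imgs=[f"train_{i}" for i in range(5)],
    val_imgs=[f"val_{i}" for i in range(4)],
    val_limit=2,
)
train_pool = [j["img"] for j in joint_all if j["isValidation"] == 0.0]
print(train_pool)  # val_2 and val_3 leak into the training pool
```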

kevinlin311tw commented 5 years ago

This is a valid concern...

Maybe it is OK to evaluate performance on test-dev, because they do not use test-dev for training. However, it is not the standard train/val split. You may want to check with OpenPose's authors for further clarification.

H-Liu1997 commented 5 years ago

Anyway, thanks for your reply.

Your answers helped me a lot in understanding the dataloader portion :)

Again, all the best for your projects :)