tpfister / caffe-heatmap

Caffe with heatmap regression & spatial fusion layers. Useful for any CNN image position regression task.
http://www.robots.ox.ac.uk/~vgg/software/cnn_heatmap

fine tuning issue #17

Open LinlyAC opened 7 years ago

LinlyAC commented 7 years ago

Hello everyone, these days I've hit a problem I can't solve, so I'm here for help. I am fine-tuning this caffe-heatmap model to adapt it to my own dataset, which also has seven upper-body joints. While fine-tuning, I plotted two charts: train loss vs. iteration and test accuracy vs. iteration. The first chart looks normal (the loss decreases monotonically), but the accuracy rapidly drops to zero after a few iterations.

[screenshot: train loss vs. iteration]

[screenshot: test accuracy vs. iteration]

That confuses me. In my opinion, the accuracy should go up when the train loss goes down. Can anyone help me? (PS: my task has the same requirements as caffe-heatmap, i.e. 7 upper-body joints, so I didn't change the original code, only the training and test data.) Thanks.

bazilas commented 7 years ago

Check out the heatmap predictions. If everything converges to zero, you might need to weight your foreground/background gradients so that they make an equal contribution to the parameter update.

LinlyAC commented 7 years ago

@bazilas Thanks for your answer. I have tried your suggestion. About the heatmap predictions, I checked the log file: loss_heatmap also goes down, from 1.00483 to 0.

[screenshot: iteration 1]

[screenshot: iteration 163]

Maybe I misunderstood your suggestion. I also used this 'disabled' model to run the MATLAB demo, and every heatmap turns blue. In addition, all the joint coordinates in each frame seem to cluster together.

[screenshot]

[screenshot]

I am not sure whether this is what you meant. If so, how should I adjust the foreground/background gradients? Are there any tricks?

Thank you again!

bazilas commented 7 years ago

you could count the number of foreground / background heatmap pixels (e.g. by thresholding) and balance the gradients accordingly.
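Something along these lines (a rough numpy sketch, not code from this repo; the array shapes, the 0.01 threshold, and the function name are just illustrative assumptions):

```python
import numpy as np

def balance_gradients(grad, gt, thresh=0.01):
    """Reweight per-pixel heatmap gradients so that foreground and
    background pixels make equal total contributions.

    grad   -- loss gradient w.r.t. the predicted heatmap, shape (H, W)
    gt     -- ground-truth heatmap, shape (H, W), values in [0, 1]
    thresh -- assumed cut-off separating foreground from background
    """
    fg = gt > thresh
    n_fg = max(int(fg.sum()), 1)        # avoid division by zero
    n_bg = max(int((~fg).sum()), 1)

    # Give each class a total weight of 0.5, regardless of how many
    # pixels it contains (background usually dominates a heatmap).
    weights = np.where(fg, 0.5 / n_fg, 0.5 / n_bg)
    return grad * weights
```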

skyz8421 commented 7 years ago

I met the same problem. I used the prototxt the author provided, and my training loss goes down to 0 very quickly, but the predictions are very bad and the heatmaps turn out to be blue. How can I fix this? What modifications should I make to apply the balancing trick? @bazilas

power0341 commented 7 years ago

Hey guys @bazilas @LinlyAC @samlong-yang, I'm also stuck here. I modified a few lines of the source code to train a model that predicts several joint locations from single depth images. My model behaves the same way as @LinlyAC's and @samlong-yang's: during training, the loss falls to near zero from the first iteration, and it only predicts single-valued heatmaps. In fact, I tried the MATLAB version and got the same result. It would be great if someone could offer a tutorial on how to effectively train a model using heatmap regression.

LinlyAC commented 7 years ago

I am grateful that so many people are paying attention to this problem. @bazilas @samlong-yang @power0341 I still haven't solved it, but I have a new idea that might help. When I try to fine-tune this model, my data format looks like this:

[screenshot: my label file]

because this is the format used in readme.txt:

[screenshot: readme.txt]

However, when I downloaded the example FLIC dataset, the data format looks like this:

[screenshot: FLIC label file]

It is clear that this format differs from the one in readme.txt. To be honest, I don't quite understand what these decimal values mean. Can anyone give me some help? It might help us solve the fine-tuning problem.

distant1219 commented 7 years ago

Hey @LinlyAC, I'm training this project with the data the author provided, but I met the same problem. Did you try training on the author's dataset? Hope you reply.

power0341 commented 7 years ago

Hi @LinlyAC, if I understand correctly, there are a couple of things that matter. First, we need to normalize the joint coordinates, for example (x/w, y/h) or ((c_x-x)/w, (c_y-y)/h); see the "DeepPose" paper for details. Second, we also need to choose the magnitude of the Gaussian carefully so that the model really converges.
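To make that concrete, here is a rough numpy sketch of both steps; the function names and the sigma value are my own assumptions, not taken from caffe-heatmap:

```python
import numpy as np

def normalise_joint(x, y, img_w, img_h):
    """Scale pixel coordinates to [0, 1] relative to the image size
    (one of the normalisations mentioned above)."""
    return x / float(img_w), y / float(img_h)

def gaussian_heatmap(cx, cy, width, height, sigma=1.5):
    """Render a ground-truth heatmap with a Gaussian centred on (cx, cy).
    sigma controls the spread ('magnitude') of the Gaussian; a value that
    is far too small or too large can make training collapse to flat maps.
    sigma=1.5 is an assumed default, not a value from the repo."""
    xs = np.arange(width)
    ys = np.arange(height)[:, np.newaxis]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
```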

distant1219 commented 7 years ago

Hello @power0341 @LinlyAC, according to the project's readme, we should set 'multfact' to 282 if we use the preprocessed data from the website. That parameter multiplies the decimal joint coordinates to recover the ground truth. If we use our own datasets, I think 'multfact' should be set to 1. Even so, I still can't train a proper model: the loss becomes very low right from the start, but the predictions are wrong. What should I do? Help wanted!!
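If I understand the readme correctly, the relationship would look roughly like this (the decimal value below is made up purely for illustration):

```python
multfact = 282          # value the readme suggests for the preprocessed data
x_stored = 0.4326       # made-up decimal, as it would appear in the FLIC label file
x_pixel = x_stored * multfact   # ~122 px: the coordinate the training code actually uses

# For a custom dataset whose label file already stores raw pixel
# coordinates, multfact = 1 leaves the values unchanged.
```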

LinlyAC commented 7 years ago

@distant1219 I am really grateful for your help, but I also cannot solve this problem. I am sorry.

Hi @tpfister, I am sorry to bother you, but there are several people like me who have run into problems we cannot solve, so we hope you can give us some tips about fine-tuning this model.

Heartfelt thanks to you.

kennyjchen commented 7 years ago

I am having the same problem as LinlyAC, and I will help with the search for a solution. Will post back if I figure it out.

EEWenbinWu commented 7 years ago

The reason your heatmaps are blue is that the demo's multfact = 1. See this:

[screenshot]

kennyjchen commented 7 years ago

Hi everyone,

I managed to solve the problem with the blue heatmaps. It seems that Stochastic Gradient Descent (SGD) was exploding in loss for batch sizes greater than 3, or dropping drastically to zero for batch sizes less than 3. Try changing the type in solver.prototxt to AdaGrad or AdaDelta, as described here:

http://caffe.berkeleyvision.org/tutorial/solver.html

I'm not sure which version of Caffe introduced this; I had to recompile caffe-heatmap using the newest version of Caffe. After doing this and training with batch sizes of ~25, I am now able to fine-tune.
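For reference, the relevant part of the solver.prototxt looks roughly like this (the net path and learning-rate values below are placeholders, not the exact settings I used):

```
# solver.prototxt (sketch -- placeholder paths and values)
net: "heatmap_train_val.prototxt"
type: "AdaGrad"        # or "AdaDelta"; needs a Caffe version with the string 'type' field
base_lr: 0.01
lr_policy: "fixed"
```

The string-valued `type` field only exists in newer Caffe releases, which is why the recompile was needed.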

A note on EEWenbinWu's comment: if you follow my steps above, do not apply the scalar multiplier as shown; it will not work.