princeton-vl / pose-ae-train

Training code for "Associative Embedding: End-to-End Learning for Joint Detection and Grouping"
BSD 3-Clause "New" or "Revised" License

training loss curve and weights for AEloss #36

Open coordxyz opened 5 years ago

coordxyz commented 5 years ago

Hi @anewell, thank you for releasing the training code. I tried to train the network on coco2014 with the original settings except {'batchsize': 8, 'input_res': 256, 'output_res': 64}. The training loss curve looks very strange, but the output maps look right. I'm quite confused. Is this normal? Could you please share your training log? [attached: training loss curve]

In line 102 of my_lib.c, in the push_loss calculation, why do you multiply the push loss by 0.5?

```c
if(current_people>1)
    output_tmp[0] /= current_people*(current_people-1)/2;
output_tmp[0] *= 0.5;
```
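For what it's worth, the normalization in that C snippet can be sketched in PyTorch as follows. This is a minimal illustration, not the repo's actual loss code: `tag_means` and the Gaussian penalty form are assumptions (one mean embedding per person, penalty `exp(-d^2)` over unordered pairs), chosen only to show where the `n*(n-1)/2` division and the extra 0.5 enter.

```python
import torch

def push_loss(tag_means):
    # Hypothetical sketch mirroring the normalization in my_lib.c:
    # sum a Gaussian penalty over the n*(n-1)/2 unordered pairs of
    # per-person mean tags, average over pairs, then halve.
    # `tag_means` is an illustrative 1-D tensor, one tag per person.
    n = tag_means.numel()
    if n < 2:
        return tag_means.new_zeros(())
    diff = tag_means.unsqueeze(0) - tag_means.unsqueeze(1)  # (n, n) pairwise differences
    penalty = torch.exp(-diff ** 2)                         # Gaussian push penalty
    total = penalty.triu(diagonal=1).sum()                  # strict upper triangle = unordered pairs
    total = total / (n * (n - 1) / 2)                       # average over number of pairs
    return 0.5 * total                                      # the extra 0.5 asked about above
```

With this normalization the loss stays on a comparable scale regardless of how many people are in the image, since the pair sum grows quadratically with `n`.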

Why are the weights for pull_loss and push_loss so small (1e-3)? How did you choose the weights for pull_loss, push_loss, and detection_loss?
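To make the question concrete, the weighted combination presumably looks something like the sketch below. The function and parameter names are placeholders, not the repo's actual variables; only the 1e-3 values come from the discussion above.

```python
def total_loss(det_loss, pull_loss, push_loss,
               det_weight=1.0, pull_weight=1e-3, push_weight=1e-3):
    # Combine the three terms; the small pull/push weights keep the
    # embedding losses from dominating the detection (heatmap) loss.
    return (det_weight * det_loss
            + pull_weight * pull_loss
            + push_weight * push_loss)
```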

jiachen0212 commented 5 years ago

Hello @bhyzhao, I need your help. When I `cd extensions/AE` and run `python build.py install`, I hit this error: `cffi.VerificationError: CompileError: command 'gcc' failed with exit status 1`.

I'm using Python 3.6 and PyTorch 0.4; gcc is 4.8.5 (I also tried 4.9.1). Which gcc version is suitable?

abcxs commented 5 years ago

I want to know why such small weights (1e-3) are used. Are they the best choice?

coordxyz commented 5 years ago

> Hello @bhyzhao, I need your help. When I `cd extensions/AE` and run `python build.py install`, I hit this error: `cffi.VerificationError: CompileError: command 'gcc' failed with exit status 1`.
>
> I'm using Python 3.6 and PyTorch 0.4; gcc is 4.8.5 (I also tried 4.9.1). Which gcc version is suitable?

Hi there, I'm using Python 3.6.5, PyTorch 0.4.0, and gcc 5.4.0.

coordxyz commented 5 years ago

> I want to know why such small weights (1e-3) are used. Are they the best choice?

I tried adjusting it to a higher value, e.g. 1e-2, but training failed. 1e-3 just seems to work well for this problem, though I don't know why. Also, training is frustratingly slow.