thtrieu / darkflow

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices

YOLO loss at a checkpoint: if I stop and restart training from the latest checkpoint, why does the loss increase multifold again? #784

Open nidhimittalhada opened 6 years ago

nidhimittalhada commented 6 years ago

Hi all,

A fundamental question regarding YOLO loss. Kindly help in clarifying.

I trained my model on a few images and the loss dropped to 0.04, at which point I stopped training. I then resumed training, not from scratch but from the last checkpoint, on the same training images. To my surprise, the loss now starts at around 2000.

My question is: if the loss on these training images had already reached 0.04, and I am resuming from the same checkpoint on the same dataset, why does the loss jump back to 2000 when I restart training?

Does this hint at a problem in my weight save and load process?

Does it mean I am losing some of what was learned when I save my weights in a checkpoint?
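(For reference, in case the restart procedure itself is the issue: darkflow's README resumes training from the most recent checkpoint in ./ckpt via the `flow` CLI, roughly as below. The cfg name and dataset paths are placeholders.)

```
# resume training from the most recent checkpoint in ./ckpt
flow --train --model cfg/yolo-new.cfg --load -1 \
     --dataset path/to/JPEGImages --annotation path/to/Annotations
```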

tianyu-tristan commented 6 years ago

I don't have a full answer, but I would look into the following:

(1) 0.04 looks like overfitting. If you predict on the training dataset, do you still get classification issues?

(2) To resume training, besides the model weights there is also the optimizer state, which may be randomly re-initialized when you resume training (see the sketch below).

(3) For your white/brown box classification, I'm not sure YOLO is the best fit. If you stick with YOLO, I would try preprocessing the images in HSV color space rather than RGB (darkflow's default).
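A minimal TF1-style sketch of point (2), not darkflow's actual code (all names are illustrative): if the saver only covers the trainable weights, optimizer slot variables (e.g. Adam/RMSProp moment estimates) start from scratch on restart, which alone can make the first reported losses after resuming jump even though the weights themselves were restored correctly.

```python
import tensorflow as tf

# Toy graph; names are illustrative, not darkflow's.
x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.get_variable('w', [1, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

# A Saver over all global variables keeps the optimizer's moment estimates;
# a Saver over only the trainable variables silently drops them, so they are
# re-initialized when training is resumed from that checkpoint.
saver_full = tf.train.Saver(tf.global_variables())
saver_weights_only = tf.train.Saver(tf.trainable_variables())

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run some training steps here ...
    saver_full.save(sess, './model_full')       # weights + optimizer state
    saver_weights_only.save(sess, './model_w')  # weights only
```

If the checkpoint you are resuming from falls in the second category, the first losses after restarting will not match the last losses before stopping, even on identical data; that said, a jump all the way to 2000 is large enough that it is also worth verifying the weights really are being loaded (e.g. that `--load -1` picks up the checkpoint you expect).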
