qqwweee / keras-yolo3

A Keras implementation of YOLOv3 (Tensorflow backend)
MIT License

Training with unfreezed layers doesn't start #705

Closed JonaGanz closed 4 years ago

JonaGanz commented 4 years ago

I'm facing the following problem. When I train with train.py on my dataset, training with frozen layers works fine, but training with unfrozen layers doesn't start. I don't get an error, and my GPU usage doesn't seem to rise after I start training with unfrozen layers. I read issues #652 and #122, which seem quite similar to my problem.

In create_model I set load_pretrained = False. I also set the initial epoch of the second training stage to the number of epochs from the first stage. In #122 it is suggested to update TensorFlow to 1.18, which I haven't done yet. I'm using TensorFlow 1.15 and Keras 2.2.4.

Thank you very much for your help.

NorbertDorbert commented 4 years ago

Hi joluga24,

I have the same problem. Did you find a solution?

JonaGanz commented 4 years ago

Hi Norbert, sadly not. Some posts advised upgrading TensorFlow. So I decided that if I have to change the TensorFlow version anyway, I might as well use a TensorFlow 2 implementation of YOLO.

NorbertDorbert commented 4 years ago

Hi joluga24,

I think I got it. In my case it was a simple mistake. I chose:

first_stage_initial_epoch = 0
first_stage_epochs = 100
second_stage_initial_epoch = 100
second_stage_epochs = 100

But it has to be:

first_stage_initial_epoch = 0
first_stage_epochs = 100
second_stage_initial_epoch = 100
second_stage_epochs = first_stage_epochs + 100

The epochs value of the second stage is the final epoch index of both stages combined, not the number of additional epochs. Otherwise the second stage starts at initial_epoch 100 and immediately stops, since epoch 100 has already been reached.
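To make the arithmetic concrete, here is a minimal sketch in plain Python (no Keras needed). It relies on the documented Keras behaviour that, together with initial_epoch, epochs is treated as the index of the final epoch. The helper epochs_actually_run and the variable second_stage_extra_epochs are hypothetical names used only for illustration; they are not part of train.py.

```python
def epochs_actually_run(initial_epoch: int, epochs: int) -> int:
    """Epochs a Keras fit call would run: it trains from
    initial_epoch until the epoch index `epochs` is reached."""
    return max(0, epochs - initial_epoch)


first_stage_epochs = 100
second_stage_extra_epochs = 100  # hypothetical: how long stage two should train

# Mistake: epochs equals initial_epoch, so nothing is left to train.
broken = epochs_actually_run(initial_epoch=first_stage_epochs, epochs=100)

# Fix: epochs is the combined total of both stages.
fixed = epochs_actually_run(
    initial_epoch=first_stage_epochs,
    epochs=first_stage_epochs + second_stage_extra_epochs,
)

print("broken config runs", broken, "epochs")
print("fixed config runs", fixed, "epochs")
```

With the broken values the second stage has zero epochs left, which matches the symptom of training silently not starting.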

Let me know if that fixed it for you as well :)

cad-ml commented 4 years ago

Hi NorbertDorbert, I had the same problem. I followed your recommendations and it's solved. Thank you very much!

JonaGanz commented 4 years ago

Thumbs up, works for me as well! Thank you very much for your help!

cad-ml commented 4 years ago

Hello, I used YOLOv3 for object recognition. I chose the "Cat_Face" database and tried to train on a single image. The program ran fine and I managed to generate the .h file. When I tested my model on the same image, the system didn't recognize the cat's face.

I repeated the procedure with the same database of cat faces (about 100 images). The system still didn't recognize the face of the cat I had tested individually, although it did recognize the faces of some other cats.

What did I do wrong?

Thank you very much for any help, suggestions, or ideas.

cad-ml commented 4 years ago

Any ideas?

NorbertDorbert commented 4 years ago

Hi cad-ml,

In theory, if you train your network long enough on a dataset, it should recognize everything in it perfectly. But, as you probably know, that's usually not what you want. You train on a dataset, usually the bigger the better, and validate your network after every epoch on a validation set. You then choose the weights that worked best on the validation set (save only the best weights in the checkpoints) and evaluate them on a separate test set. The goal is for your network to handle unknown data well, not just the data it was trained on.

In your case, I would look at the learning curve. Maybe you just didn't train long enough. In any case, you should use a bigger dataset and try to achieve good results on a test set that the network hasn't seen yet.

Sorry if that doesn't solve your problem :/
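As a rough illustration of that workflow, here is a small plain-Python sketch, not code from this repository. split_dataset and best_epoch are hypothetical helpers, the split fractions are arbitrary, and the validation losses are made-up numbers. In Keras, a ModelCheckpoint callback with monitor='val_loss' and save_best_only=True does the "keep the best weights" bookkeeping for you.

```python
import random


def split_dataset(items, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_fraction)
    n_test = int(len(items) * test_fraction)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test


def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])


# 100 image ids, standing in for a dataset of about 100 images.
train, val, test = split_dataset(range(100))

# Made-up validation losses, one per epoch, for illustration only.
val_losses = [0.9, 0.6, 0.4, 0.45, 0.5]
best = best_epoch(val_losses)

print(len(train), len(val), len(test))
print("keep the weights from epoch index", best)
```

The test split is only touched once at the end, so the reported result reflects images the network never saw during training or checkpoint selection.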

cad-ml commented 4 years ago

Thank you for your response and your recommendations. I will try to train my model longer and maybe tune the hyperparameters a bit to improve the model's performance. I will get back to you if this laborious work produces accurate results.