tryolabs / luminoth

Deep Learning toolkit for Computer Vision.
https://tryolabs.com
BSD 3-Clause "New" or "Revised" License

[Train] Use valid and train data to train model #226

Closed JuanSeBestia closed 5 years ago

JuanSeBestia commented 5 years ago

Why use only a single dataset to train the model? It has always been considered better to use one dataset for training and another for validation, to avoid overfitting the model. Why not do that here?

dekked commented 5 years ago

Hello @jsdussanc!

When you use a dataset, you normally have several 'splits': train, val and test.

Using them appropriately is the responsibility of the user. For example, you would use val for hyperparameter tuning. The lumi dataset tool can convert several splits to tfrecords for use with Luminoth.
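
For example, a conversion could look something like this (a sketch; the paths are placeholders and the exact flags may vary by version, so check lumi dataset transform --help):

lumi dataset transform \
  --type pascal \
  --data-dir datasets/voc/raw \
  --output-dir datasets/voc/tf \
  --split train --split val --split test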

Please let me know if my answer suffices or if I did not understand your question!

JuanSeBestia commented 5 years ago

No, you have not understood me.

It is true that many of the Luminoth tools support splits, but the training tool does not: you have to choose a single split for it.

See line 74 in luminoth/models/ssd/base_config.yml

dataset:
  type: tfrecord
  # From which directory to read the dataset
  dir: datasets/voc/tf
  # Which split of tfrecords to look for
  split: train

It only supports one split for training.

To my limited knowledge, the "train" dataset is used to perform backpropagation, and the "val" dataset is used for hyperparameter tuning or to measure fitness per epoch.

Please correct me if I'm wrong about something or if I'm not being very clear.

dekked commented 5 years ago

I am not sure I am following...

Are you saying that train should also support a val split, for early stopping while training (on the train split)? Or are you talking about tuning hyperparameters?

JuanSeBestia commented 5 years ago

Why not talk about both?

The model may be overfitting to features that are not representative, which could be detectable if you add val.

Early stopping may also be a good idea, but I had not thought about it until now.

I reiterate that I am not an expert; I am only relating what I have seen in other implementations.

dekked commented 5 years ago

These are two different issues altogether.

Issue 1. Using the val split during training and computing metrics (mAP, loss, etc.) on it, for early stopping. This is not currently supported.

Issue 2. Training with the train and val splits together, instead of just train. This can be done by creating a new split using the lumi dataset merge command.
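
A sketch of that (the argument order is my assumption; check lumi dataset merge --help):

lumi dataset merge \
  datasets/voc/tf/train.tfrecords \
  datasets/voc/tf/val.tfrecords \
  datasets/voc/tf/trainval.tfrecords

Then point dataset.split in your config at the merged split (e.g. trainval).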

JuanSeBestia commented 5 years ago
  1. It's OK.

  2. I am not sure I understand the point. Use more examples to avoid overfitting?

What I currently do is create a dataset from PASCAL using the train_val set, which includes both the train and val splits of my own data, without using the merge command.

Would using the merge command as you suggest give me different results?

My point is that I am using the val dataset to update the model's weights, which in many places I have seen is not recommended.

Its proper use is to check that the weight updates are heading in the right direction (fitness).

joaqo commented 5 years ago
  1. You should use your training dataset when running the train command.

  2. You should use your validation dataset when running the eval command. After this you usually check the results of the eval command and make decisions based on them.

Your training and validation data should not be merged during training, or else this whole separation of datasets to reduce overfitting is pointless.

Luminoth supports both these features.
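
A minimal sketch of that workflow (the --split flag on eval is my assumption; check lumi eval --help for the exact option):

lumi train -c config.yml
lumi eval -c config.yml --split val

You can run eval in a separate process while training is in progress and watch the resulting metrics in TensorBoard.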

JuanSeBestia commented 5 years ago

Exactly, it's pointless.

At the moment I have not tried doing both at the same time (eval and train).

Everything I have stated above is theoretical.

I currently have a model with mAP: 0.925, trained on val+train for 422671 steps. I do not currently have a test dataset to contrast my model against (see the dataset at https://github.com/jsdussanc/dipstick). Any configuration suggestions are welcome.

Suppose

I have 3 splits (train, val, test). For training, I use train with the train tool; I run 3 different processes with the eval tool (each receives only one split), so I can trace the results across all 3 splits.

At K steps, I observe that the model began to overfit around step K-1000.

What can I do?

I can't roll back at the moment.

Don't get me wrong, I love what you have done here so far

[attached image: img_20181031_140605]

dekked commented 5 years ago

There was an undocumented option for keeping more checkpoints around.

See commit https://github.com/tryolabs/luminoth/commit/73534c04a3cfe85808085912a3bb833ac7d4d63a, hope it's useful for your case :)
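
If I read it right, it goes in the train section of the config, something like this (the key name and value here are my assumptions from that commit, so verify against base_config.yml):

train:
  # Assumed option name; presumably maps to TensorFlow Saver's max_to_keep.
  checkpoints_max_keep: 10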

JuanSeBestia commented 5 years ago

Thank you very much. In the future, this change may save many people who need to recover from a setback like this.

Now, do you want me to create an issue about the possibility of adding early-stopping functionality to training, as you suggested?

dekked commented 5 years ago

It's really not necessary; it's something we have on the roadmap, but it's not trivial to implement :)

JuanSeBestia commented 5 years ago

I suspected as much, especially since you support several forms of optimization.

Is there any way to see your roadmap, or is it secret?