ufal / neuralmonkey

An open-source tool for sequence learning in NLP built on TensorFlow.
BSD 3-Clause "New" or "Revised" License

Tensorboard does not correctly show steps after continuation of experiment #734

Open kocmitom opened 6 years ago

kocmitom commented 6 years ago

As discussed earlier today: after continuation, TensorBoard starts counting from zero. It is due to the variable seen_instances:

https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/learning_utils.py#L523
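
A minimal sketch of the failure mode, not Neural Monkey's actual code: the TensorBoard x-coordinate is simply whatever integer is passed to add_summary, and seen_instances is a plain Python counter that lives only in the current process.

```python
import tensorflow as tf  # TF 1.x API

# The x-axis in TensorBoard is the integer passed to add_summary.
writer = tf.summary.FileWriter("experiment/tensorboard")

seen_instances = 0  # plain Python int; reset to 0 on every process start
for step in range(100):
    batch_size, loss = 64, 1.0 / (step + 1)  # stand-ins for a real training step
    seen_instances += batch_size
    summary = tf.Summary(
        value=[tf.Summary.Value(tag="train/loss", simple_value=loss)])
    writer.add_summary(summary, global_step=seen_instances)
writer.close()

# A continuation runs this loop in a fresh process: seen_instances starts
# at 0 again, so the new part of the curve is drawn from x=0.
```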

varisd commented 6 years ago

It seems the easiest way would be to replace seen_instances with the global_step variable. This would change the interpretation of the logging_period/validation_period attributes.

jlibovicky commented 6 years ago

Global step and seen instances are different quantities on purpose. Logging based on seen instances allows us to compare learning curves from training runs with different batch sizes. On the other hand, the optimizer needs to use the global step, not the number of processed data items.

jindrahelcl commented 6 years ago

Therefore, we need another variable for logging to TB to have nice continuous curves.

But I think seen_instances is suitable as well. The only thing you need is to be able to set an initial value for this variable in the config (or somehow automatically).

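A sketch of the kind of change proposed above; the option name initial_seen_instances is hypothetical, not an actual Neural Monkey config key:

```python
# Sketch of the proposed fix; "initial_seen_instances" is a hypothetical
# config option, not an actual Neural Monkey parameter.
def initial_seen_instances(config):
    # A continuation experiment would carry forward the count where the
    # previous run ended (e.g. 1843200), so TensorBoard curves resume
    # from that x value instead of 0.
    return config.get("initial_seen_instances", 0)

seen_instances = initial_seen_instances({"initial_seen_instances": 1843200})
```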

varisd commented 6 years ago

I don't think that comparing learning curves with regard to the number of seen instances is relevant. We do not care after how many sentences we got a certain improvement, only after how many updates (batches) we got to a certain point.

There is a huge difference between doing 64 updates with batch_size 1 (many updates, probably in various directions) and doing one update with batch_size 64 (note that the size of the update might be similar to the one from batch_size 1). That is, the size of each update is usually normalized (we use the mean loss); only the direction should be (in theory) better estimated with a larger batch.

So it is IMO more reasonable to compare convergence with regard to the number of updates than the number of seen instances. Alternatively, you can use relative time for another comparison.
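
A toy sketch of that point with made-up per-example gradients, not Neural Monkey code: with a mean loss, one batch_size-64 update has roughly the same magnitude as a single batch_size-1 update, but its direction is a much better estimate.

```python
import numpy as np

rng = np.random.RandomState(0)
true_grad = np.array([1.0, -2.0])           # the "true" direction
per_example = true_grad + rng.randn(64, 2)  # 64 noisy per-example gradients

# batch_size 1: 64 separate updates, each one noisy gradient
single_norms = np.linalg.norm(per_example, axis=1)
# batch_size 64: one update, the mean gradient (mean loss => mean gradient)
mean_grad = per_example.mean(axis=0)

print(single_norms.mean())                    # average single-update magnitude
print(np.linalg.norm(mean_grad))              # similar magnitude for the big batch
print(np.linalg.norm(mean_grad - true_grad))  # but a much smaller estimation error
```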

varisd commented 6 years ago

Also, as far as model comparison goes: if you're not measuring the effect of batch_size on your models, it is a bad idea to have different batch_sizes between runs you want to compare (which basically means that it does not matter whether steps are counted as global steps or as the number of seen instances).

jindrahelcl commented 6 years ago

It seems you are missing the original point of this issue.

varisd commented 6 years ago

The original point was replacing seen_instances with global_step, which would pretty much simplify the current implementation.

And my point is that sticking with seen_instances is not justified well enough.

jlibovicky commented 6 years ago

Indeed, what I wanted to say was: do not discard seen instances by replacing them with the global step.

And regarding the off-topic discussion: why do you think seen instances do not matter? It is actually the only measure of training progress that is comparable across training runs. If the number of seen instances really did not matter and the only important thing was the number of updates, then you could do batches of 1 or 2 and get many updates at the lowest cost possible, which is simply not true. Relative time is different on different machines. We can plot both if you think it is important, but I would prefer to stick with the number of processed instances.

jindrahelcl commented 6 years ago

No. The original issue here is that after you run a continuation of an experiment, the curves in TensorBoard start at x=0. Instead, they should start at x = (seen_instances from the previous run). The step is the number of batches and has nothing to do with TensorBoard's x-axis, which is determined by the seen_instances variable.

varisd commented 6 years ago

The number of seen instances is not comparable across runs with different batch_size, because you cannot tell how much of the difference between runs is influenced by the batching itself.

Simply put, try running two experiments with exactly the same parameters (except for batch_size) and compare their performance at the exact same moment (after seeing N training instances). If they were comparable, the results would have to be the same. They will most likely not be. (The same applies to the global step.)

Bottom line: experiments with different batch_size parameters are not comparable, and it is not a good idea to try to compare them. The batch_size comparison argument for keeping seen_instances is therefore weak. There are, however, arguments against it (and for global_step):

  1. Throwing away seen_instances (and using global_step) simplifies the code and logging in general. It also solves this bug, since global_step is a saved variable (see the sketch after this list).
  2. Other TF toolkits use global_step (at least T2T, as far as I know), which makes comparison easier.
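
A minimal TF 1.x sketch of point 1, not Neural Monkey code: global_step is an ordinary saved variable, so tf.train.Saver writes it into the checkpoint and a continuation restores it automatically.

```python
import tensorflow as tf  # TF 1.x API

global_step = tf.train.get_or_create_global_step()
increment = tf.assign_add(global_step, 1)
saver = tf.train.Saver()  # collects global_step along with model variables

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(increment)          # one "update" per training batch
    saver.save(sess, "experiment/model.ckpt")

# A continuation restores the checkpoint and keeps counting from 10,
# whereas a plain Python counter like seen_instances restarts from 0.
with tf.Session() as sess:
    saver.restore(sess, "experiment/model.ckpt")
    print(sess.run(global_step))     # -> 10
```
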
jindrahelcl commented 6 years ago

Imagine you are doing a benchmark series of experiments that you train for one epoch. You almost certainly want your learning curves to have the same length, independently of any hyperparameter, including the batch size.

That said, I could imagine a scenario in which you want to have batches on the x-axis instead of instances. However, you must agree that if such a scenario exists, there is another scenario where you want exactly the opposite behavior.

That also said, in both scenarios, you want your learning curves to continue from the correct x-axis value when you are running a continuation of an experiment. Right now, this means we need to allow tweaking the seen_instances value, and not the global step.

varisd commented 6 years ago

Well, you just said that you can imagine both scenarios (without saying which one is better as long as you stay consistent). In my post, I presented arguments for switching to global_step and banishing seen_instances, so I am getting a little bit lost.

We can keep this issue open and leave it to an offline discussion later.

jindrahelcl commented 6 years ago

This issue should be solved now, by allowing the initial value of seen instances to be set.

Discussions may happen in the future.

jlibovicky commented 6 years ago

In case you don't remember: at the very beginning, we logged to TB using the global step only, and we had a hard time comparing runs with different batch sizes even approximately. A model with double the batch size just appeared to converge twice as fast, but that was not true at all. To be able to at least somehow compare them, we introduced the number of processed instances. I am not claiming it is a very rigorous way of comparison, but it is at least some way of comparison.

jindrahelcl commented 6 years ago

The best way would be to provide both of these values to TensorBoard and then switch between views, just as you switch between step, wall time, and relative time.

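TensorBoard itself only switches between step, wall time, and relative time, so exposing both counters would need a workaround. A crude sketch of the idea (not an existing Neural Monkey feature) is to write each scalar twice, into two run directories keyed by the two different counters:

```python
import tensorflow as tf  # TF 1.x API

# Crude sketch: log every scalar twice, once keyed by the update count and
# once by the number of seen instances, as two TensorBoard "runs".
by_step = tf.summary.FileWriter("tb/by_step")
by_instances = tf.summary.FileWriter("tb/by_instances")

def log_scalar(tag, value, step, seen_instances):
    summary = tf.Summary(
        value=[tf.Summary.Value(tag=tag, simple_value=value)])
    by_step.add_summary(summary, global_step=step)
    by_instances.add_summary(summary, global_step=seen_instances)

log_scalar("train/loss", 0.42, step=100, seen_instances=6400)
```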

kocmitom commented 6 years ago

I would also say that having both metrics (seen examples as well as steps) would be the best option. But either way, after starting a continuation of an experiment, the curve should continue from the correct step/seen_examples value, not start from 0; otherwise it is a mess in TensorBoard.