stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Periodical eval and summaries on tensorboard #143

Closed. sebpuetz closed this issue 4 years ago.

sebpuetz commented 4 years ago

For pretraining it's nice to do periodic evaluation and save the model on improvements. It's even nicer to plot these periodic evaluations in Tensorboard, since that nicely illustrates how the accuracy fluctuates between pretraining batches.

In my (non-sticker) experiments, dev accuracy was still moving up and down by ~0.1% after ~300k training steps, even though the model had already reached its final level (0.02% below the high score) after fewer than 200k training steps.

danieldk commented 4 years ago

These are actually two issues (periodic evaluation and making the results available in Tensorboard). I have now more or less finished the first one (currently doing a test run). I will open a PR after #158 is approved.

I have changed pretraining quite a bit and removed the notion of epochs (which is annoying for various reasons). Instead you specify:

(1) The total number of steps (used to determine when to stop and how the learning rate should decrease).
(2) The number of steps between evaluations.

During pretraining, each time the number of steps in (2) has passed, the model is evaluated on the validation data and the result is reported. If the current model is an improvement over the previous evaluations on the validation data, it is saved.
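
For illustration, here is a minimal Rust sketch of the step-based loop described above. `Model`, `Batch`, `train_step`, `evaluate`, and `save` are stand-ins, not sticker's actual types or API, and save-on-improvement is read here as "better than the best evaluation so far".

```rust
// Minimal sketch of the step-based pretraining loop described above.
// `Model`, `Batch`, `train_step`, `evaluate`, and `save` are stand-ins
// for illustration; they are not sticker's actual types or API.

struct Model;
struct Batch;

fn train_step(_model: &mut Model, _batch: &Batch) {}

// Stub: in practice this would return e.g. accuracy on the validation data.
fn evaluate(_model: &Model, _validation: &[Batch]) -> f32 {
    0.0
}

fn save(_model: &Model, _path: &str) {}

fn pretrain(
    model: &mut Model,
    batches: impl Iterator<Item = Batch>,
    validation: &[Batch],
    total_steps: usize, // (1) determines when to stop (and the LR schedule)
    eval_steps: usize,  // (2) evaluate every `eval_steps` steps
) {
    let mut best_score = f32::MIN;

    for (global_step, batch) in batches.enumerate().take(total_steps) {
        train_step(model, &batch);

        // Every `eval_steps` steps, evaluate on the validation data and
        // report the result.
        if (global_step + 1) % eval_steps == 0 {
            let score = evaluate(model, validation);
            println!("step {}: validation score {:.4}", global_step + 1, score);

            // Save the model only when it improves on the best evaluation
            // seen so far.
            if score > best_score {
                best_score = score;
                save(model, "model-best");
            }
        }
    }
}

fn main() {
    let mut model = Model;
    let validation = vec![Batch];
    // Endless stream of dummy training batches; `take(total_steps)` caps it.
    pretrain(&mut model, std::iter::repeat_with(|| Batch), &validation, 1_000, 100);
}
```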

danieldk commented 4 years ago

Sample screenshot: [Screenshot from 2019-10-31 20-05-15]

twuebi commented 4 years ago

Maybe we should keep an n-times-trainset option. Otherwise, we'll have to calculate the number of steps needed for one 'epoch' by hand for every batch size.

danieldk commented 4 years ago

I agree that that would be nice, but I still have to think about how to do it elegantly. Before this change, progress (for the progress bar and the learning rate) was based on the position in the file. Now it is based on global_step as a fraction of the overall number of steps.
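
As a rough illustration of what step-based scheduling can look like (not sticker's actual scheduler), the learning rate can be derived from `global_step` as a fraction of the total number of steps; the linear decay and the names below are assumptions for the sketch:

```rust
// Rough sketch (not sticker's actual scheduler): derive the learning rate
// from `global_step` as a fraction of the total number of steps, here with
// a simple linear decay to zero.
fn learning_rate(initial_lr: f32, global_step: usize, total_steps: usize) -> f32 {
    // Fraction of pretraining completed so far.
    let progress = global_step as f32 / total_steps as f32;

    // Linear decay; the real schedule may look different (e.g. include warmup).
    initial_lr * (1.0 - progress).max(0.0)
}
```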

I would like to avoid having both step-based and file-position-based progress, but we do not really know how many steps there are in an epoch without counting first. Perhaps I'll try to see how expensive it is to count the number of steps in the pretraining data. It should at least be somewhat faster than a normal pass over the data, since we do not have to vectorize the instances.

danieldk commented 4 years ago

Meh, 9 minutes for Lassy Large on turing.

Edit: we can do a lot better here; a lot of the time is spent in memory allocation and splitting (since we are building up a full graph for every sentence). So I'll make a dumb parser that just counts the number of sentences.
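
A sketch of what such a "dumb parser" could look like: it assumes blank-line-separated CoNLL-style data and counts sentences without building a graph per sentence. The function name, file name, and format assumption are illustrative, not the counter that went into sticker.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Hypothetical "dumb parser": count sentences in blank-line-separated
// CoNLL-style data without building a graph per sentence. This is a sketch
// of the idea, not the counter that ended up in sticker.
fn count_sentences(path: &str) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);

    let mut sentences = 0;
    let mut in_sentence = false;

    for line in reader.lines() {
        let line = line?;
        if line.trim().is_empty() {
            // A blank line terminates the current sentence.
            if in_sentence {
                sentences += 1;
                in_sentence = false;
            }
        } else {
            in_sentence = true;
        }
    }

    // Account for a final sentence without a trailing blank line.
    if in_sentence {
        sentences += 1;
    }

    Ok(sentences)
}

fn main() -> io::Result<()> {
    // Example path; point this at the pretraining data.
    let n = count_sentences("train.conll")?;
    println!("{} sentences", n);
    Ok(())
}
```

With a sentence count in hand, the number of steps in one pass over the training data is roughly the sentence count divided by the batch size (rounded up), which would make an n-times-trainset option as suggested above cheap to support.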

danieldk commented 4 years ago

Just north of 1 minute now, which seems acceptable. For some reason pretraining takes ~7 days now on turing-sfb. I guess something is broken.