ml5js / training-charRNN

Training charRNN model for ml5js

README clarification on hyperparameters #1

Closed · carlcorder closed this issue 6 years ago

carlcorder commented 6 years ago

The suggested hyperparameters include a dropout of 0.25. However, it's unclear which of these train.py options should be used to achieve that:

...
parser.add_argument('--output_keep_prob', type=float, default=1.0,
  help='probability of keeping weights in the hidden layer')
parser.add_argument('--input_keep_prob', type=float, default=1.0,
  help='probability of keeping weights in the input layer')
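# note: both flags default to 1.0, i.e. keep everything, so no dropout is applied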

My guess would be either output_keep_prob or input_keep_prob, but I'm not sure.

Thanks!

cvalenzuela commented 6 years ago

Both! You can follow the original repo's recommendations for this here:

Tuning your models is kind of a "dark art" at this point. In general:

  1. Start with as much clean input.txt as possible, e.g. 50MiB.
  2. Start by establishing a baseline using the default settings.
  3. Use tensorboard to compare all of your runs visually to aid in experimenting.
  4. Tweak --rnn_size up somewhat from 128 if you have a lot of input data.
  5. Tweak --num_layers from 2 to 3 but no higher unless you have experience.
  6. Tweak --seq_length up from 50 based on the length of a valid input string (e.g. names are <= 12 characters, sentences may be up to 64 characters, etc.). An LSTM cell will "remember" for durations longer than this sequence, but the effect falls off for longer character distances.
  7. Finally, once you've done all that, only then would I suggest adding some dropout. Start with --output_keep_prob 0.8 and maybe end up with both --input_keep_prob 0.8 --output_keep_prob 0.5, but only after exhausting all the above values (see the sketch below this list for how these flags relate to dropout).
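
Since these flags are keep probabilities, dropout = 1 - keep_prob, so the README's 0.25 dropout corresponds to passing 0.75. As a rough sketch of how char-rnn-style code typically wires these flags into TensorFlow 1.x (an illustration under that assumption, not necessarily this repo's exact code):

import argparse
import tensorflow as tf  # assumes TensorFlow 1.x, as in char-rnn-style repos

parser = argparse.ArgumentParser()
parser.add_argument('--output_keep_prob', type=float, default=1.0)
parser.add_argument('--input_keep_prob', type=float, default=1.0)
# the README's 0.25 dropout means keeping 75% of activations, e.g.:
#   python train.py --input_keep_prob 0.75 --output_keep_prob 0.75
args = parser.parse_args()

cell = tf.contrib.rnn.BasicLSTMCell(128)  # 128 matches the default --rnn_size
# DropoutWrapper applies dropout to the cell's input and output connections;
# a keep probability of 1.0 disables it, matching the flag defaults above.
cell = tf.contrib.rnn.DropoutWrapper(
    cell,
    input_keep_prob=args.input_keep_prob,
    output_keep_prob=args.output_keep_prob)

The same keep-probability convention is used by tf.nn.dropout, so 1.0 leaves the network unchanged and smaller values drop proportionally more activations.
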
carlcorder commented 6 years ago

Thank you for your very thoughtful and organized answer! I'll keep this comment as reference while training models in the future.