stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Make the exponential decay lr schedule available #147

Open twuebi opened 4 years ago

twuebi commented 4 years ago

Right now, the exponential decay lr schedule is not available for sticker train and sticker pretrain. Once #145 is merged, it would make sense to have it available for both subcommands.

This may clutter the command line arguments a bit, since we would then have separate options for each schedule: --plateau with --lr-patience and --lr-scale, and --exponential with --decay-rate and --decay-steps.

Maybe it would make sense to move the learning-rate schedule settings to the config file.

danieldk commented 4 years ago

No more things in the configuration file; there is too much stuff in there already that is only relevant to training. Maybe clap offers some functionality to only reveal options based on the value of some other option?

danieldk commented 4 years ago

There is requires; I'm not sure whether it hides an option when the required argument is not present:

https://kbknapp.github.io/clap-rs/clap/struct.Arg.html#method.requires
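
A minimal sketch of requires against the clap 2.x builder API (the argument names here are illustrative, not sticker's actual ones):

use clap::{App, Arg};

// --lr-patience is rejected unless --plateau is also given. requires()
// enforces the dependency at parse time, but it does not by itself
// hide --lr-patience from the --help output.
let matches = App::new("demo")
    .arg(Arg::with_name("plateau").long("plateau"))
    .arg(
        Arg::with_name("lr-patience")
            .long("lr-patience")
            .value_name("N")
            .requires("plateau"),
    )
    .get_matches();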

twuebi commented 4 years ago

maybe group in conjunction with requires

danieldk commented 4 years ago

Yep. It's worth trying whether they get hidden if the requirement is not given. But I guess at the very least it would also group the arguments together in the usage information? (Which would go a long way toward not making it too confusing.)

twuebi commented 4 years ago

https://kbknapp.github.io/clap-rs/clap/struct.ArgGroup.html

You can also do things such as name an ArgGroup as a conflict or requirement, meaning any of the arguments that belong to that group will cause a failure if present, or must be present, respectively. Perhaps the most common use of ArgGroups is to require one and only one argument to be present out of a given set. Imagine that you had multiple arguments, and you want one of them to be required, but making all of them required isn't feasible because perhaps they conflict with each other. For example, let's say that you were building an application where one could set a given version number by supplying a string with an option argument, i.e. --set-ver v1.2.3; you also wanted to support automatically using a previous version number and simply incrementing one of the three numbers. So you create three flags --major, --minor, and --patch. All of these arguments shouldn't be used at one time, but you want to specify that at least one of them is used. For this, you can create a group.

https://kbknapp.github.io/clap-rs/clap/struct.App.html#method.arg_group
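
Reconstructed as code, the quoted version-flags example looks roughly like this (a sketch against the clap 2.x API): exactly one of --set-ver, --major, --minor, or --patch must be given.

use clap::{App, Arg, ArgGroup};

// The group makes the four arguments mutually exclusive, and
// required(true) demands that one of them is present.
let matches = App::new("app")
    .arg(Arg::with_name("set-ver").long("set-ver").takes_value(true))
    .arg(Arg::with_name("major").long("major"))
    .arg(Arg::with_name("minor").long("minor"))
    .arg(Arg::with_name("patch").long("patch"))
    .group(
        ArgGroup::with_name("vers")
            .args(&["set-ver", "major", "minor", "patch"])
            .required(true),
    )
    .get_matches();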

twuebi commented 4 years ago

I looked a bit further into it; I'm not yet happy with it.

Examples are below.

For the grouping in the help output to work, we need to set AppSettings::DeriveDisplayOrder and call hide_default_value(true) on every argument that has a default value and should be followed by a blank line. This is necessary since appending "\n " to the preceding help message was the only way to introduce a blank line for visual grouping (https://github.com/clap-rs/clap/issues/1250). Arg::conflicts_with also conflicts with Arg::default_value; Arg::default_value_if can be used to get a conditional default value instead.
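
As a minimal sketch (assuming the clap 2.x builder API), the workaround for a single option looks like this; the help text mirrors the --warmup entry in the output below:

use clap::{App, AppSettings, Arg};

// DeriveDisplayOrder keeps the options in declaration order, so the
// inserted blank lines land between the intended groups.
let app = App::new("sticker-train")
    .setting(AppSettings::DeriveDisplayOrder)
    .arg(
        Arg::with_name("warmup")
            .long("warmup")
            .value_name("N")
            .default_value("0")
            // Hide clap's "[default: ...]" suffix so that the trailing
            // "\n " below remains the last thing printed for this entry.
            .hide_default_value(true)
            // The trailing "\n " introduces a blank line after this
            // option, visually separating the next schedule group
            // (workaround from clap-rs/clap#1250).
            .help(
                "For the first N timesteps, the learning rate is linearly scaled up to LR.\n ",
            ),
    );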

Grouping:

sticker-train 0.10.0
Train a sticker model

USAGE:
    sticker train [OPTIONS] <CONFIG> <TRAIN_DATA> <VALIDATION_DATA>

OPTIONS:
        --batchsize <BATCH_SIZE>    Batch size [default: 256]
        --continue <PARAMS>         Continue training from parameter files (e.g.: epoch-50)
        --lr <LR>                   Initial learning rate [default: 0.01]
        --warmup <N>                For the first N timesteps, the learning rate is linearly scaled up to LR.

        --plateau                   Plateau learning rate schedule
        --lr-patience <N>           Scale learning rate after N epochs without improvement
        --lr-scale <SCALE>          Value to scale the learning rate by

        --exponential               Exponential learning rate schedule
        --decay-rate <RATE>         coefficient of the exponential decay
        --decay-steps <STEPS>       global_step / steps is the exponent of the decay_rate

        --maxlen <N>                Ignore sentences longer than N tokens
        --shuffle_buffer <N>        Size of the buffer used for shuffling.
        --patience <N>              Maximum number of epochs without improvement [default: 15]
        --logdir <LOGDIR>           Write Tensorboard summaries to this directory.
    -h, --help                      Prints help information
    -V, --version                   Prints version information

ARGS:
    <CONFIG>             Sticker configuration
    <TRAIN_DATA>         Training data
    <VALIDATION_DATA>    Validation data

No grouping:

sticker-train 0.10.0
Train a sticker model

USAGE:
    sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> <--plateau|--exponential>

OPTIONS:
        --batchsize <BATCH_SIZE>    Batch size [default: 256]
        --continue <PARAMS>         Continue training from parameter files (e.g.: epoch-50)
        --lr <LR>                   Initial learning rate [default: 0.01]
        --warmup <N>                For the first N timesteps, the learning rate is linearly scaled up to LR. [default:
                                    0]
        --plateau                   Plateau learning rate schedule
        --lr-patience <N>           Scale learning rate after N epochs without improvement
        --lr-scale <SCALE>          Value to scale the learning rate by
        --exponential               Exponential learning rate schedule
        --decay-rate <RATE>         coefficient of the exponential decay
        --decay-steps <STEPS>       global_step / steps is the exponent of the decay_rate
        --maxlen <N>                Ignore sentences longer than N tokens
        --shuffle_buffer <N>        Size of the buffer used for shuffling.
        --patience <N>              Maximum number of epochs without improvement [default: 15]
        --logdir <LOGDIR>           Write Tensorboard summaries to this directory.
    -h, --help                      Prints help information
    -V, --version                   Prints version information

ARGS:
    <CONFIG>             Sticker configuration
    <TRAIN_DATA>         Training data
    <VALIDATION_DATA>    Validation data

Making the args mutually exclusive works via conflicts_with, which can be specified on an ArgGroup as well as on an Arg. Setting ArgGroup::multiple to true allows multiple arguments from the same group to be present; by default it is false, which means only one argument from a group may be given. The snippet below wires this up for both schedules:

// SCHEDULE_GROUP is required: exactly one of --plateau or --exponential.
.group(ArgGroup::with_name(SCHEDULE_GROUP).required(true))
.arg(
    Arg::with_name(PLATEAU)
        .long("plateau")
        .help("Plateau learning rate schedule")
        // Picking --plateau satisfies the schedule group...
        .group(SCHEDULE_GROUP)
        // ...and pulls in the plateau-specific options.
        .requires(PLATEAU_GROUP),
)
.group(
    // multiple(true): --lr-patience and --lr-scale may both be given.
    ArgGroup::with_name(PLATEAU_GROUP)
        .multiple(true)
        .conflicts_with_all(&[EXPONENTIAL, EXPONENTIAL_GROUP]),
)
.arg(
    Arg::with_name(LR_PATIENCE)
        .long("lr-patience")
        .value_name("N")
        .help("Scale learning rate after N epochs without improvement")
        .group(PLATEAU_GROUP)
        // Conditional default: only applied when --plateau is present,
        // since an unconditional default_value would trip conflicts_with.
        .default_value_if(PLATEAU, None, "5"),
)
.arg(
    Arg::with_name(LR_SCALE)
        .long("lr-scale")
        .value_name("SCALE")
        .help("Value to scale the learning rate by")
        .group(PLATEAU_GROUP)
        .default_value_if(PLATEAU, None, "0.5"),
)
.arg(
    Arg::with_name(EXPONENTIAL)
        .long("exponential")
        .help("Exponential learning rate schedule")
        .group(SCHEDULE_GROUP)
        .requires(EXPONENTIAL_GROUP),
)
.group(
    ArgGroup::with_name(EXPONENTIAL_GROUP)
        .multiple(true)
        .conflicts_with_all(&[PLATEAU, PLATEAU_GROUP]),
)
.arg(
    Arg::with_name(DECAY_RATE)
        .long("decay-rate")
        .value_name("RATE")
        .help("coefficient of the exponential decay")
        .group(EXPONENTIAL_GROUP)
        .default_value_if(EXPONENTIAL, None, "0.998"),
)
.arg(
    Arg::with_name(DECAY_STEPS)
        .long("decay-steps")
        .value_name("STEPS")
        .help("global_step / steps is the exponent of the decay_rate")
        .group(EXPONENTIAL_GROUP)
        .default_value_if(EXPONENTIAL, None, "100"),
)
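
For completeness, a hypothetical sketch of consuming the parsed matches from the builder above; LearningRateSchedule and its variants are illustrative stand-ins, not sticker's actual types:

// Illustrative stand-in for the schedule configuration.
enum LearningRateSchedule {
    Plateau { patience: usize, scale: f32 },
    Exponential { decay_rate: f32, decay_steps: usize },
}

// Thanks to default_value_if, value_of() returns Some for a schedule's
// options exactly when that schedule's flag was given, so unwrap() is
// safe inside each branch.
let schedule = if matches.is_present(PLATEAU) {
    LearningRateSchedule::Plateau {
        patience: matches.value_of(LR_PATIENCE).unwrap().parse().unwrap(),
        scale: matches.value_of(LR_SCALE).unwrap().parse().unwrap(),
    }
} else {
    LearningRateSchedule::Exponential {
        decay_rate: matches.value_of(DECAY_RATE).unwrap().parse().unwrap(),
        decay_steps: matches.value_of(DECAY_STEPS).unwrap().parse().unwrap(),
    }
};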

The error messages we're getting are sometimes helpful:

$ ./target/release/sticker train dep.conf ger/train.conll ger/dev.conll 
error: The following required arguments were not provided:
    <--plateau|--exponential>

USAGE:
    sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--plateau|--exponential>

For more information try --help

Sometimes not so much:

$ ./target/release/sticker train dep.conf train.conll dev.conll --exponential --lr-patience 5 --lr-scale 0.3
error: The argument '--exponential' cannot be used with one or more of the other specified arguments

USAGE:
    sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--decay-rate <RATE>|--decay-steps <STEPS>> <--lr-patience <N>|--lr-scale <SCALE>> <--plateau|--exponential>

For more information try --help
$ ./target/release/sticker train dep.conf ger/train.conll ger/dev.conll --exponential --decay-rate 5 --lr-scale 0.3
error: The argument '--lr-scale <SCALE>' cannot be used with one or more of the other specified arguments

USAGE:
    sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--decay-rate <RATE>|--decay-steps <STEPS>> <--lr-patience <N>|--lr-scale <SCALE>> <--plateau|--exponential>

For more information try --help