Open twuebi opened 5 years ago
No more things in the configuration file; there is too much stuff in there already that is only relevant to training. Maybe clap offers some functionality to only reveal options based on the value of some other option?
There is `requires`; not sure if it hides an option when the requirement is not present:
https://kbknapp.github.io/clap-rs/clap/struct.Arg.html#method.requires
Maybe `group` in conjunction with `requires`?
Yep. It's worth trying whether they get hidden if the requirement is not given. But I guess at the very least it would also group the arguments together in the usage information? (Which would go a long way toward not making it too confusing.)
https://kbknapp.github.io/clap-rs/clap/struct.ArgGroup.html
You can also do things such as name an ArgGroup as a conflict or requirement, meaning any of the arguments that belong to that group will cause a failure if present, or must be present, respectively. Perhaps the most common use of ArgGroups is to require one and only one argument to be present out of a given set. Imagine that you had multiple arguments, and you want one of them to be required, but making all of them required isn't feasible because perhaps they conflict with each other. For example, let's say that you were building an application where one could set a given version number by supplying a string with an option argument, i.e. --set-ver v1.2.3. You also wanted to support automatically using a previous version number and simply incrementing one of the three numbers, so you create three flags --major, --minor, and --patch. All of these arguments shouldn't be used at one time, but you want to specify that at least one of them is used. For this, you can create a group.
https://kbknapp.github.io/clap-rs/clap/struct.App.html#method.arg_group
I looked a bit further into it; I'm not yet happy with it. Examples are below.

For the grouping in the help to work, we need to set `AppSettings::DeriveDisplayOrder`, and `hide_default_value(true)` on every argument that has a default value and should be followed by a newline. This is necessary since appending `"\n "` to the preceding help message was the only way to introduce a blank line to get a visual grouping (https://github.com/clap-rs/clap/issues/1250). `Arg::conflicts_with` also conflicts with `Arg::default_value`; `Arg::default_value_if` can be used to get a conditional default value instead.
Grouping:
sticker-train 0.10.0
Train a sticker model
USAGE:
sticker train [OPTIONS] <CONFIG> <TRAIN_DATA> <VALIDATION_DATA>
OPTIONS:
--batchsize <BATCH_SIZE> Batch size [default: 256]
--continue <PARAMS> Continue training from parameter files (e.g.: epoch-50)
--lr <LR> Initial learning rate [default: 0.01]
--warmup <N> For the first N timesteps, the learning rate is linearly scaled up to LR.
--plateau Plateau learning rate schedule
--lr-patience <N> Scale learning rate after N epochs without improvement
--lr-scale <SCALE> Value to scale the learning rate by
--exponential Exponential learning rate schedule
--decay-rate <RATE> coefficient of the exponential decay
--decay-steps <STEPS> global_step / steps is the exponent of the decay_rate
--maxlen <N> Ignore sentences longer than N tokens
--shuffle_buffer <N> Size of the buffer used for shuffling.
--patience <N> Maximum number of epochs without improvement [default: 15]
--logdir <LOGDIR> Write Tensorboard summaries to this directory.
-h, --help Prints help information
-V, --version Prints version information
ARGS:
<CONFIG> Sticker configuration
<TRAIN_DATA> Training data
<VALIDATION_DATA> Validation data
No grouping:
sticker-train 0.10.0
Train a sticker model
USAGE:
sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> <--plateau|--exponential>
OPTIONS:
--batchsize <BATCH_SIZE> Batch size [default: 256]
--continue <PARAMS> Continue training from parameter files (e.g.: epoch-50)
--lr <LR> Initial learning rate [default: 0.01]
--warmup <N> For the first N timesteps, the learning rate is linearly scaled up to LR. [default:
0]
--plateau Plateau learning rate schedule
--lr-patience <N> Scale learning rate after N epochs without improvement
--lr-scale <SCALE> Value to scale the learning rate by
--exponential Exponential learning rate schedule
--decay-rate <RATE> coefficient of the exponential decay
--decay-steps <STEPS> global_step / steps is the exponent of the decay_rate
--maxlen <N> Ignore sentences longer than N tokens
--shuffle_buffer <N> Size of the buffer used for shuffling.
--patience <N> Maximum number of epochs without improvement [default: 15]
--logdir <LOGDIR> Write Tensorboard summaries to this directory.
-h, --help Prints help information
-V, --version Prints version information
ARGS:
<CONFIG> Sticker configuration
<TRAIN_DATA> Training data
<VALIDATION_DATA> Validation data
Making the args mutually exclusive works via `conflicts_with`, which can be specified on `ArgGroup` as well as `Arg`. Setting `ArgGroup::multiple` to `true` allows multiple values from the same group; by default it is `false`, which means only one value from a group can be present.
.group(ArgGroup::with_name(SCHEDULE_GROUP).required(true))
.arg(
Arg::with_name(PLATEAU)
.long("plateau")
.help("Plateau learning rate schedule")
.group(SCHEDULE_GROUP)
.requires(PLATEAU_GROUP),
)
.group(
ArgGroup::with_name(PLATEAU_GROUP)
.multiple(true)
.conflicts_with_all(&[EXPONENTIAL, EXPONENTIAL_GROUP])
)
.arg(
Arg::with_name(LR_PATIENCE)
.long("lr-patience")
.value_name("N")
.help("Scale learning rate after N epochs without improvement")
.group(PLATEAU_GROUP)
.default_value_if(PLATEAU, None, "5"),
)
.arg(
Arg::with_name(LR_SCALE)
.long("lr-scale")
.value_name("SCALE")
.help("Value to scale the learning rate by")
.group(PLATEAU_GROUP)
.default_value_if(PLATEAU, None, "0.5"),
)
.arg(
Arg::with_name(EXPONENTIAL)
.long("exponential")
.help("Exponential learning rate schedule")
.group(SCHEDULE_GROUP)
.requires(EXPONENTIAL_GROUP),
)
.group(
ArgGroup::with_name(EXPONENTIAL_GROUP)
.multiple(true)
.conflicts_with_all(&[PLATEAU, PLATEAU_GROUP])
)
.arg(
Arg::with_name(DECAY_RATE)
.long("decay-rate")
.value_name("RATE")
.help("coefficient of the exponential decay")
.group(EXPONENTIAL_GROUP)
.default_value_if(EXPONENTIAL, None, "0.998"),
)
.arg(
Arg::with_name(DECAY_STEPS)
.long("decay-steps")
.value_name("STEPS")
.help("global_step / steps is the exponent of the decay_rate")
.group(EXPONENTIAL_GROUP)
.default_value_if(EXPONENTIAL, None, "100"),
)
The error messages we're getting are sometimes helpful:
$ ./target/release/sticker train dep.conf ger/train.conll ger/dev.conll
error: The following required arguments were not provided:
<--plateau|--exponential>
USAGE:
sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--plateau|--exponential>
For more information try --help
Sometimes not so much:
$ ./target/release/sticker train dep.conf train.conll dev.conll --exponential --lr-patience 5 --lr-scale 0.3
error: The argument '--exponential' cannot be used with one or more of the other specified arguments
USAGE:
sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--decay-rate <RATE>|--decay-steps <STEPS>> <--lr-patience <N>|--lr-scale <SCALE>> <--plateau|--exponential>
For more information try --help
$ ./target/release/sticker train dep.conf ger/train.conll ger/dev.conll --exponential --decay-rate 5 --lr-scale 0.3
error: The argument '--lr-scale <SCALE>' cannot be used with one or more of the other specified arguments
USAGE:
sticker train <CONFIG> <TRAIN_DATA> <VALIDATION_DATA> --batchsize <BATCH_SIZE> --lr <LR> --patience <N> --warmup <N> <--decay-rate <RATE>|--decay-steps <STEPS>> <--lr-patience <N>|--lr-scale <SCALE>> <--plateau|--exponential>
For more information try --help
Right now, the exponential decay lr schedule is not available for `sticker train` and `sticker pretrain`. Once #145 is merged, it would make sense to have it available for both subcommands. This may clutter the command line arguments a bit, since we then have:

- Plateau decay
- Exponential decay
Maybe it would make sense to move the learning-rate schedule related things to the config file.