The first larger round of hyperparameter studies is here.
Short summary:
optimizers sgd, adamw, and adadelta were tested with various (matching) parameters. The best-scoring params for each will be looked into further (see the sweep sketch after this list)
learning rates are explored in more detail (the optimum seems to be roughly 1e-8, decreasing as model complexity increases). Of course, the number of epochs needs to be increased to see these results.
a best IMN config is started; similar configs should be built for FgsaLSTM (and maybe BertHead). A hypothetical layout is sketched below.
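A minimal sketch of what the optimizer/learning-rate sweep looks like, assuming a PyTorch setup; the toy model, data, and grid values below are illustrative placeholders, not the project's actual models (IMN, FgsaLSTM, BertHead) or the exact grids used:

```python
import itertools

import torch
from torch import nn

# Toy stand-in for the real model/data; the actual study uses the project's
# own models and data loaders.
x = torch.randn(64, 10)
y = torch.randn(64, 1)
loss_fn = nn.MSELoss()

optimizers = {
    "sgd": torch.optim.SGD,
    "adamw": torch.optim.AdamW,
    "adadelta": torch.optim.Adadelta,
}
learning_rates = [1e-6, 1e-7, 1e-8, 1e-9]  # centred around the ~1e-8 optimum
epochs = 200  # tiny learning rates need many more epochs to show an effect

results = {}
for opt_name, lr in itertools.product(optimizers, learning_rates):
    model = nn.Linear(10, 1)
    optimizer = optimizers[opt_name](model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    results[(opt_name, lr)] = loss.item()

best_opt, best_lr = min(results, key=results.get)  # lower loss is better
print(f"best combination: {best_opt} @ lr={best_lr}")
```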
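For the best-config idea, one hypothetical shape such a record might take; every value except the ~1e-8 learning rate noted above is a placeholder, and an FgsaLSTM or BertHead equivalent would record the same fields:

```python
# Hypothetical "best config" record; only the learning rate reflects the
# observation above, the rest is to be filled in from the sweep results.
best_imn_config = {
    "optimizer": None,        # winner from the sweep, to be filled in
    "learning_rate": 1e-8,    # rough optimum observed so far
    "epochs": None,           # increase alongside smaller learning rates
}
```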