Learning rates and regularization values are currently specified in a config file, and this is the config file's only use. The proposal is to drop the config file in favor of command-line parameters. To make this easier, and to improve automation in general, the user should only have to specify the minimum and maximum learning rates to explore, and the script should use binary search over that range to find the best learning rate. The search may settle on a local optimum, but that is acceptable given the best-effort nature of the script. Bonus points: do this without any user input, and scale the learning rate range to match the range of rewards seen in the data.
Similarly, automate the sweep of regularization parameters.
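
As a rough illustration of the proposal, here is a minimal sketch of how the command-line parameters and the search could fit together. Everything named here is an assumption rather than part of the existing script: the flags `--min-lr`, `--max-lr` and `--steps`, the helper `search_learning_rate`, and the `evaluate` callback, which stands in for "train with this learning rate and return a loss to minimize". The search is an interval-halving variant of binary search run on a log10 scale. It only finds the true optimum if the loss has a single minimum in the range; otherwise it may stop at a local one.

```python
import argparse
import math
from typing import Callable, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # Hypothetical flags that would replace the config file.
    parser = argparse.ArgumentParser(description="Learning rate sweep (sketch)")
    parser.add_argument("--min-lr", type=float, default=1e-5,
                        help="smallest learning rate to explore")
    parser.add_argument("--max-lr", type=float, default=1.0,
                        help="largest learning rate to explore")
    parser.add_argument("--steps", type=int, default=8,
                        help="number of interval-halving steps")
    return parser.parse_args(argv)


def search_learning_rate(evaluate: Callable[[float], float],
                         min_lr: float, max_lr: float, steps: int = 8) -> float:
    """Interval-halving search on a log10 scale; lower evaluate() is better."""
    if not 0 < min_lr < max_lr:
        raise ValueError("expected 0 < min_lr < max_lr")
    lo, hi = math.log10(min_lr), math.log10(max_lr)
    mid = (lo + hi) / 2
    f_mid = evaluate(10 ** mid)
    for _ in range(steps):
        # Probe the midpoint of each half of the current interval.
        left, right = (lo + mid) / 2, (mid + hi) / 2
        f_left, f_right = evaluate(10 ** left), evaluate(10 ** right)
        if f_left < f_mid:
            # Best point so far is in the lower half.
            hi, mid, f_mid = mid, left, f_left
        elif f_right < f_mid:
            # Best point so far is in the upper half.
            lo, mid, f_mid = mid, right, f_right
        else:
            # The centre is still best: keep the middle half.
            lo, hi = left, right
    return 10 ** mid
```

Each step costs two evaluations and halves the search interval, so eight steps narrow a five-decade range to about 0.02 decades. With these assumed flags, an invocation might pass `--min-lr 1e-4 --max-lr 0.1`. The same helper could be reused for the regularization sweep by passing a callback that varies the regularization value instead of the learning rate.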