This repo contains the scripts used to run the experiments in this blog post. If you use this code or our results, please cite them appropriately.
To get started, run the script get_data.sh, which downloads and organizes the data needed for the models.
python train_cifar10.py 3e-3 --wd=0.1 --wd_loss=False
For a longer cycle with test-time augmentation:

python train_cifar10.py 3e-3 --wd=0.1 --wd_loss=False --cyc_len=18 --tta=True
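The --wd_loss=False flag applies weight decay directly to the weights instead of adding an L2 penalty to the loss (the decoupled, AdamW-style form). A minimal sketch of the difference for a single Adam step; this is illustrative only, not the scripts' actual optimizer code:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, wd=0.1, beta1=0.9, beta2=0.999,
              eps=1e-8, wd_in_loss=False):
    """One Adam step on weights w.

    wd_in_loss=True : L2 penalty folded into the gradient, so the decay
                      term also flows through the moment estimates.
    wd_in_loss=False: decoupled decay that shrinks the weights directly,
                      independent of the adaptive step size.
    """
    if wd_in_loss:
        grad = grad + wd * w
    # Standard Adam moment updates with bias correction
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    if not wd_in_loss:
        w = w - lr * wd * w
    return w, m, v
```

The two forms coincide for plain SGD but give different updates under Adam, which is why the scripts expose the choice as a flag.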
To fine-tune on the Stanford Cars dataset with test-time augmentation:

python fit_stanford_cars.py '(1e-2,3e-3)' --wd=1e-3 --tta=True
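The --tta=True flag enables test-time augmentation: the model's predictions are averaged over the original image and several augmented copies. A minimal sketch of the idea, with a hypothetical flip/shift augmentation standing in for the real pipeline:

```python
import numpy as np

def tta_predict(predict, image, n_aug=4, rng=None):
    """Average predictions over the original image and n_aug augmented
    copies. `predict` maps an image array to a prediction vector; the
    flip/shift augmentations here are placeholders for the real ones."""
    rng = rng or np.random.default_rng(0)
    preds = [predict(image)]
    for _ in range(n_aug):
        if rng.random() < 0.5:
            aug = image[:, ::-1]                       # horizontal flip
        else:
            aug = np.roll(image, rng.integers(-2, 3), axis=1)  # small shift
        preds.append(predict(aug))
    return np.mean(preds, axis=0)
```

Averaging over augmented views typically smooths out prediction noise at a small constant cost in inference time.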
This should train an AWD LSTM to 68.7/65.5 perplexity without the cache pointer, 52.9/50.9 with it:
python train_rnn.py 5e-3 --wd=1.2e-6 --alpha=3 --beta=1.5
This should train an AWD QRNN to 69.6/66.7 perplexity without the cache pointer, 53.6/51.7 with it:
python train_rnn.py 5e-3 --wd=1e-6 --qrnn=True
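The cache pointer mixes the language model's softmax with a distribution over recently seen tokens, weighted by how similar the current hidden state is to the hidden states at which those tokens appeared (the continuous cache of Grave et al.). A toy NumPy sketch of that mixture, not the scripts' actual implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cache_pointer(p_model, hidden, history_h, history_tok, vocab_size,
                  theta=0.3, lam=0.1):
    """Mix the model distribution p_model with a cache distribution.

    history_h   : hidden states from the recent context window
    history_tok : the token observed at each of those positions
    theta       : sharpness of the similarity scores
    lam         : interpolation weight given to the cache
    """
    if len(history_h) == 0:
        return p_model
    # Attention over the history, based on hidden-state similarity
    sims = np.array([theta * (h @ hidden) for h in history_h])
    attn = softmax(sims)
    # Scatter the attention mass onto the corresponding tokens
    p_cache = np.zeros(vocab_size)
    for a, tok in zip(attn, history_tok):
        p_cache[tok] += a
    return (1 - lam) * p_model + lam * p_cache
```

Because the cache needs no training, it can be bolted onto an already-trained model at evaluation time, which is why the with/without-pointer perplexities above come from the same checkpoints.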