PotentialX closed this 2 months ago
We train with batch size 256. Here is our training record file: val_results.txt
We start with lr=6e-4, and the training code gradually decreases it to 1e-4. Then we set lr=6e-4 again and wait for it to decrease to 1e-4. Then we set lr=4e-4 and wait for it to decrease to 5e-5. Then we set lr=3e-4 and wait for it to decrease to 5e-5. Finally we set lr=2e-4 and wait for it to decrease to 5e-5.
Unfortunately, you have to reset the learning rate manually after each period.
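For anyone who wants to avoid the manual resets, the schedule described above could be approximated in code. This is only a rough sketch, not the training code from this repo: the (start, floor) pairs come from the comment above, but the function name `lr_at`, the stage length `epochs_per_stage`, and the geometric decay shape are all assumptions, since the thread does not say how long each period lasts or how the code decays the rate.

```python
# (start_lr, floor_lr) pairs as listed in the comment above.
STAGES = [
    (6e-4, 1e-4),
    (6e-4, 1e-4),
    (4e-4, 5e-5),
    (3e-4, 5e-5),
    (2e-4, 5e-5),
]


def lr_at(epoch, epochs_per_stage=10, stages=STAGES):
    """Return the learning rate for a 0-based epoch.

    Assumptions (not from the original training code):
    - every stage lasts `epochs_per_stage` epochs (hypothetical value);
    - within a stage the rate decays geometrically from start to floor.
    After the last stage ends, the rate stays at the final floor.
    """
    if epochs_per_stage < 2:
        raise ValueError("epochs_per_stage must be at least 2")
    # Pick the stage, clamping to the last one for late epochs.
    stage_idx = min(epoch // epochs_per_stage, len(stages) - 1)
    start, floor = stages[stage_idx]
    # Position inside the stage, clamped so late epochs sit at the floor.
    pos = min(epoch - stage_idx * epochs_per_stage, epochs_per_stage - 1)
    frac = pos / (epochs_per_stage - 1)
    return start * (floor / start) ** frac
```

The returned value would then be written into the optimizer at the start of each epoch. If the original code lowers the rate based on validation results rather than on a fixed timetable, the stage boundaries here would need to be triggered the same way.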
bravo!
Great work! But I found training really unstable, and it is difficult to reproduce the same results as yours.