Closed danieldk closed 5 years ago
GPU 1 on hopper is free.
Inspecting the output, it seems to work correctly. @twuebi do you see a big difference in performance? For the default transformer it seems to be a few seconds faster per epoch with 1.14.0.
I briefly tried the precompiled version of TensorFlow 1.15.0; the rewrite converted more nodes, but there was no big improvement over 1.14.0.
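For reference, the automatic mixed precision rewrite in TensorFlow 1.14+ can be turned on in two ways; this is a sketch of the stock TF/NVIDIA mechanisms, not code from this PR, and the training invocation at the end is hypothetical:

```shell
# 1. Environment variable: no code changes; TensorFlow's graph rewrite
#    inserts float16 casts and loss scaling around the optimizer.
export TF_ENABLE_AUTO_MIXED_PRECISION=1

# 2. Or in the training code itself, wrap the optimizer:
#    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

# Training is then launched as usual, e.g. (hypothetical entry point):
#   python train.py --activation relu --num_layers 6
```

The environment-variable route is convenient for comparing runs with and without AMP without touching the code.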
Besides the difference between the default configuration with/without AMP (1:49/1:50 vs. 1:56/1:57 per epoch), I don't have any other points of comparison. You could compare larger networks to see whether the difference becomes more significant.
E.g. `--activation relu --outer_hsize 384 --inner_hsize 4092 --keep_prob_inner 0.7 --keep_prob_outer 0.8 --keep_prob_attention 0.8 --keep_prob_input 0.9 --num_layers 6 --num_heads 8`
It's also possible that it allows larger models to fit into memory.
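On the memory point: halving the element size roughly halves parameter and activation storage. A back-of-the-envelope sketch for the larger configuration quoted above (`--outer_hsize 384 --inner_hsize 4092 --num_layers 6`), using standard transformer layer shapes; these are my estimates, not measurements from this PR:

```python
# Rough parameter-memory arithmetic for the larger config quoted above.
d_model, d_inner, n_layers = 384, 4092, 6

attn = 4 * d_model * d_model   # Q, K, V and output projections per layer
ffn = 2 * d_model * d_inner    # the two feed-forward matrices per layer
params = n_layers * (attn + ffn)

for name, bytes_per in [("fp32", 4), ("fp16", 2)]:
    print(f"{name}: {params * bytes_per / 2**20:.1f} MiB "
          f"for {params / 1e6:.1f}M parameters")
```

Note that AMP keeps an fp32 master copy of the weights, so in practice most of the savings come from fp16 activations and gradients rather than the parameters themselves.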
Ah, you mentioned doing pretraining with mixed precision. I thought that maybe you had tried without as well to compare the ETAs.
I briefly tested pre-training without AMP. The default config shows gains similar to regular training (2:00m/epoch -> 1:50m/epoch; 9:40h/epoch -> 9:00h/epoch). I haven't tried bigger configurations without AMP yet, nor done any profiling.
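For context, the quoted timings work out to the following relative speedups (simple arithmetic on the numbers above):

```python
# Relative speedup implied by the per-epoch timings quoted above.
def speedup_pct(before_s, after_s):
    return (before_s - after_s) / before_s * 100

train = speedup_pct(2 * 60, 1 * 60 + 50)            # 2:00m -> 1:50m
pretrain = speedup_pct(9 * 3600 + 40 * 60, 9 * 3600)  # 9:40h -> 9:00h
print(f"train: {train:.1f}% faster, pre-train: {pretrain:.1f}% faster")
```

which comes out to roughly 8.3% and 6.9% respectively.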
I didn't observe any accuracy differences between training with and without AMP.
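That matches expectations: the AMP graph rewrite applies loss scaling, which keeps small gradients from underflowing in float16. A minimal NumPy illustration (the scale factor 1024 is an arbitrary power of two here; TF's rewrite chooses the scale dynamically):

```python
import numpy as np

grad = 1e-8                       # a tiny gradient value, fine in fp32
assert np.float16(grad) == 0.0    # but it underflows to zero in fp16

scale = 1024.0
scaled = np.float16(grad * scale)  # scaling the loss keeps it nonzero
assert scaled > 0.0

recovered = np.float32(scaled) / scale  # unscale the gradient in fp32
print(recovered)                        # close to the original 1e-8
```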
This PR consists of three commits:
I have not had the opportunity to test this PR yet, since both GPUs on hopper are in use. I am currently compiling TensorFlow on tesniere with the right compute capabilities.