Open ethansmith2000 opened 5 months ago
I see that AMP (automatic mixed precision) in the DeepSpeed config is not compatible with ZeRO, but is that a hard limitation? That is, if I were to go about manually casting everything, would it work?
DeepSpeed implements its own variation of AMP, so if you look at any of the integration libraries (Accelerate or the HF Trainer), they skip torch's AMP when DeepSpeed is used.
You shouldn't need to do any manual casting; just train without AMP.
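For reference, here is a minimal sketch of what that looks like in the DeepSpeed config: you enable DeepSpeed's native `fp16` or `bf16` section rather than wrapping anything in torch AMP. The `fp16`/`bf16`/`zero_optimization` keys are real DeepSpeed config sections; the surrounding values are illustrative only.

```python
# Sketch of a DeepSpeed config that uses DeepSpeed's own precision
# handling instead of torch AMP. The "fp16", "bf16", and
# "zero_optimization" keys are real DeepSpeed config sections; the
# specific values here are illustrative.
ds_config = {
    "train_batch_size": 3,
    "zero_optimization": {"stage": 2},
    # Enable exactly one precision section; leave the "amp" section out.
    "fp16": {"enabled": True, "loss_scale": 0},  # 0 = dynamic loss scaling
    # "bf16": {"enabled": True},
}
```

With a config like this you pass `ds_config` to `deepspeed.initialize(...)` and do no casting or `autocast` in your own training loop.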
I couldn't find any documentation or benchmarks on bf16/fp16 training. It caught me a bit off guard when I noticed the weights themselves are stored in that precision, which is different from the usual mixed-precision schemes I've worked with.
I benchmarked the BingBertSquad example with the following config:

```python
args = {
    "seed": 42,
    "train_batch_size": 3,
    "gradient_accumulation_steps": 1,
    "do_lower_case": True,
    "bert_model": "bert-base-uncased",
    "dropout_p": 0.1,
    "train_file": "/efs/squad/train-v1.1.json",
    "predict_file": "/efs/squad/dev-v1.1.json",
    "num_train_epochs": 1,
    "output_dir": "/efs/squad/output",
    "max_seq_length": 384,
    "doc_stride": 128,
    "max_query_length": 64,
    "loss_plot_alpha": 0.9,
    "warmup_proportion": 0.1,
    "learning_rate": 3e-5,
    "print_steps": 100,
    "predict_batch_size": 8,
    "n_best_size": 20,
    "max_answer_length": 30,
    "verbose_logging": 1,
    "job_name": "squad",
    "max_steps": 99999999999,
    "max_steps_per_epoch": 99999999999,
}
```

changing only the bf16 flag on/off.
These are the scores:

```
BF16 {"exact_match": 64.38032166508988, "f1": 74.72591312745307}
FP32 {"exact_match": 77.09555345316934, "f1": 85.53713988174498}
FP16 {"exact_match": 77.12393566698202, "f1": 85.45339512076741}
```
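One plausible contributor to the bf16 gap (my interpretation, not from the thread): bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits, versus float16's 10, so small weight updates near 1.0 are rounded away sooner. A pure-Python sketch of bf16 truncation (keeping the top 16 bits of a float32, ignoring rounding mode) illustrates the resolution loss:

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate a float32 to bfloat16 by keeping only its top 16 bits
    (sign + 8 exponent bits + 7 mantissa bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# Near 1.0, bf16's spacing is 2**-7, so a 2**-9 perturbation vanishes,
# while fp16 (10 mantissa bits, spacing 2**-10 near 1.0) would keep it.
print(to_bf16(1.0 + 2**-9))  # 1.0  (increment lost)
print(to_bf16(1.0 + 2**-7))  # 1.0078125  (representable)
```

That matches the pattern above: fp16 tracks fp32 closely, while master weights stored directly in bf16 lose fine-grained updates.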