Hello @Ch-rode, I'm not facing any issues when using DeepSpeedCPUAdam. Please provide us with a minimal script that we can run quickly to reproduce the issue.
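For reference, a minimal reproduction script could look like the sketch below. It only assumes that `deepspeed` is installed; constructing `DeepSpeedCPUAdam` triggers the JIT build of the `cpu_adam` extension, which is where such errors usually surface:

```python
# Minimal sketch of a reproduction script (assumes deepspeed is installed).
import torch
from deepspeed.ops.adam import DeepSpeedCPUAdam

model = torch.nn.Linear(10, 10)
# Constructing the optimizer JIT-compiles the cpu_adam extension,
# which is where CUDA/torch version-mismatch errors typically appear.
optimizer = DeepSpeedCPUAdam(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
```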
I am using the seq2seq no-trainer example code from here, with the only difference starting at line 577, since I want to use a Bert2Bert model for a translation task (any advice is welcome!):
```python
from transformers import AutoModelForSeq2SeqLM, BertConfig, EncoderDecoderConfig

if args.model_name_or_path:
    vocabsize = 30
    max_length = 512
    # Load the pretrained config and override fields in one call; calling
    # .from_pretrained() on an already-constructed BertConfig would discard
    # the keyword arguments passed to the constructor.
    encoder_config = BertConfig.from_pretrained(
        args.model_name_or_path,
        vocab_size=vocabsize,
        max_position_embeddings=max_length + 64,  # this should be some large value
        num_attention_heads=16,
        max_length=512,
        num_hidden_layers=30,
        hidden_size=1024,
        type_vocab_size=1,
    )
    decoder_config = BertConfig.from_pretrained(
        args.model_name_or_path,
        vocab_size=vocabsize,
        max_position_embeddings=max_length + 64,  # this should be some large value
        num_attention_heads=16,
        max_length=512,
        num_hidden_layers=30,
        hidden_size=1024,
        type_vocab_size=1,
        is_decoder=True,  # very important for the decoder half
        add_cross_attention=True,
    )
    config = EncoderDecoderConfig.from_encoder_decoder_configs(
        encoder_config,
        decoder_config,
        decoder_start_token_id=tokenizer.pad_token_id,
        pad_token_id=tokenizer.pad_token_id,
    )
    model = AutoModelForSeq2SeqLM.from_config(config)
```
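For reference, a common alternative for warm-starting a Bert2Bert model is `EncoderDecoderModel.from_encoder_decoder_pretrained`, which loads pretrained weights into both halves and sets the decoder's `is_decoder` and `add_cross_attention` flags automatically; a sketch using the same variable names as above:

```python
# Alternative sketch: warm-start both halves from the same checkpoint.
# args.model_name_or_path and tokenizer are assumed from the script above.
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    args.model_name_or_path, args.model_name_or_path
)
model.config.decoder_start_token_id = tokenizer.pad_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```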
My DeepSpeed config for accelerate is the following:
```json
{
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto"
        }
    },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "reduce_bucket_size": 5e8,
        "stage3_prefetch_bucket_size": 5e8,
        "stage3_param_persistence_threshold": 1e6,
        "sub_group_size": 1e12,
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": true
    },
    "gradient_accumulation_steps": 1,
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
```
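Note that because `offload_optimizer.device` is `"cpu"`, DeepSpeed will back the AdamW optimizer with `DeepSpeedCPUAdam` and JIT-compile its `cpu_adam` extension at startup. A sketch that pre-builds the op to surface toolchain problems before a full training run (assuming a standard deepspeed install):

```python
# Pre-build DeepSpeed's cpu_adam extension to surface CUDA/torch
# toolchain mismatches before launching a full ZeRO-3 offload run.
from deepspeed.ops.op_builder import CPUAdamBuilder

CPUAdamBuilder().load()  # raises if the system CUDA and torch CUDA differ
```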
Thank you very much
Actually, you edited out the error, which is `Exception: Installed CUDA version 11.7 does not match the version torch was compiled with 11.6, unable to compile cuda/cpp extensions without a matching cuda version.` You need to match the system CUDA version with the CUDA version torch was compiled with. Your current PyTorch build supports only CUDA 11.6, so you need to downgrade the system CUDA to 11.6. Earlier, I got this error when the system version was 10.2 but torch was compiled with 11.3. By making both versions the same, the error should disappear.
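To compare the two versions, something like the following works (`torch.version.cuda` reports the version torch was compiled with; the `nvcc` call assumes the CUDA toolkit is on your PATH):

```python
import subprocess
import torch

# CUDA version PyTorch was compiled with
print("torch CUDA:", torch.version.cuda)
# System CUDA toolkit version (assumes nvcc is on PATH)
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```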
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hello! I am trying to launch a run with `accelerate launch`, but I am encountering this error:

This is my env:

Thank you!!