huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam' #552

Closed Ch-rode closed 2 years ago

Ch-rode commented 2 years ago

Hello! I am trying to run accelerate launch but I am encountering this error:

Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f120f45ed30>
Traceback (most recent call last):
  File "/home/rodelc/encoder_decoder/seq2seq_temp/lib64/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 97, in __del__
    self.ds_opt_adam.destroy_adam(self.opt_id)
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

This is my env:

accelerate==0.11.0                                                                                                        
aiohttp==3.8.1                                                                                                            
aiosignal==1.2.0                                                                                                          
async-timeout==4.0.2                                                                                                      
attrs==21.4.0                                                                                                             
certifi==2022.6.15                                                                                                        
charset-normalizer==2.1.0                                                                                                 
datasets==2.3.2                                                                                                           
deepspeed==0.6.5                                                                                                          
dill==0.3.5.1                                                                                                             
filelock==3.7.1                                                                                                           
frozenlist==1.3.0      
fsspec==2022.5.0
hjson==3.0.2
huggingface-hub==0.8.1
idna==3.3
multidict==6.0.2
multiprocess==0.70.13
ninja==1.10.2.3
numpy==1.23.1
packaging==21.3
pandas==1.4.3
Pillow==9.2.0
psutil==5.9.1
py-cpuinfo==8.0.0
pyarrow==8.0.0
pydantic==1.9.1
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2022.1
PyYAML==6.0
regex==2022.7.9
requests==2.28.1
responses==0.18.0
six==1.16.0
tokenizers==0.12.1
torch==1.12.0+cu116
torchaudio==0.12.0+cu116
torchvision==0.13.0+cu116
tqdm==4.64.0
transformers==4.20.1
typing_extensions==4.3.0
urllib3==1.26.10
xxhash==3.0.0
yarl==1.7.2

Thank you !!

pacman100 commented 2 years ago

Hello @Ch-rode, I'm not facing any issues when using DeepSpeedCPUAdam. Please provide us with a minimal script that we can run quickly to reproduce the issue.

Ch-rode commented 2 years ago

I am using the seq2seq no-trainer code from here, with the only difference starting at line 577, since I want to use a Bert2Bert model for a translation task (any advice is welcome!):


    if args.model_name_or_path:

        vocabsize = 30
        max_length = 512

        encoder_config = BertConfig(vocab_size = vocabsize,
                            max_position_embeddings = max_length+64, # this should be some large value
                            num_attention_heads = 16,
                            max_length = 512,
                            num_hidden_layers = 30,
                            hidden_size = 1024,
                            type_vocab_size = 1,
                            ).from_pretrained(args.model_name_or_path)

        decoder_config = BertConfig(vocab_size = vocabsize,
                            max_position_embeddings = max_length+64, # this should be some large value
                            num_attention_heads = 16,
                            max_length = 512,
                            num_hidden_layers = 30,
                            hidden_size = 1024,
                            type_vocab_size = 1,
                            is_decoder=True,
                            add_cross_attention=True,
                            ).from_pretrained(args.model_name_or_path)  # Very Important

        config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_config, decoder_config,decoder_start_token_id=tokenizer.pad_token_id,pad_token_id=tokenizer.pad_token_id)

        model = AutoModelForSeq2SeqLM.from_config(config)
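
One note on the snippet above: from_pretrained is a classmethod, so chaining it onto a freshly constructed BertConfig(...) returns a config built from the checkpoint alone and silently drops the keyword overrides. A minimal sketch that keeps the overrides (same values as above, still assuming args.model_name_or_path from the script) would be:

    # Hypothetical rewrite: pass the overrides directly to from_pretrained so they
    # are applied on top of the checkpoint's config instead of being discarded.
    encoder_config = BertConfig.from_pretrained(
        args.model_name_or_path,
        vocab_size=vocabsize,
        max_position_embeddings=max_length + 64,
        num_attention_heads=16,
        max_length=512,
        num_hidden_layers=30,
        hidden_size=1024,
        type_vocab_size=1,
    )

    # The decoder config can be built the same way, additionally passing
    # is_decoder=True and add_cross_attention=True.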

My DeepSpeed config for accelerate is the following:

{
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "weight_decay": "auto"
        }
    },
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
            "total_num_steps": "auto"
        }
     },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
        "overlap_comm": true,
        "contiguous_gradients": true,
        "reduce_bucket_size": 5e8,
        "stage3_prefetch_bucket_size": 5e8,
        "stage3_param_persistence_threshold": 1e6,
        "sub_group_size": 1e12,
        "stage3_max_live_parameters": 1e9,
        "stage3_max_reuse_distance": 1e9,
        "stage3_gather_16bit_weights_on_model_save": "true"
    },
    "gradient_accumulation_steps": 1,
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
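
For context, since offload_optimizer.device is cpu under ZeRO stage 3, DeepSpeed swaps the AdamW optimizer above for DeepSpeedCPUAdam, whose native extension is JIT-compiled the first time it is instantiated; the AttributeError in __del__ typically appears when that build fails. A minimal sketch (run outside the training script) that forces the build so any compilation error surfaces directly:

    import torch
    from deepspeed.ops.adam import DeepSpeedCPUAdam

    # Instantiating the optimizer triggers the JIT build of the cpu_adam extension;
    # if compilation fails, the real error is printed here instead of being hidden
    # behind the AttributeError raised in __del__.
    params = [torch.nn.Parameter(torch.zeros(8))]
    optimizer = DeepSpeedCPUAdam(params, lr=1e-3)
    print("cpu_adam extension built successfully")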

Thank you very much

pacman100 commented 2 years ago

Actually, you edited out the key error, which is Exception: Installed CUDA version 11.7 does not match the version torch was compiled with 11.6, unable to compile cuda/cpp extensions without a matching cuda version. You need to match the system CUDA version with the CUDA version used to compile torch. Currently, PyTorch supports only CUDA 11.6, so you need to downgrade the system CUDA to 11.6. Earlier, I got this error when the system version was 10.2 but torch was compiled with 11.3. By making both versions the same, the error should disappear.
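
A quick way to confirm the mismatch (a minimal sketch, assuming nvcc is on the PATH) is to compare the CUDA version torch was compiled with against the system toolkit:

    import subprocess
    import torch

    # CUDA version PyTorch was built against (e.g. "11.6")
    print("torch compiled with CUDA:", torch.version.cuda)

    # CUDA version of the system toolkit that DeepSpeed uses to JIT-compile its ops
    print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)

If deepspeed is installed, running ds_report also prints these versions along with which ops can be compiled.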

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.