
[BUG] deepspeed.init_inference() erroneously attempts to copy out of meta tensor #3012


larekrow commented 1 year ago

The bug

In deepspeed/module_inject/replace_module.py, replace_module() is called on meta tensors before the actual weights are loaded just a few lines below, resulting in NotImplementedError: Cannot copy out of meta tensor; no data!

The code excerpt

import torch
import deepspeed
from transformers import AutoConfig, AutoModel

# MODEL_NAME is defined elsewhere; checkpoint_json is shown below
config = AutoConfig.from_pretrained(MODEL_NAME)
with deepspeed.OnDevice(dtype=torch.float16, device='meta'):
    model = AutoModel.from_config(config, torch_dtype=torch.float16)

ds_inference_config = {
    'tensor_parallel': {'tp_size': 2},
    'dtype': torch.float16,
    'checkpoint': checkpoint_json,
    'kernel_inject': True,
}

ds_engine = deepspeed.init_inference(
    model,
    config=ds_inference_config,
)

checkpoint_json

{
    "type": "Megatron",
    "version": 1.0,
    "checkpoints": [
        "/home/user/models/opt_iml_30b/max/checkpoint_1_6000.pt-model_part-0.pt",
        "/home/user/models/opt_iml_30b/max/checkpoint_1_6000.pt-model_part-1.pt"
    ]
}

The error

Traceback (most recent call last):
  File "src/test_tp.py", line 87, in <module>
    init_tokenizer_model_deepspeed_w_TP('config/checkpoints-opt-iml-30b-max.json')
  File "src/test_tp.py", line 57, in init_tokenizer_model_deepspeed_w_TP
    ds_engine = deepspeed.init_inference(
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config, metaseq_opt_to_pt=metaseq_opt_to_pt)
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 145, in __init__
    self._apply_injection_policy(config)
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 372, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 534, in replace_transformer_layer
    replaced_module = replace_module(model=model,
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 800, in replace_module
    replaced_module, _ = _replace_module(model, policy)
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 827, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 827, in _replace_module
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 817, in _replace_module
    replaced_module = policies[child.__class__][0](child,
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 524, in replace_fn
    new_module = replace_with_policy(child,
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 388, in replace_with_policy
    _container.transpose()
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/containers/features/meta_tensor.py", line 35, in transpose
    super().transpose()
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/containers/base.py", line 232, in transpose
    self.transpose_mlp()
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/containers/base.py", line 241, in transpose_mlp
    self._h4h_w = self.transpose_impl(self._h4h_w.data)
  File "/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed/module_inject/containers/base.py", line 251, in transpose_impl
    data.to(get_accelerator().current_device_name())
NotImplementedError: Cannot copy out of meta tensor; no data!

Some line numbers in the traceback may be inaccurate because it incorporates changes from GH-2940 and my own code.
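For context, this error is generic torch behavior: a meta tensor carries only shape and dtype metadata with no backing storage, so any .to() or .copy_() out of it fails. A minimal reproduction independent of DeepSpeed:

import torch

t = torch.empty(2, 2, device="meta")  # metadata only, no backing storage
t.to("cpu")  # raises NotImplementedError: Cannot copy out of meta tensor; no data!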

ds_report

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/torch']
torch version .................... 1.12.1+cu113
deepspeed install path ........... ['/home/user/workspace/deepspeed_llm/venv/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.8.2, unknown, unknown
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3

System info

Additional context

I am trying to load OPT-IML-30B, downloaded as 2 tensor-parallel (TP) shards from Metaseq, before moving on to OPT-IML-175B, which has 16 TP shards.

Please advise on how to proceed, thank you!

qtli commented 1 year ago

I also encountered this issue. I ran inference_test.py to load OPT-IML-30B downloaded from Hugging Face.

larekrow commented 1 year ago

Hi, any updates?

qtli commented 1 year ago

I could successfully run the script. I first saved the sharded checkpoints to a custom directory and then loaded the shards for inference (#2379 helped me a lot!).

Maybe you could try setting the replace_method arg to 'auto'.
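For example, a sketch using the older keyword-style init_inference API (mp_size here is a placeholder for your tensor-parallel degree):

ds_engine = deepspeed.init_inference(
    model,
    mp_size=2,            # placeholder: tensor-parallel degree
    dtype=torch.float16,
    replace_method='auto',
)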

larekrow commented 1 year ago

Hi @qtli, appreciate the suggestion, but I did not use 'replace_method': 'auto', following PR #2831. I did try running it again with that setting upon your suggestion for good measure though -- same error. I also did not use the method of obtaining tensor parallels from Hugging Face weights, as avoiding HF is the goal (since 175B is not on HF). I want to use the metaseq OPT-IML TPs directly.

This error is encountered when 'checkpoint': checkpoint_json is used, 'replace_with_kernel_inject' is True, and isinstance(self.module, torch.nn.Module) is True. I am not sure whether tp_size > 1 contributes to the condition that triggers this error.
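As a sanity check (a quick sketch using only public torch APIs), one can verify that the module still holds meta parameters at the point init_inference is called:

has_meta = any(p.is_meta for p in model.parameters())
print(f"meta params remaining: {has_meta}")  # True here, since the checkpoint has not been loaded yet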

This could be related to #2616 but I am not sure. I circumvented the state_dict issues by adding custom code to _load_checkpoint() in engine.py:

def _metaseq_opt_to_pt(sd):
    keys_to_delete = [
        "decoder.version",
    ]
    for key in keys_to_delete:
        if key in sd:
            sd.pop(key)

    keys_to_rename = {
        "decoder.layer_norm.weight": "decoder.final_layer_norm.weight",
        "decoder.layer_norm.bias": "decoder.final_layer_norm.bias",
    }
    for old_key, new_key in keys_to_rename.items():
        if old_key in sd:
            sd[new_key] = sd.pop(old_key)

    for key in list(sd.keys()):
        if ".qkv_proj." in key:
            q_name = key.replace(".qkv_proj.", ".q_proj.")
            k_name = key.replace(".qkv_proj.", ".k_proj.")
            v_name = key.replace(".qkv_proj.", ".v_proj.")

            value = sd[key]
            depth = value.shape[0]
            assert depth % 3 == 0
            # `SequeuceParallelTransformerBlock` stores the fused QKV weight in K, V, Q order despite the naming:
            # https://cs.github.com/facebookresearch/metaseq/blob/51871bd73cd04c038f239ea2a26db1d7f6b37927/metaseq/modules/sequence_parallel_transformer_layer.py#L97
            k, v, q = torch.split(value, depth // 3, dim=0)

            sd[q_name] = q
            sd[k_name] = k
            sd[v_name] = v
            del sd[key]

    return sd
...

checkpoint[self._choose_module_key(checkpoint)] = _metaseq_opt_to_pt(checkpoint[self._choose_module_key(checkpoint)])

self.module.load_state_dict(
    state_dict=checkpoint[self._choose_module_key(checkpoint)],
    strict=load_module_strict)

I would appreciate any help or suggestions.

molohov commented 1 year ago

I am seeing this error too, on DeepSpeed version 0.9.2:

config = AutoConfig.from_pretrained(model_name)
with deepspeed.OnDevice(dtype=dtype, device="meta"):
    model = AutoModelForCausalLM.from_config(config)
model = deepspeed.init_inference(
    model,
    tensor_parallel=tp_config,
    base_dir=repo_root,
    replace_with_kernel_inject=args.kernel_injection,
    **kwargs
)

With replace_with_kernel_inject = False, I get this error:

    model = deepspeed.init_inference(                                                              
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/__init__.py", line 333, in init_inference                                                                                     
    engine = InferenceEngine(model, config=ds_inference_config)                                    
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 204, in __init__                                                                                   
    self._apply_injection_policy(config, client_module)                                            
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 396, in _apply_injection_policy                                                                    
    replace_transformer_layer(client_module, self.module, checkpoint, config, self.config)         
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 494, in replace_transformer_layer                                                      
    replaced_module = replace_module(model=model,                                                  
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 727, in replace_module                                                                 
    replaced_module, _ = _replace_module(model, policy)                                            
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 752, in _replace_module                                                                
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)                              
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 752, in _replace_module                                                                
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)                              
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 752, in _replace_module                                                                
    _, layer_id = _replace_module(child, policies, layer_id=layer_id)                              
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 744, in _replace_module                                                                
    replaced_module = policies[child.__class__][0](child, policies[child.__class__][-1], layer_id) 
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 490, in replace_fn                                                                     
    new_module = replace_wo_policy(child, _policy)                                                 
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 473, in replace_wo_policy                                                              
    return _replace_module(module)                                                                 
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 470, in _replace_module                                                                
    _replace_module(child, name)                                                                   
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 466, in _replace_module                                                                
    setattr(r_module, name, linear_policies[child.__class__](child, prev_name + '.' + name,        
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 392, in _replace                                                                       
    data = mp_replace.copy(new_weight, child.weight.data)                                          
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/module_inject/replace_module.py", line 89, in copy                                                                            
    assert not dst.data.is_meta  # the torch.Tensor.copy_ method used below will silently fail on meta tensors

With replace_with_kernel_inject = True:

  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/__init__.py", line 333, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/deepspeed/inference/engine.py", line 207, in __init__
    self.module.to(device)
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1896, in to
    return super().to(*args, **kwargs)
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 927, in to
    return self._apply(convert)
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/opt/conda/envs/inference/lib/python3.9/site-packages/torch/nn/modules/module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

Does this depend on which weights are being loaded? I am running OPT from Hugging Face.

testpppppp commented 1 year ago

@molohov Hi, have you solved this problem?

yingying123321 commented 1 year ago

The same issue here.

xuxingya commented 1 year ago

The same issue

molohov commented 1 year ago

I had some success loading the model this way:

with deepspeed.OnDevice(dtype=dtype, device="meta"):
    model = AutoModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True)
model = deepspeed.init_inference(
    model,
    tensor_parallel=tp_config,
    base_dir=repo_root,
    replace_with_kernel_inject=args.kernel_injection,
    **kwargs
)

I think this is because low_cpu_mem_usage=True initializes the HF model with meta tensors for you, allowing DS to copy it correctly.
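If host memory allows, a variant that sidesteps meta tensors entirely is to materialize real weights on CPU first (a sketch; the tp_size value is a placeholder):

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)  # real tensors on CPU
model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 2},  # placeholder TP degree
    replace_with_kernel_inject=True,
)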

andre-bauer commented 10 months ago


Does this really work for anyone? With OPT the approach above fails for me.