microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

[BUG] init_inference() cannot load GPT2 from checkpoint #2691

Closed Wenhan-Tan closed 1 year ago

Wenhan-Tan commented 1 year ago

I was trying to load GPT2 from a checkpoint for inference but got a NotImplementedError during policy replacement.

To Reproduce:

import torch
import deepspeed
import json
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoConfig

def get_checkpoint_files(model_size):
    cached_repo_dir = "./model/" + model_size
    file_list = [str(entry) for entry in Path(cached_repo_dir).rglob("*.[bp][it][n]") if entry.is_file()]
    return file_list

def write_checkpoints_json(model_size):
    checkpoint_files = get_checkpoint_files(model_size)
    data = {"type": "ds_model", "checkpoints": checkpoint_files, "version": 1.0}
    json.dump(data, open(model_size + "_checkpoints.json", "w"))

if __name__ == "__main__":
    # Save checkpoints
    model_id = "gpt2"
    model_size = "1_5b"  # used for the local checkpoint directory and JSON file names
    model_config = AutoConfig.from_pretrained(pretrained_model_name_or_path=model_id)
    model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
                                                 config=model_config,
                                                 torch_dtype=torch.float16)
    model.save_pretrained("./model/" + model_size, from_pt=True)
    write_checkpoints_json(model_size)

    # Load checkpoints
    with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
        model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path=model_id,
                                                     config=model_config,
                                                     torch_dtype=torch.float16)
    ds_config = {
        "base_dir": "./",
        "checkpoint": "1_5b" + "_checkpoints.json",
        "tensor_parallel": {"tp_size": 1},
        "dtype": "fp16",
        "replace_with_kernel_inject": True,
        "replace_method": "auto",
    }
    ds_model = deepspeed.init_inference(model=model, config=ds_config)

I ran the script with this command: deepspeed --num_gpus 1 script.py

And got the following error:

Traceback (most recent call last):
  File "gpt_DS_benchmark.py", line 113, in <module>
    ds_model = deepspeed.init_inference(model=model, config=ds_config)
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 124, in __init__
    self._apply_injection_policy(config)
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 349, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 932, in replace_transformer_layer
    param_names=selected_policy_g.get_param_names(),
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_policy.py", line 171, in get_param_names
    raise NotImplementedError
NotImplementedError

Then I used the same script but changed the model to GPTJ and it executed successfully.

I looked at the source file here: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/module_inject/replace_policy.py#L487 and found that HFGPT2LayerPolicy does not implement get_param_names(). I checked the other policies and realized get_param_names() is implemented for only a few models. Is there a reason why GPT2 doesn't have it?
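
As a quick check (not in my original script), one can list which generic injection policies actually override get_param_names(); this assumes replace_policies is still exported from deepspeed.module_inject.replace_policy:

# Illustrative check: print which generic injection policies override get_param_names().
from deepspeed.module_inject.replace_policy import replace_policies

for policy_cls in replace_policies:
    implemented = "get_param_names" in policy_cls.__dict__
    print(f"{policy_cls.__name__}: get_param_names overridden = {implemented}")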

Then I tried to implement it for GPT2 by myself like this:

def get_param_names(self):
    return 'attn.c_attn.weight', \
            'attn.c_attn.bias', \
            'attn.c_proj.weight', \
            'attn.c_proj.bias', \
            'mlp.c_fc.weight', \
            'mlp.c_fc.bias', \
            'mlp.c_proj.weight', \
            'mlp.c_proj.bias', \
            'ln_2.weight', \
            'ln_2.bias', \
            'ln_1.weight', \
            'ln_1.bias', \
            self.use_load_prefix, \
            self.split_qkv

But got another error below:

Traceback (most recent call last):
  File "gpt_DS_benchmark.py", line 113, in <module>
    ds_model = deepspeed.init_inference(model=model, config=ds_config)
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 124, in __init__
    self._apply_injection_policy(config)
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/inference/engine.py", line 349, in _apply_injection_policy
    replace_transformer_layer(client_module,
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 926, in replace_transformer_layer
    load_model_with_checkpoint(
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 349, in load_model_with_checkpoint
    load_module_recursive(r_module)
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 343, in load_module_recursive
    load_module_recursive(
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 343, in load_module_recursive
    load_module_recursive(
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 341, in load_module_recursive
    layer_policies[child.__class__](child, prefix + name + '.')
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 238, in load_transformer_layer
    maybe_copy(module.attention,
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/load_checkpoint.py", line 187, in maybe_copy
    dst = weight_quantizer.quantize(mp_replace.qkv_copy(dst, tmp if weight_quantizer.q_int8 else \
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 53, in qkv_copy
    self.merge_assert(src_shape[self.out_dim], dst_shape[self.out_dim])
  File "/home/wenhant/.local/lib/python3.8/site-packages/deepspeed/module_inject/replace_module.py", line 32, in merge_assert
    assert dim1 > dim2, \
AssertionError: Merging tensors is not allowed here! Please use deepspeed load_checkpoint for merging your checkpoints before replacing the transformer layer with inference-kernels

Not sure what I can do next. Can someone help me, even if it's only a temporary workaround? Thanks!
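
For reference, I think the assertion fires because HF GPT2 keeps Q/K/V fused in a single Conv1D (attn.c_attn) whose weight is stored transposed relative to nn.Linear, so qkv_copy() sees an already-merged tensor. A minimal shape check (not part of my benchmark script, just to illustrate):

# Illustrative only: GPT2's attn.c_attn is a Conv1D holding the fused Q/K/V projection,
# e.g. weight shape (768, 2304) for the small "gpt2" model.
import torch
from transformers import AutoModelForCausalLM

m = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.float16)
c_attn = m.transformer.h[0].attn.c_attn
print(type(c_attn).__name__, tuple(c_attn.weight.shape))  # Conv1D (768, 2304)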

lekurile commented 1 year ago

Hi @Wenhan-Tan,

Generic checkpoint loading has been implemented for only a subset of models in GH-2547.

The reasoning behind not implementing GPT2 checkpoint loading is that, as a smaller model, it would not benefit as dramatically from this feature.

Thanks for raising this issue. We'll work through several things to help with this in the future:

  1. Create more explicit error messages about whether meta tensor checkpoint loading is supported for a model or not.
  2. Consider expanding the feature across more models, although this is not certain and requires more investigation.

Wenhan-Tan commented 1 year ago

Hi @lekurile ,

Thank you for replying! I understand GPT2 is a smaller model, but I'm trying to use a larger version like GPT2-xl, which has 1.5B parameters and would benefit from the feature. If GPT3 (175B parameters) is released later, the feature will be really useful as well. Do you have a timeline for when GPT2 will be supported? If not, please let me know what else I can do to make this work.

Thanks a lot!

lekurile commented 1 year ago

Hi @Wenhan-Tan,

I've completed a PR (GH-2792) adding explicit error reporting in cases where meta tensor checkpoint loading is attempted on models that don't support the feature.

As far as GPT2 support goes, we don't have immediate plans, but if/when larger GPT variants are released, we'd prioritize adding support for this. One thing to bear in mind is that the meta tensor approach specifically targets loading models that cannot fit on a single GPU without tensor parallelism. We're not aware of GPT2 models that currently have that limitation.
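
In the meantime, if gpt2-xl fits on a single GPU, one possible workaround (a sketch, not an official recipe) is to skip the meta tensor / checkpoint JSON path entirely and let init_inference inject kernels into a fully materialized model:

# Sketch of the non-meta-tensor path: materialize the model first, then inject kernels.
# No "checkpoint" entry is passed, so the get_param_names() policy hook is not needed.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2-xl", torch_dtype=torch.float16)
ds_model = deepspeed.init_inference(
    model=model,
    config={
        "tensor_parallel": {"tp_size": 1},
        "dtype": "fp16",
        "replace_with_kernel_inject": True,
    },
)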

However, we certainly encourage you to open a PR adding meta tensor support for GPT2, and we appreciate any efforts to extend the support/functionality of DeepSpeed. 😃

I'll mark the issue as resolved in the meantime.

Thanks, Lev