microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

AssertionError: AutoTP not supported for model. Please use kernel injection since container policy for model exists. #3174

Open suri-kunal opened 1 year ago

suri-kunal commented 1 year ago

Describe the bug I am trying to fine-tune a Transformers EncoderDecoder model on a T4, using Longformer as the encoder and GPT2 as the decoder. I am able to train the model successfully, but during inference I get the following error -

AssertionError: AutoTP not supported for model. Please use kernel injection since container policy for model exists.

This is happening because I have set replace_with_kernel_inject=False in my init_inference call. If I set replace_with_kernel_inject=True, I get the error from #1301 instead, which might be because I am running my code on a T4.
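For reference, a minimal sketch of the two configurations in question (model and world_size as in the script below):

# AutoTP path: trips the AssertionError above, because a kernel-injection
# container policy exists for this model family.
model_engine_inference = deepspeed.init_inference(model,
                                                  mp_size=world_size,
                                                  dtype=torch.float,
                                                  replace_with_kernel_inject=False)

# Kernel-injection path: avoids the assertion, but on a T4 it runs into the
# error described in #1301 instead.
model_engine_inference = deepspeed.init_inference(model,
                                                  mp_size=world_size,
                                                  dtype=torch.float,
                                                  replace_with_kernel_inject=True)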

To Reproduce Steps to reproduce the behavior:

  1. Simple inference script to reproduce. Training loop -

def training_loop(model_name, model,
                  model_parameters,
                  ds_config,
                  train_dl, valid_dl):
    # Track the best validation loss across epochs
    best_loss = np.inf
    best_model = None
    best_epoch = 0
    for epoch in tqdm(range(code_config.TASKA_SUMMARY_EPOCHS)):
        avg_train_loss, model = train_summarization(model,
                                                    model_parameters,
                                                    ds_config,
                                                    train_dl,
                                                    epoch)
        if model.training is False:
            raise Exception("Model has to be trainable")
        new_loss = validate_summarization(model, valid_dl, epoch)
        wandb.log({"Epoch/Training Loss": avg_train_loss,
                   "Epoch/Validation Loss": new_loss,
                   "Epoch/Epoch": epoch})
        if new_loss < best_loss:
            if model is None:
                raise Exception("Best Model cannot be none")
            best_loss = new_loss
    wandb.finish()

validate_summarization -

def validate_summarization(model, valid_dl, epoch=0):
    seed_everything(code_config.TASKA_SUMMARY_SEED)

    if model is None:
        raise Exception("Model cannot be None")

    world_size = int(os.getenv('WORLD_SIZE', '4'))

    model_engine_inference = deepspeed.init_inference(model,
                                                      mp_size=world_size,
                                                      dtype=torch.float,
                                                      replace_with_kernel_inject=False)

    model_engine_inference.eval()

    if model_engine_inference.training is True:
        raise Exception("Model should not be trainable")

    total_loss = 0
    for valid_step, valid_batch in enumerate(valid_dl):

        input_ids = valid_batch["input_ids"].to(device)
        attention_mask = valid_batch["attention_mask"].to(device)
        labels = valid_batch["labels"].to(device)
        decoder_input_ids = valid_batch["decoder_input_ids"].to(device)

        with torch.no_grad():
            output = model_engine_inference(input_ids=input_ids,
                                            attention_mask=attention_mask,
                                            decoder_input_ids=decoder_input_ids,
                                            labels=labels,
                                            use_cache=False,
                                            return_dict=True)
            loss = output.loss
            total_loss += loss.item()

            wandb.log({'Batch/Validation Loss': loss.item(),
                       'Batch/Validation Step': valid_step + epoch * len(valid_dl)})

    avg_loss = total_loss / len(valid_dl)

    return avg_loss
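Note that when the script is launched without the deepspeed/torchrun launcher, WORLD_SIZE is unset and the fallback above requests 4-way tensor parallelism even on a single GPU. On a single T4, a fallback of 1 would be the safer default (a suggested tweak, assuming single-GPU runs; not what the script above does):

world_size = int(os.getenv('WORLD_SIZE', '1'))  # 1 rank on a single T4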

Stacktrace -

[2023-04-09 19:46:52,126] [INFO] [logging.py:93:log_dist] [Rank 0] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
  0%|          | 0/2 [05:35<?, ?it/s]
Traceback (most recent call last):
  File "Task A - Summarization - EncoderDecoder.py", line 550, in <module>
    training_loop(model_name, model, \
  File "Task A - Summarization - EncoderDecoder.py", line 482, in training_loop
    validate_summarization(model,valid_dl,epoch)
  File "Task A - Summarization - EncoderDecoder.py", line 287, in validate_summarization
    model_engine_inference = deepspeed.init_inference(model,
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/__init__.py", line 311, in init_inference
    engine = InferenceEngine(model, config=ds_inference_config)
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/inference/engine.py", line 139, in __init__
    parser_dict = AutoTP.tp_parser(model)
  File "/usr/local/lib/python3.8/dist-packages/deepspeed/module_inject/auto_tp.py", line 99, in tp_parser
    assert AutoTP.supported(model), "AutoTP not supported for model. Please use kernel injection since container policy for model exists." \
AssertionError: AutoTP not supported for model. Please use kernel injection since container policy for model exists.
  2. Latest versions of DeepSpeed and Huggingface
  3. How to run the script - NA
  4. ...
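For context, the assertion in the trace comes from AutoTP.tp_parser / AutoTP.supported in deepspeed/module_inject/auto_tp.py. A rough paraphrase of the check (not the exact DeepSpeed source; the family list here is an assumption inferred from this error):

def autotp_supported(model) -> bool:
    # Hypothetical stand-in for AutoTP.supported: reject models whose repr
    # mentions a family that already has a kernel-injection container policy.
    policy_families = ('gpt2', 'longformer')  # assumed list, inferred from this error
    repr_str = str(model).lower()  # the module repr names all submodule classes
    return not any(family in repr_str for family in policy_families)

Because the EncoderDecoder model wraps a Longformer encoder and a GPT2 decoder, both of which evidently have container policies, AutoTP refuses the model and asks for kernel injection instead.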

Expected behavior How do I get rid of this error? My target environment is a K80, so resolving it is extremely important to me.

ds_report output Please run ds_report to give us details about your setup.

Screenshots If applicable, add screenshots to help explain your problem.

System info (please complete the following information):

Docker context Dockerfile -

FROM nvcr.io/nvidia/pytorch:23.03-py3

WORKDIR /workspace

COPY requirements.txt .

RUN pip install -r requirements.txt

requirements.txt -

jupyter
protobuf<3.21.0a0,>=3.20.1
tornado<6.2,>=6.0.3
jupyterlab_widgets
ipywidgets
plotly
nltk
transformers
datasets
rouge-score
bert-score
evaluate
gputil
wandb
iterative-stratification
tensorflow
tf-slim
git+https://github.com/google-research/bleurt.git
captum
sentence-transformers
deepdiff
setfit
optuna
torch-lr-finder
openai
fire
tenacity
accelerate
loralib
deepspeed
peft

Additional context Add any other context about the problem here.

satpalsr commented 1 year ago

What if you try a custom injection policy?

For example, for GPT-NeoX it would look like:

from transformers import GPTNeoXLayer

pipe.model = deepspeed.init_inference(
    pipe.model,
    dtype=dtype,
    mp_size=args.world_size,
    replace_with_kernel_inject=False,
    enable_cuda_graph=args.graphs,
    injection_policy={GPTNeoXLayer: ('attention.dense', 'mlp.dense_4h_to_h')}
)
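Adapting the same idea to the Longformer/GPT2 EncoderDecoder model above might look like the sketch below. The layer classes are real transformers classes, but the module names in the tuples (the linear layers whose outputs DeepSpeed should all-reduce) are assumptions; verify them against print(model) for the actual checkpoint:

from transformers.models.longformer.modeling_longformer import LongformerLayer
from transformers.models.gpt2.modeling_gpt2 import GPT2Block

model_engine_inference = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float,
    replace_with_kernel_inject=False,
    injection_policy={
        # Module names below are assumptions for illustration only.
        LongformerLayer: ('attention.output.dense', 'output.dense'),
        GPT2Block: ('attn.c_proj', 'mlp.c_proj'),
    }
)

Supplying an explicit injection_policy should skip the AutoTP auto-detection path, so the AssertionError above would not fire.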