microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0
35.24k stars 4.08k forks

[REQUEST] Support Deepspeed inference for Fairseq Transformer LM model #3001

Open krishnanNuance opened 1 year ago

krishnanNuance commented 1 year ago

We would like to use the DeepSpeed inference engine to run inference on a Fairseq-based Transformer LM model. Currently, we run into an error while loading the DeepSpeed-trained checkpoint.

Describe the solution you'd like
Support kernel injection for the Fairseq Transformer LM model trained using DeepSpeed.

Describe alternatives you've considered
We can also modify the model architecture or implementation if that helps get transformer_lm to work out of the box with DeepSpeed inference, similar to the GPT models.

awan-10 commented 1 year ago

@krishnanNuance -- could you share the script you are running this with?

For all models we support in DS inference, we do tensor parallelism, kernel injection, or both.

For your case, I think you want to explore kernel injection. I have an old branch where I started on Fairseq kernel injection (https://github.com/microsoft/DeepSpeed/tree/fairseq-moe).

Do you mind trying that out?
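For context, a minimal sketch of what the two modes look like at the API level, assuming the 0.7.x-era `deepspeed.init_inference` signature (`mp_size` for tensor parallelism, `replace_with_kernel_inject` for kernel injection). The tiny model below is a stand-in for the Fairseq transformer_lm, and the DeepSpeed path is guarded so the sketch still runs without DeepSpeed or a GPU:

```python
import torch

class TinyLM(torch.nn.Module):
    """Stand-in for the Fairseq transformer_lm: embed token ids, project to vocab."""
    def __init__(self, vocab=2697, dim=16):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, dim)
        self.proj = torch.nn.Linear(dim, vocab)

    def forward(self, tokens):
        return self.proj(self.embed(tokens))

model = TinyLM()
device = "cpu"

try:
    import deepspeed
    # mp_size > 1 would enable tensor parallelism across ranks;
    # replace_with_kernel_inject swaps known layers for fused kernels.
    engine = deepspeed.init_inference(
        model,
        mp_size=1,
        dtype=torch.float32,
        replace_with_kernel_inject=True,
    )
    model = engine.module
    device = next(model.parameters()).device
except Exception:
    pass  # no DeepSpeed / no GPU available: keep the plain PyTorch module

tokens = torch.randint(0, 2697, (1, 10)).to(device)
out = model(tokens)
print(tuple(out.shape))
```

Either way, the forward pass returns logits of shape (batch, seq_len, vocab); only the backing kernels differ.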

krishnanNuance commented 1 year ago

hi @awan-10, sorry for the delayed response!

I was a bit confused by the usage of the params for DeepSpeed inference. I was able to get the following code working with DeepSpeed 0.7.4 today.

import deepspeed
import torch
from fairseq import hub_utils
from fairseq.dataclass.configs import FairseqConfig


def main(cfg: FairseqConfig) -> None:
    # Load the Fairseq checkpoint produced by training
    params = hub_utils.from_pretrained(
        "../trial_v100_1GPU_amp/training/0/output",
        checkpoint_file="checkpoint_best.pt",
        data_name_or_path="../nseq_832/trial/preprocess/0",
    )

    # Initialize the DeepSpeed-Inference engine with kernel injection
    ds_engine = deepspeed.init_inference(
        params["models"][0],
        checkpoint=None,
        replace_with_kernel_inject=True,
    )
    model = ds_engine.module

    # Run a forward pass on dummy token ids
    src_token = torch.randint(0, 2697, (1, 10)).to("cuda")
    output = model(src_token)
    print("output:", output)

Here I used the Fairseq output checkpoint for inference on dummy data. I still need to improve the code to save the state_dict when saving the DeepSpeed checkpoint. I'll try the fairseq-moe branch along with my changes to verify inference and will post here if I have questions.
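A minimal sketch of the state_dict round trip mentioned above, using a plain torch.nn module as a stand-in for the Fairseq model (the filename is hypothetical): save `model.state_dict()` alongside the DeepSpeed checkpoint during training, then load it into a freshly constructed model for inference:

```python
import os
import tempfile
import torch

# stand-in for the trained Fairseq transformer_lm
model = torch.nn.Sequential(torch.nn.Embedding(100, 8), torch.nn.Linear(8, 100))

# at save time: write a consolidated state_dict next to the DeepSpeed checkpoint
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint_best_state_dict.pt")
torch.save(model.state_dict(), ckpt_path)

# at inference time: rebuild the same architecture, then load the plain weights
fresh = torch.nn.Sequential(torch.nn.Embedding(100, 8), torch.nn.Linear(8, 100))
missing, unexpected = fresh.load_state_dict(torch.load(ckpt_path), strict=True)
print(len(missing), len(unexpected))  # both empty when the architectures match
```

With `strict=True`, any mismatch between the saved keys and the rebuilt module raises immediately, which makes checkpoint/architecture drift easy to catch.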

Thank you!

krishnanNuance commented 1 year ago

hi,

1) As discussed, I tried saving a DeepSpeed checkpoint along with the state_dict. But when I try to load it via the checkpoint.json file as shown in the tutorial (https://www.deepspeed.ai/tutorials/inference-tutorial/), I get: AssertionError: DeepSpeed checkpoint type is not supported

{
  "type": "DeepSpeed",
  "version": 0.7,
  "checkpoints": [],
  "checkpoint_path": "<path>/0/output/deepspeed/global_step236/zero_pp_rank_0_mp_rank_00_optim_states.pt"
}

I'm using DeepSpeed version 0.7.4.

2) When I try the fairseq-moe branch, I get compatibility issues with Fairseq. Can you please tell me which Fairseq version to use?

Traceback (most recent call last):
  File "/home/pooja_krishnan/.local/bin/fairseq-train", line 8, in <module>
  File "/opt/miniconda/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    return getattr(self.module, name)
  File "/opt/miniconda/envs/deepspeed/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LegacyDistributedDataParallel' object has no attribute 'max_positions'

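On question 1, the assertion suggests the loader does not recognize "DeepSpeed" as a checkpoint type; the inference tutorial's example uses "Megatron" as the type, with the shard paths listed under "checkpoints" rather than a separate "checkpoint_path" key. A hedged sketch of writing such a file (the shard path is illustrative, and which type strings a given DeepSpeed version accepts should be checked against its loader):

```python
import json

# shape of checkpoint.json following the DeepSpeed inference tutorial;
# the "type" value and shard path below are illustrative, not verified
meta = {
    "type": "Megatron",
    "version": 0.0,
    "checkpoints": ["mp_rank_00/model_optim_rng.pt"],
}
text = json.dumps(meta, indent=2)
print(text)
```

Serializing with `json.dumps` also guards against hand-editing mistakes such as a missing closing brace.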
krishnanNuance commented 1 year ago

Hello, I have one more question regarding inference on a torch checkpoint using DeepSpeed. Is kernel injection (set via deepspeed.init_inference(model, replace_with_kernel_inject=True)) applied to a torch-based checkpoint as well? Can we expect a better RTF using the torch checkpoint directly instead of a DeepSpeed checkpoint? Thank you!
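On the RTF question, whichever checkpoint the weights come from, a quick way to compare is to time the forward pass on identical inputs before and after init_inference. A minimal timing harness with a plain torch module (no DeepSpeed required, so the numbers here only demonstrate the measurement, not a speedup):

```python
import time
import torch

# stand-in model and dummy token ids, mirroring the shapes used earlier in the thread
model = torch.nn.Sequential(torch.nn.Embedding(2697, 32), torch.nn.Linear(32, 2697)).eval()
tokens = torch.randint(0, 2697, (1, 10))

with torch.no_grad():
    model(tokens)  # warm-up pass so one-time setup costs are excluded
    start = time.perf_counter()
    for _ in range(20):
        model(tokens)
    elapsed = time.perf_counter() - start

per_call_ms = elapsed / 20 * 1e3
print(f"mean forward latency: {per_call_ms:.3f} ms")
```

Running the same harness on the module returned by init_inference (with `torch.cuda.synchronize()` around the timed region on GPU) would show whether kernel injection improves latency for this model.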

yugaljain1999 commented 1 year ago

@krishnanNuance Were you able to run DeepSpeed inference with a Fairseq checkpoint? Any leads would be appreciated. Thanks