microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

self.qkv_gemm_func returns ValueError: The deleter and context arguments are mutually exclusive. #3284

Open publicstaticvo opened 1 year ago

publicstaticvo commented 1 year ago

Describe the bug
I am getting the following error while attempting to run DeepSpeed-Chat step 3 with the actor model CarperAI/openai_summarize_tldr_sft (GPT-J 6B), the critic model CarperAI/openai_summarize_tldr_rm_checkpoint (GPT-J 6B), and ZeRO stage 2.

Traceback (most recent call last):
  File "main.py", line 523, in <module>
    main()
  File "main.py", line 430, in main
    out = trainer.generate_experience(prompts)
  File "/data/nt12_ssd_gluster/myself/yts/dc/training/step3_rlhf_finetuning/ppo_trainer.py", line 97, in generate_experience
    seq = self._generate_sequence(prompts)
  File "/data/nt12_ssd_gluster/myself/yts/dc/training/step3_rlhf_finetuning/ppo_trainer.py", line 75, in _generate_sequence
    seq = self.actor_model.module.generate(prompts, max_length=max_min_length, min_length=max_min_length)
  File "/data/nt12_ssd_gluster/myself/yts/dc/training/step1_supervised_finetuning/DeepSpeed/deepspeed/runtime/hybrid_engine.py", line 254, in generate
    generate_ret_vals = self._generate(*inputs, **kwargs)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 1437, in generate
    return self.greedy_search(
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/transformers/generation/utils.py", line 2248, in greedy_search
    outputs = self(
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py", line 852, in forward
    transformer_outputs = self.transformer(
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/transformers/models/gptj/modeling_gptj.py", line 687, in forward
    outputs = block(
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/nt12_ssd_gluster/myself/yts/dc/training/step1_supervised_finetuning/DeepSpeed/deepspeed/model_implementations/transformers/ds_transformer.py", line 147, in forward
    self.attention(input,
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/nt12_ssd_gluster/myself/yts/dc/training/step1_supervised_finetuning/DeepSpeed/deepspeed/ops/transformer/inference/ds_attention.py", line 152, in forward
    qkv_out = self.qkv_func(input=input,
  File "/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/nt12_ssd_gluster/myself/yts/dc/training/step1_supervised_finetuning/DeepSpeed/deepspeed/ops/transformer/inference/op_binding/qkv_gemm.py", line 35, in forward
    output = self.qkv_gemm_func(input, weight, q_scale, bias, gamma, beta, self.config.epsilon, add_bias,
ValueError: The deleter and context arguments are mutually exclusive.

ds_report output


DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

async_io ............... [YES] ...... [OKAY]
cpu_adagrad ............ [YES] ...... [OKAY]
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
quantizer .............. [YES] ...... [OKAY]
random_ltd ............. [YES] ...... [OKAY]
sparse_attn ............ [YES] ...... [OKAY]
spatial_inference ...... [YES] ...... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
transformer_inference .. [YES] ...... [OKAY]
utils .................. [YES] ...... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/data/nt12_ssd_gluster/myself/miniconda3/lib/python3.8/site-packages/torch']
torch version .................... 1.10.0+cu113
deepspeed install path ........... ['/data/nt12_ssd_gluster/myself/yts/dc/training/step1_supervised_finetuning/DeepSpeed/deepspeed']
deepspeed info ................... 0.9.1+cc67f22f, cc67f22f, master
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed wheel compiled w. ...... torch 1.10, cuda 11.3

System info (please complete the following information):

Additional context
I would like to know whether the pull request in https://github.com/microsoft/DeepSpeed/pull/3256, or a similar fix, could help with this issue.

cmikeh2 commented 1 year ago

Hi @publicstaticvo, thank you for reporting this issue. Currently, the Hybrid Engine is only supported for the OPT family of models, but additional model support (including GPT-J) is on our roadmap and in development. I will make sure to update this issue here when support for GPT-J has been added and validated. Thanks!
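For anyone hitting this before GPT-J support lands, one pragmatic interim approach is to only enable the Hybrid Engine path for model families it currently handles. This is a minimal sketch, not a DeepSpeed API: the supported-family set below is an assumption based solely on the maintainer comment above (OPT only at the time of writing), and `hybrid_engine_supported` is a hypothetical helper.

```python
# Hypothetical guard: gate the Hybrid Engine on the model family.
# The set below is an assumption taken from the maintainer comment
# above (OPT-only at the time of writing), not an official DeepSpeed list.
SUPPORTED_HYBRID_ENGINE_FAMILIES = {"opt"}

def hybrid_engine_supported(model_type: str) -> bool:
    """Return True if the Hybrid Engine is believed to support this
    HF config.model_type (e.g. "opt", "gptj")."""
    return model_type.lower() in SUPPORTED_HYBRID_ENGINE_FAMILIES

# Example: GPT-J is not yet supported, so a script could fall back to
# running step 3 without the Hybrid Engine for such models.
enable_hybrid_engine = hybrid_engine_supported("gptj")
```

In DeepSpeed-Chat this would translate to simply not passing the Hybrid Engine option for GPT-J actors until support is validated.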