[BUG] use bloomz + hybrid_engine, but AttributeError: 'DS_BloomContainer' object has no attribute 'set_params_wo_copy'

shenzhuo commented 1 year ago

Describe the bug: When using hybrid_engine with bloomz under ZeRO stage 2, an error is reported. It seems to indicate that bloomz does not support hybrid_engine.

Log output

Traceback (most recent call last):
  File "DeepSpeedRLHF/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 634, in <module>
    main()
  File "DeepSpeedRLHF/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 428, in main
    rlhf_engine = DeepSpeedRLHFEngine(
  File "DeepSpeedRLHF/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/rlhf_engine.py", line 54, in __init__
    self.actor = self._init_actor(
  File "DeepSpeedRLHF/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/rlhf_engine.py", line 122, in _init_actor
    actor_engine, *_ = deepspeed.initialize(model=actor_model,
  File "venv/lib/python3.9/site-packages/deepspeed/__init__.py", line 153, in initialize
    engine = DeepSpeedHybridEngine(args=args,
  File "venv/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 52, in __init__
    self.create_inference_module()
  File "venv/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 359, in create_inference_module
    self.create_inference_containers(self.module)
  File "venv/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 308, in create_inference_containers
    self.create_inference_containers(child, layer_id=layer_id)
  File "venv/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 308, in create_inference_containers
    self.create_inference_containers(child, layer_id=layer_id)
  File "venv/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 288, in create_inference_containers
    self._inference_containers.append(self.inference_policies[child.__class__][0](
  File "venv/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 111, in new_inference_container
    _container.set_params_wo_copy(Z3_enabled=self.Z3_enabled)
AttributeError: 'DS_BloomContainer' object has no attribute 'set_params_wo_copy'

To Reproduce: the launch command is:

nohup sh training_scripts/single_node/run_bloom_1b7.sh \
  bigscience/bloomz-1b7 \
  bigscience/bloomz-1b7 \
  2 \
  2 \
  output_single_node_bloomz1b7 >train_test_zero2.log 2>&1 &

and run_bloom_1b7.sh is:

#!/bin/bash
# Copyright (c) Microsoft Corporation.
# SPDX-License-Identifier: Apache-2.0

# DeepSpeed Team
ACTOR_MODEL_PATH=$1
CRITIC_MODEL_PATH=$2
ACTOR_ZERO_STAGE=${3:-2}
CRITIC_ZERO_STAGE=${4:-2}
OUTPUT=${5:-'./output'}
NUM_GPUS=${6:-8}
NUM_NODES=${7:-1}
mkdir -p $OUTPUT

Num_Padding_at_Beginning=0 # this is model related

Actor_Lr=9.65e-6
Critic_Lr=5e-6
hostname='localhost'

export NCCL_SOCKET_IFNAME=eth
export NCCL_DEBUG=INFO
export TOKENIZERS_PARALLELISM=false

deepspeed --master_port 25303 --master_addr ${hostname} --num_gpus ${NUM_GPUS} --num_nodes ${NUM_NODES} --hostfile 'deepspeed_hostfile' main.py \
  --data_path Dahoas/rm-static \
  --data_split 2,4,4 \
  --actor_model_name_or_path $ACTOR_MODEL_PATH \
  --critic_model_name_or_path $CRITIC_MODEL_PATH \
  --num_padding_at_beginning 1 \
  --per_device_train_batch_size 1 \
  --per_device_mini_train_batch_size 1 \
  --generation_batch_numbers 1 \
  --ppo_epochs 1 \
  --max_answer_seq_len 256 \
  --max_prompt_seq_len 256 \
  --actor_learning_rate ${Actor_Lr} \
  --critic_learning_rate ${Critic_Lr} \
  --disable_actor_dropout \
  --num_train_epochs 1 \
  --lr_scheduler_type cosine \
  --gradient_accumulation_steps 1 \
  --num_warmup_steps 100 \
  --deepspeed --seed 1234 \
  --enable_hybrid_engine \
  --inference_tp_size ${NUM_NODES} \
  --tp_gather_partition_size ${NUM_GPUS} \
  --actor_zero_stage $ACTOR_ZERO_STAGE \
  --critic_zero_stage $CRITIC_ZERO_STAGE \
  --actor_gradient_checkpointing \
  --critic_gradient_checkpointing \
  --output_dir $OUTPUT |&
  tee $OUTPUT/training.log

Expected behavior: DS_BloomContainer has the attribute 'set_params_wo_copy', and the model can be trained with the hybrid engine.

ds_report output

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [YES] ...... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/local/venv/lib/python3.9/site-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/usr/local/venv/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.9.3+194053b, 194053b, master
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.7
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.7

Screenshots: none. The error is shown in the log output above.

System info:

OS: Linux version 4.18.0-240.el8.x86_64. CentOS Linux 7 (Core).
GPU count and types: one machine with 8x A100s
Python version: 3.9.13

Docker context: none

Additional context: none

@cmikeh2 @jeffra @lekurile @awan-10

lekurile commented 1 year ago

Hi @shenzhuo,

Thank you for reporting this issue and providing details for reproducing it!

I've created a PR https://github.com/microsoft/DeepSpeed/pull/3580 in the DeepSpeed repo updating the BLOOM container to inherit the HybridEngineContainer feature and added a corresponding set_lora_params() function.
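To illustrate why inheriting the feature fixes the AttributeError, here is a toy Python model of the pattern described above: a container only gains `set_params_wo_copy` if it inherits a hybrid-engine mixin, and that mixin requires a `set_lora_params()` hook. All class and function names below (`BaseContainer`, `HybridEngineMixin`, `OldBloomContainer`, `NewBloomContainer`, `new_inference_container`) are hypothetical stand-ins, not DeepSpeed's actual classes, which carry far more state.

```python
from abc import ABC, abstractmethod


class BaseContainer:
    """Hypothetical stand-in for a per-layer inference container."""

    def __init__(self, name: str) -> None:
        self.name = name


class HybridEngineMixin(ABC):
    """Hypothetical stand-in for a hybrid-engine feature mixin.

    Subclasses must implement set_lora_params(); in exchange they inherit
    set_params_wo_copy(), which the engine calls on every container.
    """

    @abstractmethod
    def set_lora_params(self) -> None:
        ...

    def set_params_wo_copy(self, Z3_enabled: bool = False) -> str:
        self.set_lora_params()
        return f"{self.name}: bound params without copy (Z3_enabled={Z3_enabled})"


class OldBloomContainer(BaseContainer):
    """Before the fix: no mixin, so set_params_wo_copy does not exist."""


class NewBloomContainer(HybridEngineMixin, BaseContainer):
    """After the fix: inherits the mixin and implements the required hook."""

    def set_lora_params(self) -> None:
        # Toy version: a real container would collect LoRA weight references here.
        self.lora_params = []


def new_inference_container(container: BaseContainer, Z3_enabled: bool = False) -> str:
    # Mirrors the failing call in the traceback: the engine assumes the method exists.
    return container.set_params_wo_copy(Z3_enabled=Z3_enabled)


if __name__ == "__main__":
    try:
        new_inference_container(OldBloomContainer("bloom"))
    except AttributeError as err:
        print("before the fix:", err)
    print("after the fix:", new_inference_container(NewBloomContainer("bloom")))
```

Making `set_lora_params` abstract means a container that opts into the mixin but forgets the hook fails loudly when it is constructed, rather than later during engine initialization.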

I've been able to test on my end and see the BLOOM container working now.

Could you please test on your end as well?

Thanks, Lev

shenzhuo commented 1 year ago

Hi @lekurile,

Thanks for the fix. I tried it out.

First, I hit this error:

Traceback (most recent call last):
  File "DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 562, in <module>
    main()
  File "DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 471, in main
    out = trainer.generate_experience(prompts)
  File "DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 97, in generate_experience
    seq = self._generate_sequence(prompts)
  File "DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 73, in _generate_sequence
    seq = self.actor_model.module.generate(prompts,
  File "/dcv/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 245, in generate
    generate_ret_vals = self._generate(*inputs, **kwargs)
  File "/dcv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/dcv/lib/python3.9/site-packages/transformers/generation/utils.py", line 1437, in generate
    return self.greedy_search(
  File "/dcv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2248, in greedy_search
    outputs = self(
  File "/dcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1208, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/dcv/lib/python3.9/site-packages/transformers/models/bloom/modeling_bloom.py", line 913, in forward
    transformer_outputs = self.transformer(
  File "/dcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1208, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/dcv/lib/python3.9/site-packages/transformers/models/bloom/modeling_bloom.py", line 786, in forward
    outputs = block(
  File "/dcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1208, in _call_impl
    result = forward_call(*input, **kwargs)
  File "/dcv/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 147, in forward
    self.attention(input,
  File "/dcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/dcv/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 160, in forward
    context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
  File "/dcv/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 253, in compute_attention
    attn_mask=((1 - input_mask).half() * minus_inf),
  File "/dcv/lib/python3.9/site-packages/torch/_tensor.py", line 39, in wrapped
    return f(*args, **kwargs)
  File "/dcv/lib/python3.9/site-packages/torch/_tensor.py", line 833, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: Subtraction, the `-` operator, with a bool tensor is not supported. If you are trying to invert a mask, use the `~` or `logical_not()` operator instead.

More details are in the linked issue.

So I modified the DeepSpeed source, specifically DeepSpeed/deepspeed/ops/transformer/inference/ds_attention.py. The change is:

With that change, the error above was resolved.
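The exact edit is not shown above. A minimal sketch of one way to avoid the error, assuming `input_mask` uses 1/True for tokens to attend to (which is what `(1 - input_mask) * minus_inf` implies): invert the mask with `~`, as the PyTorch error message suggests, and fill the padded positions with a large negative value. The function name `additive_attention_bias` is hypothetical; this is not DeepSpeed's actual code.

```python
import torch


def additive_attention_bias(input_mask: torch.Tensor,
                            dtype: torch.dtype = torch.float16) -> torch.Tensor:
    """Turn a padding mask (1/True = attend, 0/False = pad) into an additive bias.

    `1 - bool_tensor` raises in PyTorch, so we invert with `~` instead.
    """
    pad = ~input_mask.bool()  # .bool() also accepts int masks; ~ flips keep -> pad
    bias = torch.zeros(input_mask.shape, dtype=dtype, device=input_mask.device)
    # 0.0 where attending, most negative finite value of `dtype` where padded
    return bias.masked_fill(pad, torch.finfo(dtype).min)


if __name__ == "__main__":
    mask = torch.tensor([[True, True, False]])
    try:
        _ = 1 - mask  # reproduces the RuntimeError from the traceback
    except RuntimeError as err:
        print("bool subtraction fails:", err)
    print(additive_attention_bias(mask, dtype=torch.float32))
```

Passing `dtype` explicitly, rather than hard-coding `.half()`, keeps the bias in the same dtype as the activations (for example bf16), though whether that matters here is an assumption.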

Second, there is another error:

Traceback (most recent call last):
  File "DeepSpeedRLHF/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 678, in <module>
    main()
  File "DeepSpeedRLHF/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/main.py", line 502, in main
    out = trainer.generate_experience(batch_prompt['prompt'],
  File "DeepSpeedRLHF/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 107, in generate_experience
    seq = self._generate_sequence(prompts, mask)
  File "DeepSpeedRLHF/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 81, in _generate_sequence
    seq = self.actor_model.module.generate(input_ids=prompts,
  File "/dcv/lib/python3.9/site-packages/deepspeed/runtime/hybrid_engine.py", line 266, in generate
    generate_ret_vals = self._generate(*inputs, **kwargs)
  File "/dcv/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/dcv/lib/python3.9/site-packages/transformers/generation/utils.py", line 1607, in generate
    return self.beam_search(
  File "/dcv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2905, in beam_search
    outputs = self(
  File "/dcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/dcv/lib/python3.9/site-packages/transformers/models/bloom/modeling_bloom.py", line 913, in forward
    transformer_outputs = self.transformer(
  File "/dcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/dcv/lib/python3.9/site-packages/transformers/models/bloom/modeling_bloom.py", line 786, in forward
    outputs = block(
  File "/dcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/dcv/lib/python3.9/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 157, in forward
    self.attention(input,
  File "/dcv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/dcv/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 158, in forward
    context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
  File "/dcv/lib/python3.9/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 247, in compute_attention
    matmul_result = torch.matmul(query_layer, key_layer)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmStridedBatchedExFix(handle, opa, opb, (int)m, (int)n, (int)k, (void*)&falpha, a, CUDA_R_16BF, (int)lda, stridea, b, CUDA_R_16BF, (int)ldb, strideb, (void*)&fbeta, c, CUDA_R_16BF, (int)ldc, stridec, (int)num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`

I can't make sense of this error; it's extremely weird.

SabrinaZhuangxx commented 1 year ago

Same error here.

lekurile commented 1 year ago

Hi @shenzhuo, @SabrinaZhuangxx,

After adding the changes in the following PRs:

I was able to get bigscience/bloomz-1b7 to train in DeepSpeed-Chat step 3. However, the critic model must first be trained through step 2.

The command I used looks as follows:

DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning$ bash training_scripts/bloom/single_node/run_bloom.sh bigscience/bloomz-1b7 ../step2_reward_model_finetuning/bloom_7b_output/ 2 2 output_bloom7b_actor_hf_critic_step2

Could you please try again with the latest changes? Instead of using bigscience/bloomz-1b7 as the critic model in step 3, please use a critic model trained through step 2 of DeepSpeed-Chat training.

Thanks, Lev

lekurile commented 11 months ago

Hi @shenzhuo,

Closing the issue for now since a solution was provided. If you still encounter problems, feel free to open another issue.

microsoft/DeepSpeed, issue #3518