[rank3]: ValueError: not enough values to unpack (expected 2, got 0)
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/main_step3.py", line 673, in <module>
[rank1]: main()
[rank1]: File "/home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/main_step3.py", line 527, in main
[rank1]: out = trainer.generate_experience(batch_prompt['prompt'],
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/rlhf/ppo_trainer.py", line 140, in generate_experience
[rank1]: seq = self._generate_sequence(prompts, mask, step)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/rlhf/ppo_trainer.py", line 87, in _generate_sequence
[rank1]: seq = self.actor_model.module.generate(
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed/runtime/hybrid_engine.py", line 253, in generate
[rank1]: generate_ret_vals = self._generate(*inputs, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank1]: return func(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/transformers/generation/utils.py", line 2024, in generate
[rank1]: result = self._sample(
[rank1]: ^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/transformers/generation/utils.py", line 2982, in _sample
[rank1]: outputs = self(**model_inputs, return_dict=True)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1609, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py", line 955, in forward
[rank1]: transformer_outputs = self.transformer(
[rank1]: ^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1609, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py", line 744, in forward
[rank1]: outputs = block(
[rank1]: ^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1609, in _call_impl
[rank1]: result = forward_call(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py", line 171, in forward
[rank1]: self.attention(input,
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1568, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 160, in forward
[rank1]: context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed/ops/transformer/inference/ds_attention.py", line 239, in compute_attention
[rank1]: past_key, past_value = layer_past
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: ValueError: not enough values to unpack (expected 2, got 0)
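For context, the last frame fails at `past_key, past_value = layer_past` in `ds_attention.py`, so `layer_past` reached DeepSpeed's inference attention as an empty sequence instead of a `(key, value)` pair. The minimal plain-Python sketch below reproduces that exact `ValueError`. `unpack_layer_past` is a hypothetical guard written only for illustration (it is not DeepSpeed code), and treating an empty cache as "no past state" is an assumption, since this log does not show why the cache was empty.

```python
from typing import Any, Optional, Sequence, Tuple


def unpack_layer_past(layer_past: Optional[Sequence[Any]]) -> Tuple[Any, Any]:
    """Hypothetical guard, not part of DeepSpeed: treat a missing or empty
    cache as 'no past key/value' instead of unpacking it blindly."""
    if layer_past is None or len(layer_past) == 0:
        return None, None
    # Still raises ValueError if the cache is not exactly a (key, value) pair.
    past_key, past_value = layer_past
    return past_key, past_value


# Reproduce the error from ds_attention.py: unpacking an empty cache into two names.
try:
    past_key, past_value = ()
except ValueError as err:
    error_message = str(err)

print(error_message)                  # not enough values to unpack (expected 2, got 0)
print(unpack_layer_past(()))          # (None, None)
print(unpack_layer_past(("k", "v")))  # ('k', 'v')
```

The guard only hides the symptom; the real question for this issue is which component passes an empty `layer_past` into the hybrid engine's attention.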
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
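The `tokenizers` message above is a warning about forking after the tokenizer has already used its thread pool; it is unrelated to the `ValueError`. A minimal sketch of the second option it lists, assuming the variable is set at the very top of the training script (e.g. `main_step3.py`) before any tokenizer is created. Per the message, either `true` or `false` silences the warning; `"false"` is the conservative choice assumed here.

```python
import os

# Must run before any Hugging Face tokenizer is instantiated or used,
# i.e. before worker processes are forked.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])  # false
```

Exporting `TOKENIZERS_PARALLELISM=false` in the launching shell has the same effect without touching the script.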
To Reproduce
Run RLHF step 3 training with `main_step3.py` from `DeepSpeedExamples/applications/DeepSpeed-Chat`.
ds_report output
ds_report
[2024-09-11 19:27:52,618] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
gds .................... [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.4
[WARNING] using untested triton version (3.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/torch']
torch version .................... 2.4.0+cu121
deepspeed install path ........... ['/home/tools/anaconda3/envs/deepspeed/lib/python3.12/site-packages/deepspeed']
deepspeed info ................... 0.15.1, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.4, cuda 12.1
shared memory (/dev/shm) size .... 503.77 GB
System info
- OS: Ubuntu 20.04.6 LTS
- GPU: 4 × NVIDIA L20 (46 GB)
- Python: 3.12.0
- CUDA: 12.1
- torch: 2.4.0
- deepspeed: 0.15.1
- transformers: 4.44.2
- accelerate: 0.33.0
- [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii): 0.15.1
Additional context
home/deepspeed/DeepSpeedExamples/applications/DeepSpeed-Chat/dschat/utils/model/model_utils.py:155: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
model_ckpt_state_dict = torch.load(model_ckpt_path, map_location='cpu')
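The `FutureWarning` comes from the `torch.load` call in `dschat/utils/model/model_utils.py`. A sketch of passing `weights_only=True` explicitly, which should avoid the warning and restrict unpickling to tensors and basic containers. This assumes the checkpoint is a plain state dict of tensors; the in-memory buffer and tensor names are stand-ins so the example runs on its own, and in the real script the first argument would be `model_ckpt_path`.

```python
import io

import torch

# Stand-in for the checkpoint at model_ckpt_path: a plain state dict of tensors.
state_dict = {"weight": torch.randn(2, 3), "bias": torch.zeros(3)}
buffer = io.BytesIO()
torch.save(state_dict, buffer)
buffer.seek(0)

# weights_only=True limits unpickling to tensors and primitive containers,
# so arbitrary objects embedded in the file cannot execute code on load.
model_ckpt_state_dict = torch.load(buffer, map_location="cpu", weights_only=True)

print(sorted(model_ckpt_state_dict))  # ['bias', 'weight']
```

If the real checkpoint contains custom Python objects, `weights_only=True` will refuse to load it, and those types would need to be allowlisted via `torch.serialization.add_safe_globals` as the warning describes.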
Describe the bug
When I run RLHF step 3 training, it fails with `ValueError: not enough values to unpack (expected 2, got 0)`, as shown in the log output above.