microsoft / DeepSpeedExamples

Example models using DeepSpeed
Apache License 2.0

LLaMA2-7b error:AssertionError: {'id': 292, 'status': 'NOT_AVAILABLE', 'numel': 0, 'ds_numel': 0, 'shape': (0,), 'ds_shape': (0, 4096), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {453}, 'ds_tensor.shape': torch.Size([0])} #707

Open BUAAZYF opened 1 year ago

BUAAZYF commented 1 year ago

Running training
Evaluating perplexity, Epoch 0/1

Traceback (most recent call last):
  /DeepSpeed-Chat/training/step1_supervised_finetuning/main.py:352 in <module>
      main()
  /DeepSpeed-Chat/training/step1_supervised_finetuning/main.py:309 in main
      perplexity = evaluation(model, eval_dataloader)
  /DeepSpeed-Chat/training/step1_supervised_finetuning/main.py:260 in evaluation
      outputs = model(**batch)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/torch/nn/modules/module.py:1194 in _call_impl
      return forward_call(*input, **kwargs)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/deepspeed/utils/nvtx.py:15 in wrapped_fn
      ret_val = func(*args, **kwargs)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/deepspeed/runtime/engine.py:1768 in forward
      loss = self.module(*inputs, **kwargs)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/torch/nn/modules/module.py:1212 in _call_impl
      result = forward_call(*input, **kwargs)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/transformers/models/llama/modeling_llama.py:827 in forward
      logits = self.lm_head(hidden_states)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/torch/nn/modules/module.py:1201 in _call_impl
      result = hook(self, input)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/deepspeed/utils/nvtx.py:15 in wrapped_fn
      ret_val = func(*args, **kwargs)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/deepspeed/runtime/zero/parameter_offload.py:383 in _pre_forward_module_hook
      self.pre_sub_module_forward_function(module)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/torch/autograd/grad_mode.py:27 in decorate_context
      return func(*args, **kwargs)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/deepspeed/runtime/zero/parameter_offload.py:495 in pre_sub_module_forward_function
      param_coordinator.fetch_sub_module(sub_module, forward=True)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/deepspeed/utils/nvtx.py:15 in wrapped_fn
      ret_val = func(*args, **kwargs)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/torch/autograd/grad_mode.py:27 in decorate_context
      return func(*args, **kwargs)
  /home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py:306 in fetch_sub_module
      assert param.ds_status == ZeroParamStatus.AVAILABLE, param.ds_summary()

AssertionError: {'id': 292, 'status': 'NOT_AVAILABLE', 'numel': 0, 'ds_numel': 0, 'shape': (0,), 'ds_shape': (0, 4096), 'requires_grad': True, 'grad_shape': None, 'persist': True, 'active_sub_modules': {453}, 'ds_tensor.shape': torch.Size([0])}
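For context: under ZeRO stage 3 every parameter is partitioned across ranks, so a module's local weight tensor is empty until DeepSpeed's pre-forward hook gathers it; the assertion fires because lm_head.weight is still marked NOT_AVAILABLE when that hook runs. The sketch below only illustrates that mechanism (it is not a fix): it uses a stand-in nn.Linear instead of the real LLaMA-2 lm_head and assumes it is run under the deepspeed launcher.

```python
# Illustration of ZeRO-3 parameter partitioning (stand-in layer, not the real model).
import deepspeed
import torch

with deepspeed.zero.Init():                     # module is allocated already partitioned
    lm_head = torch.nn.Linear(4096, 32000, bias=False)

print(lm_head.weight.shape)                     # torch.Size([0]): only a shard lives locally
print(lm_head.weight.ds_status)                 # NOT_AVAILABLE until DeepSpeed gathers it

# GatheredParameters temporarily reassembles the full tensor on every rank.
with deepspeed.zero.GatheredParameters(lm_head.weight, modifier_rank=None):
    print(lm_head.weight.shape)                 # full torch.Size([32000, 4096]) inside the context
```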

ds_report output:

[2023-08-29 03:26:58,579] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)

DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/torch']
torch version .................... 1.13.1+cu117
deepspeed install path ........... ['/home/ai_group/anaconda3/envs/ds/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.10.1, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 12.2
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
shared memory (/dev/shm) size .... 125.60 GB
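For completeness, the failure is specific to ZeRO stage 3 (parameter partitioning). A rough sketch of that kind of setup is below; the config values and the toy model are placeholders, not the exact options DeepSpeed-Chat's step-1 script builds, and it is meant to be launched with the deepspeed launcher.

```python
# Placeholder ZeRO stage-3 setup, only to show the relevant config knob ("stage": 3).
import deepspeed
import torch

model = torch.nn.Linear(4096, 32000, bias=False)          # stand-in for LlamaForCausalLM

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 1,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 3},                     # each rank keeps only parameter shards
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```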

yorhaha commented 11 months ago

I get the same error when using ZeRO-3.

tjruwase commented 11 months ago

@yorhaha, we traced this to an issue in transformers. Can you please try the transformers master branch? Thanks!

yorhaha commented 11 months ago

It works! Thanks @tjruwase!

FLoutione commented 11 months ago

Is there any update? I also encountered this issue with multi-GPU training of Llama2-7b. @tjruwase

tjruwase commented 11 months ago

@FLoutione, as stated earlier, this was actually an issue in transformers that has been fixed in the latest version. Can you please update your transformers and check whether this is still a problem?

FLoutione commented 11 months ago

@tjruwase I found that this issue no longer occurs when transformers is downgraded to 4.31.0, but it still occurs with version 4.33.0.dev.

tjruwase commented 11 months ago

@FLoutione, can you please try the latest transformers release: https://github.com/huggingface/transformers/releases/tag/v4.33.2
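For anyone who lands here later, a throwaway check (not part of the DeepSpeed-Chat scripts) to confirm which transformers build the training environment actually imports before re-running step-1 SFT:

```python
# Throwaway version check; 4.33.2 is the release referenced above.
from packaging import version
import transformers

print(transformers.__version__)
assert version.parse(transformers.__version__) >= version.parse("4.33.2"), (
    "transformers is older than 4.33.2; upgrade it before retrying ZeRO-3 fine-tuning"
)
```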