unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

NotImplementedError #747

Open Keramatfar opened 3 months ago

Keramatfar commented 3 months ago

I want to fine-tune a model using Unsloth. Everything works fine on Colab, but on my system trainer.train() fails with the NotImplementedError shown at the end of the trace below: xFormers cannot find a usable memory_efficient_attention_forward operator because it was not built with CUDA support.

Full traceback:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[10], line 1
----> 1 trainer_stats = trainer.train()

File :124, in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)

File :356, in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/transformers/trainer.py:3307, in Trainer.training_step(self, model, inputs)
--> 3307    loss = self.compute_loss(model, inputs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/transformers/trainer.py:3338, in Trainer.compute_loss(self, model, inputs, return_outputs)
--> 3338    outputs = model(**inputs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._wrapped_call_impl(self, *args, **kwargs)
--> 1527    return self._call_impl(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1536, in Module._call_impl(self, *args, **kwargs)
--> 1536    return forward_call(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/accelerate/utils/operations.py:822, in convert_outputs_to_fp32.<locals>.forward(*args, **kwargs)
--> 822     return model_forward(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/accelerate/utils/operations.py:810, in ConvertOutputsToFp32.__call__(self, *args, **kwargs)
--> 810     return convert_to_fp32(self.model_forward(*args, **kwargs))

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/amp/autocast_mode.py:16, in autocast_decorator.<locals>.decorate_autocast(*args, **kwargs)
--> 16      return func(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/accelerate/utils/operations.py:822, in convert_outputs_to_fp32.<locals>.forward(*args, **kwargs)
--> 822     return model_forward(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/accelerate/utils/operations.py:810, in ConvertOutputsToFp32.__call__(self, *args, **kwargs)
--> 810     return convert_to_fp32(self.model_forward(*args, **kwargs))

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/amp/autocast_mode.py:16, in autocast_decorator.<locals>.decorate_autocast(*args, **kwargs)
--> 16      return func(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/unsloth/models/llama.py:940, in PeftModelForCausalLM_fast_forward(self, input_ids, causal_mask, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, task_ids, **kwargs)
--> 940     return self.base_model(input_ids=input_ids, causal_mask=causal_mask, attention_mask=attention_mask, inputs_embeds=inputs_embeds, labels=labels, output_attentions=output_attentions, output_hidden_states=output_hidden_states, return_dict=return_dict, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._wrapped_call_impl(self, *args, **kwargs)
--> 1527    return self._call_impl(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1536, in Module._call_impl(self, *args, **kwargs)
--> 1536    return forward_call(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:179, in BaseTuner.forward(self, *args, **kwargs)
--> 179     return self.model.forward(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/accelerate/hooks.py:166, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
--> 166     output = module._old_forward(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/unsloth/models/mistral.py:216, in MistralForCausalLM_fast_forward(self, input_ids, causal_mask, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict, *args, **kwargs)
--> 216     outputs = self.model(input_ids=input_ids, causal_mask=causal_mask, attention_mask=attention_mask, position_ids=position_ids, past_key_values=past_key_values, inputs_embeds=inputs_embeds, use_cache=use_cache, output_attentions=output_attentions, output_hidden_states=output_hidden_states, return_dict=return_dict)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._wrapped_call_impl(self, *args, **kwargs)
--> 1527    return self._call_impl(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1536, in Module._call_impl(self, *args, **kwargs)
--> 1536    return forward_call(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/accelerate/hooks.py:166, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
--> 166     output = module._old_forward(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/unsloth/models/llama.py:696, in LlamaModel_fast_forward(self, input_ids, causal_mask, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict, *args, **kwargs)
--> 696     hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(decoder_layer, hidden_states, mask, attention_mask, position_ids, past_key_values, output_attentions, use_cache)[0]

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/autograd/function.py:573, in Function.apply(cls, *args, **kwargs)
--> 573     return super().apply(*args, **kwargs)  # type: ignore[misc]

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py:115, in custom_fwd.<locals>.decorate_fwd(*args, **kwargs)
--> 115     return fwd(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/unsloth/models/_utils.py:506, in Unsloth_Offloaded_Gradient_Checkpointer.forward(ctx, forward_function, hidden_states, *args)
--> 506     output = forward_function(hidden_states, *args)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._wrapped_call_impl(self, *args, **kwargs)
--> 1527    return self._call_impl(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1536, in Module._call_impl(self, *args, **kwargs)
--> 1536    return forward_call(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/accelerate/hooks.py:166, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
--> 166     output = module._old_forward(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/unsloth/models/llama.py:454, in LlamaDecoderLayer_fast_forward(self, hidden_states, causal_mask, attention_mask, position_ids, past_key_value, output_attentions, use_cache, padding_mask, *args, **kwargs)
--> 454     hidden_states, self_attn_weights, present_key_value = self.self_attn(hidden_states=hidden_states, causal_mask=causal_mask, attention_mask=attention_mask, position_ids=position_ids, past_key_value=past_key_value, output_attentions=output_attentions, use_cache=use_cache, padding_mask=padding_mask)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1527, in Module._wrapped_call_impl(self, *args, **kwargs)
--> 1527    return self._call_impl(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/torch/nn/modules/module.py:1536, in Module._call_impl(self, *args, **kwargs)
--> 1536    return forward_call(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/accelerate/hooks.py:166, in add_hook_to_module.<locals>.new_forward(module, *args, **kwargs)
--> 166     output = module._old_forward(*args, **kwargs)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/unsloth/models/mistral.py:132, in MistralAttention_fast_forward(self, hidden_states, causal_mask, attention_mask, position_ids, past_key_value, output_attentions, use_cache, padding_mask, *args, **kwargs)
--> 132     A = xformers_attention(Q, K, V, attn_bias = causal_mask)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:268, in memory_efficient_attention(query, key, value, attn_bias, p, scale, op, output_dtype)
--> 268     return _memory_efficient_attention(Inputs(query=query, key=key, value=value, p=p, attn_bias=attn_bias, scale=scale, output_dtype=output_dtype), op=op)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:387, in _memory_efficient_attention(inp, op)
--> 387     return _memory_efficient_attention_forward(inp, op=op[0] if op is not None else None)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py:403, in _memory_efficient_attention_forward(inp, op)
--> 403     op = _dispatch_fw(inp, False)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py:125, in _dispatch_fw(inp, needs_gradient)
--> 125     return _run_priority_list("memory_efficient_attention_forward", _dispatch_fw_priority_list(inp, needs_gradient), inp)

File ~/Desktop/venvs/offensive/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py:65, in _run_priority_list(name, priority_list, inp)
--> 65      raise NotImplementedError(msg)

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 634, 10, 4, 128) (torch.bfloat16)
     key         : shape=(1, 634, 10, 4, 128) (torch.bfloat16)
     value       : shape=(1, 634, 10, 4, 128) (torch.bfloat16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
     p           : 0.0
`flshattF@0.0.0` is not supported because:
    xFormers wasn't build with CUDA support
`cutlassF` is not supported because:
    xFormers wasn't build with CUDA support
    operator wasn't built - see python -m xformers.info for more info
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    xFormers wasn't build with CUDA support
    dtype=torch.bfloat16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.LowerTriangularMask'>
    operator wasn't built - see python -m xformers.info for more info
    operator does not support BMGHK format
    unsupported embed per head: 128
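For anyone landing on the same trace: the failure comes from xFormers' operator dispatch rather than from the training code, so it can be reproduced outside the Trainer. The sketch below is a minimal check under those assumptions (torch and xformers import, a CUDA GPU is visible); the tensors use a simplified BMHK layout instead of the grouped BMGHK layout shown in the error.

```python
# Minimal diagnostic sketch: reproduce the failing xFormers call in isolation.
# Assumptions: torch and xformers are installed and a CUDA GPU is visible.
import torch
import xformers.ops as xops

# Same dtype and head_dim as the failing call, but in plain BMHK layout
# (batch, seq_len, num_heads, head_dim) instead of grouped BMGHK.
q = k = v = torch.randn(1, 634, 40, 128, dtype=torch.bfloat16, device="cuda")

# If xFormers was built without CUDA kernels, this raises the same
# NotImplementedError quoted above; on a healthy install it returns a tensor.
out = xops.memory_efficient_attention(q, k, v, attn_bias=xops.LowerTriangularMask())
print("memory_efficient_attention OK:", tuple(out.shape))
```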

danielhanchen commented 3 months ago

Your xformers installation isn't correct. The first step is to get xformers installed properly, then run python -m xformers.info to check whether it was built and installed correctly.
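As a rough sketch of what "installed correctly" means here: torch itself must be a CUDA build, and the xformers wheel must match that torch version and CUDA version. The snippet below checks those preconditions; the cu121 tag in the comment is only an example and should match whatever torch.version.cuda reports on your machine.

```python
# Rough environment check (a sketch of what `python -m xformers.info` summarizes).
import torch
import xformers

print("torch    :", torch.__version__)        # e.g. 2.3.0+cu121 (a '+cpu' suffix means no CUDA)
print("CUDA     :", torch.version.cuda)       # None => CPU-only torch build
print("GPU seen :", torch.cuda.is_available())
print("xformers :", xformers.__version__)

# If torch is CPU-only or the versions don't line up, reinstall an xformers
# wheel built for this exact torch + CUDA combination, for example (adjust the tag):
#   pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
# then re-run `python -m xformers.info` and confirm the CUDA ops are listed as available.
```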

Once that works, Unsloth can be installed on top of it.
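Once python -m xformers.info looks healthy, a quick way to confirm the whole stack before re-running the fine-tune is to load a small model through Unsloth. This is only a sketch; the model name and max_seq_length are illustrative choices, not requirements.

```python
# Smoke test after fixing xformers: if this loads without the xFormers error,
# the original trainer.train() should get past the attention call as well.
# The model name and max_seq_length below are illustrative choices.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # any supported 4-bit model works
    max_seq_length=2048,
    load_in_4bit=True,
)
print("Loaded:", type(model).__name__)
```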