hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0
33.32k stars · 4.1k forks

When using the Liger kernel, get an error: 'tensor' object has no attribute 'cast'. #5784

Open Tendo33 opened 1 week ago

Tendo33 commented 1 week ago

Reminder

System Info

[2024-10-23 02:13:28,666] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
 [WARNING]  using untested triton version (2.3.0), only 1.0.0 is known to be compatible

- `llamafactory` version: 0.9.1.dev0
- Platform: Linux-5.4.0-155-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- PyTorch version: 2.3.0+cu121 (GPU)
- Transformers version: 4.44.2
- Datasets version: 2.20.0
- Accelerate version: 0.34.2
- PEFT version: 0.11.1
- TRL version: 0.9.6
- GPU type: NVIDIA A800-SXM4-80GB
- DeepSpeed version: 0.14.4
- Bitsandbytes version: 0.44.1
- vLLM version: 0.5.1

Reproduction

### model
model_name_or_path: xxxxxxx

### method
stage: sft
do_train: true
finetuning_type: lora
lora_alpha: 64
lora_rank: 32
lora_target: all
lora_dropout: 0
#use_dora: true
#neftune_noise_alpha: 5
#use_unsloth: true
# use_unsloth_gc: true
# use_rslora: true
enable_liger_kernel: true

### dataset
dataset_dir: ../data
dataset: xxxx
template: qwen
cutoff_len: 5000
overwrite_cache: true
preprocessing_num_workers: 64

### output
output_dir: xxxxxxx
logging_steps: 1
save_steps: 2490
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 5.0e-5
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1
weight_decay: 0.01
flash_attn: auto
bf16: true
report_to: wandb
run_name: "xxxxxxx"

### eval
val_size: 0.05
eval_strategy: steps
eval_steps: 249
load_best_model_at_end: true
compute_accuracy: true

### additional settings
additional_target: embed_tokens,lm_head
resize_vocab: true

Expected behavior

Error message:

Traceback (most recent call last):
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1222, in ast_to_ttir
    generator.visit(fn.parse())
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1105, in visit
    ret = super().visit(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 303, in visit_Module
    ast.NodeVisitor.generic_visit(self, node)
  File "/root/miniconda3/envs/llm/lib/python3.10/ast.py", line 426, in generic_visit
    self.visit(item)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1105, in visit
    ret = super().visit(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 376, in visit_FunctionDef
    self.visit_compound_statement(node.body)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 298, in visit_compound_statement
    ret_type = self.visit(stmt)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1105, in visit
    ret = super().visit(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 428, in visit_Assign
    values = self.visit(node.value)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1105, in visit
    ret = super().visit(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1008, in visit_Call
    fn = _unwrap_if_constexpr(self.visit(node.func))
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1105, in visit
    ret = super().visit(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/ast.py", line 418, in visit
    return visitor(node)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1066, in visit_Attribute
    return getattr(lhs, node.attr)
AttributeError: 'tensor' object has no attribute 'cast'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/envs/llm/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/workspace/sunjinfeng/github_projet/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/workspace/sunjinfeng/github_projet/LLaMA-Factory/src/llamafactory/train/tuner.py", line 50, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/workspace/sunjinfeng/github_projet/LLaMA-Factory/src/llamafactory/train/sft/workflow.py", line 96, in run_sft
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 1938, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 2279, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 3318, in training_step
    loss = self.compute_loss(model, inputs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 3363, in compute_loss
    outputs = model(**inputs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/peft/peft_model.py", line 1430, in forward
    return self.base_model(
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
    return self.model.forward(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/liger_kernel/transformers/model/qwen2.py", line 81, in lce_forward
    outputs = self.model(
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 904, in forward
    layer_outputs = self._gradient_checkpointing_func(
  File "/workspace/sunjinfeng/github_projet/LLaMA-Factory/src/llamafactory/model/model_utils/checkpointing.py", line 93, in custom_gradient_checkpointing_func
    return gradient_checkpointing_func(func, *args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
    outputs = run_function(*args)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 669, in forward
    hidden_states = self.mlp(hidden_states)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/liger_kernel/transformers/swiglu.py", line 21, in forward
    LigerSiLUMulFunction.apply(self.gate_proj(x), self.up_proj(x))
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/liger_kernel/ops/utils.py", line 30, in wrapper
    return fn(ctx, *args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/liger_kernel/ops/swiglu.py", line 111, in forward
    a, b, c = swiglu_forward(a, b)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/liger_kernel/ops/swiglu.py", line 74, in swiglu_forward
    _swiglu_forward_kernel[(n_rows,)](
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
    self.cache[device][key] = compile(
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/compiler.py", line 191, in compile
    module = src.make_ir(options)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/compiler.py", line 117, in make_ir
    return ast_to_ttir(self.fn, self, options=options)
  File "/root/miniconda3/envs/llm/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 1231, in ast_to_ttir
    raise CompilationError(fn.src, node, repr(e)) from e
triton.compiler.errors.CompilationError: at 4:31:def _swiglu_forward_kernel(
    a_ptr, b_ptr, c_ptr, stride, n_cols: tl.constexpr, BLOCK_SIZE: tl.constexpr
):
    program_id = tl.program_id(0).cast(tl.int64)
                               ^
AttributeError("'tensor' object has no attribute 'cast'")

Training works normally after turning off `enable_liger_kernel: true`.
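Based on the fix reported below (upgrading to torch 2.4, whose bundled triton provides `tl.tensor.cast`), a fail-fast version check before enabling the Liger kernel could avoid the mid-training crash. This is a hedged sketch: the helper names are hypothetical, and the `>= 3.0` threshold is inferred from the thread (torch 2.3 ships triton 2.3.0, torch 2.4 ships triton 3.0), not from Liger-Kernel's documentation.

```python
# Hypothetical guard: refuse to enable the Liger kernel when the
# installed triton is too old to have tl.tensor.cast, which is the
# call that raises the AttributeError in the traceback above.

def parse_version(v: str) -> tuple:
    """Turn a version string like '2.3.0' into a comparable tuple,
    ignoring any non-numeric suffixes such as '+cu121'."""
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def liger_triton_ok(triton_version: str) -> bool:
    """True if this triton version is assumed to support tl.tensor.cast
    (assumption from the thread: triton >= 3.0, i.e. torch >= 2.4)."""
    return parse_version(triton_version) >= (3, 0)
```

With the environment reported above (`triton 2.3.0`), such a check would reject the configuration up front instead of failing inside the SwiGLU kernel compile.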

Others

No response

tunachiu commented 1 week ago

I met the same error and fixed it by upgrading torch to 2.4.0.

idontkonwher commented 1 week ago

Same issue. Liger-Kernel only requires torch >= 2.1, so how come it needs torch 2.4.0?

Tendo33 commented 1 week ago

> I met the same error and fixed it by upgrading torch to 2.4.0.

I think it's an issue with Liger-Kernel. After upgrading torch to 2.4.0, I started getting the following error instead:

RuntimeError: mat1 and mat2 must have the same dtype, but got BFloat16 and Float in fused_linear_cross_entropy_forward

Then I found this: https://github.com/linkedin/Liger-Kernel/issues/235
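For reference, the new error is an ordinary PyTorch dtype mismatch rather than a Triton compile failure: a matmul between bf16 activations and a fp32 weight. A minimal sketch with plain torch (no Liger-Kernel involved) reproduces the same class of error and shows the usual remedy of casting one operand to the other's dtype:

```python
import torch

# bf16 activations against a fp32 weight, as in the
# fused_linear_cross_entropy_forward error above
x = torch.randn(2, 4, dtype=torch.bfloat16)
w = torch.randn(4, 8, dtype=torch.float32)

try:
    x @ w  # raises: mat1 and mat2 must have the same dtype
except RuntimeError as exc:
    print(type(exc).__name__)

# Casting the weight to the activation dtype makes the matmul valid
y = x @ w.to(x.dtype)
print(y.dtype)  # torch.bfloat16
```

In this thread's setup, the fp32 operand plausibly comes from `additional_target: embed_tokens,lm_head` with `resize_vocab: true` leaving the resized head in fp32, but that is a guess; the linked Liger-Kernel issue tracks the actual bug.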

YingxuanW commented 1 day ago

@Tendo33 Hi! Did you fix this problem? I ran into it as well!

Tendo33 commented 1 day ago

> @Tendo33 Hi! Did you fix this problem? I ran into it as well!

Nope, the solution I found requires modifying the Liger-Kernel source code. I'm waiting for them to fix the bug 🤣
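For anyone stuck on torch 2.3, the source-level workaround mentioned above amounts to a one-line change in `liger_kernel/ops/swiglu.py`: triton 2.x spells the dtype conversion `tl.tensor.to`, while `.cast` only exists in newer triton. A hedged sketch of the edit as a text rewrite (so it can be checked without importing triton; the function name is hypothetical):

```python
# The failing line from the traceback, and its triton-2.x-compatible form.
# Assumption: .to(tl.int64) performs the same conversion that .cast(tl.int64)
# does in triton 3.x.
OLD_LINE = "program_id = tl.program_id(0).cast(tl.int64)"
NEW_LINE = "program_id = tl.program_id(0).to(tl.int64)"

def patch_kernel_source(src: str) -> str:
    """Rewrite the triton-3-only .cast() call to .to() in kernel source."""
    return src.replace(OLD_LINE, NEW_LINE)
```

Applying this to the installed `swiglu.py` (and any other Liger-Kernel kernels using `.cast`) should let the kernel compile under triton 2.3, though upgrading torch/triton remains the cleaner fix once upstream resolves the dtype bug.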