linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training
https://arxiv.org/pdf/2410.10989
BSD 2-Clause "Simplified" License

AttributeError: 'Qwen2RMSNorm' object has no attribute 'in_place' #390

Open jdf-prog opened 3 hours ago

jdf-prog commented 3 hours ago

🐛 Describe the bug

[rank0]: Traceback (most recent call last):                                                                                                                                                                          
[rank0]:   File "/home/dongfu/Workspace/Mantis/mantis/train/train_qwen2_vl.py", line 257, in <module>                                                                                                               
[rank0]:     main(training_args, data_args, model_args)                                                                                                                                                              
[rank0]:   File "/home/dongfu/Workspace/Mantis/mantis/train/train_qwen2_vl.py", line 231, in main                                                                                                                   
[rank0]:     trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)                                                                                                                              
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/transformers/trainer.py", line 2114, in train                                                                                     
[rank0]:     return inner_training_loop(                                                                                                                                                                             
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/transformers/trainer.py", line 2275, in _inner_training_loop                                                                      
[rank0]:     model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)                                                                                                                            
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/accelerate/accelerator.py", line 1323, in prepare                                                                                 
[rank0]:     result = self._prepare_deepspeed(*args)                                                                                                                                                                 
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/accelerate/accelerator.py", line 1842, in _prepare_deepspeed                                                                      
[rank0]:     engine, optimizer, _, lr_scheduler = ds_initialize(**kwargs)                                                                                                                                            
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/deepspeed/__init__.py", line 193, in initialize                                                                                   
[rank0]:     engine = DeepSpeedEngine(args=args,                                                                                                                                                                     
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 313, in __init__                                                                               
[rank0]:     self._configure_optimizer(optimizer, model_parameters)                                                                                                                                                  
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1302, in _configure_optimizer                                                                  
[rank0]:     self.optimizer = self._configure_zero_optimizer(basic_optimizer)                                                                                                                                        
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1626, in _configure_zero_optimizer                                                             
[rank0]:     optimizer = DeepSpeedZeroOptimizer_Stage3(                                                                                                                                                              
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/deepspeed/runtime/zero/stage3.py", line 163, in __init__                                                                          
[rank0]:     print_rank_0(f"initialized {__class__.__name__} with args: {locals()}", force=False)                                                                                                                    
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2943, in __repr__                                                                               
[rank0]:     mod_str = repr(module)                                                                                                                                                                                  
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2943, in __repr__                                                                               
[rank0]:     mod_str = repr(module)                                                                                                                                                                                  
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/torch/nn/modules/container.py", line 369, in __repr__                                                                             
[rank0]:     list_of_reprs = [repr(item) for item in self]                                                                                                                                                           
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/torch/nn/modules/container.py", line 369, in <listcomp>                                                                           
[rank0]:     list_of_reprs = [repr(item) for item in self]                                                                                                                                                           
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2943, in __repr__                                                                               
[rank0]:     mod_str = repr(module)                                                                                                                                                                                  
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2937, in __repr__                                                                               
[rank0]:     extra_repr = self.extra_repr()                                                                                                                                                                          
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/liger_kernel/transformers/rms_norm.py", line 43, in extra_repr                                                                    
[rank0]:     return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}, offset={self.offset}, in_place={self.in_place}"                                                                                       
[rank0]:   File "/home/dongfu/miniconda3/envs/mantis/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1931, in __getattr__                                                                            
[rank0]:     raise AttributeError(                                                                                                                                                                                   
[rank0]: AttributeError: 'Qwen2RMSNorm' object has no attribute 'in_place'

It seems that applying Liger to Qwen2-VL causes this bug.
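The traceback shows Liger's `extra_repr` running on an object whose class is still `Qwen2RMSNorm`, which suggests the method was bound onto an existing instance that never received an `in_place` attribute. A minimal sketch of that failure mode in plain Python follows; `OriginalNorm` and `PatchedNorm` are hypothetical stand-ins, not the real Transformers or Liger classes, and this is not confirmed to be the exact code path in the library.

```python
import types


class OriginalNorm:
    """Hypothetical stand-in for the model's own norm layer."""

    def __init__(self, eps):
        self.variance_epsilon = eps

    def extra_repr(self):
        return f"eps={self.variance_epsilon}"


class PatchedNorm:
    """Hypothetical stand-in for a replacement layer whose repr reads extra attributes."""

    def __init__(self, eps, offset=0.0, in_place=True):
        self.variance_epsilon = eps
        self.offset = offset
        self.in_place = in_place

    def extra_repr(self):
        return (
            f"eps={self.variance_epsilon}, offset={self.offset}, "
            f"in_place={self.in_place}"
        )


# Patch an already-constructed instance: rebind the method, copy only some attributes.
layer = OriginalNorm(eps=1e-6)
layer.extra_repr = types.MethodType(PatchedNorm.extra_repr, layer)
layer.offset = 0.0  # set by the patch
# layer.in_place is never set, so the rebound method fails once repr is built.

try:
    layer.extra_repr()
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)

# Setting the missing attribute makes the rebound method work again.
layer.in_place = True
print(layer.extra_repr())
```

In this sketch the failure only surfaces when something asks for the module's repr, which matches the traceback: DeepSpeed formats `locals()` into a log message and thereby calls `__repr__` on the whole model.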

Reproduce

No response

Versions

Environment Report:

Operating System: Linux-5.15.0-60-generic-x86_64-with-glibc2.31
Python version: 3.10.15
PyTorch version: 2.5.1+cu124
CUDA version: 12.4
Triton version: 3.1.0
Transformers version: 4.46.2

jdf-prog commented 3 hours ago

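It seems that the Liger kernel is not being applied to my custom Qwen2-VL model. I edited the imports at the end of this excerpt of `apply_liger_kernel_to_qwen2_vl` so that they point at my own copy of the modeling file: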

def apply_liger_kernel_to_qwen2_vl(
    cross_entropy: bool = False,
    fused_linear_cross_entropy: bool = True,
    rms_norm: bool = True,
    layer_norm: bool = True,
    swiglu: bool = True,
    model: PreTrainedModel = None,
) -> None:
    """
    Apply Liger kernels to replace original implementation in HuggingFace Qwen2-VL models.
    NOTE: Qwen2-VL is not available in transformers<4.45.0

    Args:
        cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
        fused_linear_cross_entropy (bool):
            Whether to apply Liger's fused linear cross entropy loss. Default is True.
            `cross_entropy` and `fused_linear_cross_entropy` cannot both be True.
            If `fused_linear_cross_entropy` is True, the logits will not be materialized but more memory efficient.
        rms_norm (bool): Whether to apply Liger's RMSNorm. Default is True.
        layer_norm (bool): Whether to apply Liger's LayerNorm. Default is True.
        swiglu (bool): Whether to apply Liger's SwiGLU MLP. Default is True.
        model (PreTrainedModel): The model instance to apply Liger kernels to, if the model has already been
        loaded. Default is None.
    """
    assert not (
        cross_entropy and fused_linear_cross_entropy
    ), "cross_entropy and fused_linear_cross_entropy cannot both be True."

    # from transformers.models.qwen2_vl import modeling_qwen2_vl
    # from transformers.models.qwen2_vl.modeling_qwen2_vl import Qwen2VLModel
    from my_folder.models.qwen2_vl import modeling_qwen2_vl
    from my_folder.models.qwen2_vl.modeling_qwen2_vl import Qwen2VLModel

After I changed the imports to point at my custom module, the problem was solved.
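The reason the import target matters can be sketched as follows: replacing a class on one module object only changes that module's namespace, so a local copy of the modeling file keeps its original class. The module names and classes below are hypothetical stand-ins built with `types.ModuleType`, not the real `modeling_qwen2_vl` files.

```python
import types

# Two hypothetical modules: the upstream modeling file and a local copy of it.
upstream = types.ModuleType("upstream_modeling")
local_copy = types.ModuleType("local_modeling")


class SlowNorm:
    name = "slow"


class FastNorm:
    name = "fast"


# Both modules start out exposing the same original class.
upstream.RMSNorm = SlowNorm
local_copy.RMSNorm = SlowNorm


def apply_patch(module):
    """Replace the norm class in the given module's namespace only."""
    module.RMSNorm = FastNorm


apply_patch(upstream)

print(upstream.RMSNorm.name)    # the patched module now exposes FastNorm
print(local_copy.RMSNorm.name)  # the copy is untouched and still exposes SlowNorm
```

A model built from `local_copy` would therefore keep using the original layer unless the patch function is pointed at that module as well.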

ByronHsu commented 3 hours ago

Thanks for reporting. Can you verify that Liger RMSNorm is actually called by adding a print statement to Liger RMSNorm's forward? I think there might be some issues with .in_place after 0.4.1. Related to https://github.com/linkedin/Liger-Kernel/issues/383. I am looking into it.
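As an alternative to a print statement, one could inspect which function is actually bound as a layer's `forward`. The helper below is a hypothetical sketch, not part of Liger-Kernel or Transformers; it is shown on plain Python objects and is assumed to behave the same way on `torch.nn.Module` instances.

```python
import types


def forward_origin(module):
    """Return 'module_name.qualified_name' of the function behind module.forward."""
    # Bound methods expose the underlying function through __func__.
    fn = getattr(module.forward, "__func__", module.forward)
    return f"{fn.__module__}.{fn.__qualname__}"


class HFNorm:
    """Hypothetical stand-in for the unpatched layer class."""

    def forward(self, x):
        return x


def liger_like_forward(self, x):
    """Hypothetical stand-in for a replacement forward function."""
    return x


unpatched = HFNorm()
patched = HFNorm()
# Rebind forward on one instance only; its class is still HFNorm.
patched.forward = types.MethodType(liger_like_forward, patched)

print(forward_origin(unpatched))  # ends with "HFNorm.forward"
print(forward_origin(patched))    # ends with "liger_like_forward"
```

Checking `type(layer).__name__` alone would not distinguish these two instances, since both still report `HFNorm`.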

jdf-prog commented 3 hours ago

> Thanks for reporting. Can you verify that Liger RMSNorm is actually called by adding a print statement to Liger RMSNorm's forward? I think there might be some issues with .in_place after 0.4.1. Related to #383

Thanks, I have solved this issue. It seems the cause was that the Liger kernel had not been applied to my custom model.

ByronHsu commented 3 hours ago

I want to make sure RMSNorm is actually patched into your new model. Maybe it falls back to the HF RMSNorm, which is why it showed no errors?

jdf-prog commented 3 hours ago

Yeah, probably. I am still trying to figure this out. It seems I still encounter this error after the above changes. Let me investigate a bit.