linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training
https://arxiv.org/pdf/2410.10989
BSD 2-Clause "Simplified" License
3.4k stars 195 forks

Model Agnostic Patching? #70

Closed lapp0 closed 2 months ago

lapp0 commented 2 months ago

🚀 The feature, motivation and pitch

Liger is missing support for a number of language model architectures. Additionally, I'd like to be able to patch any model, regardless of its architecture, with a simple apply_liger_kernel_to_all_modules(pretrained_model) call.
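As a sketch of what such a one-call patch could look like (apply_liger_kernel_to_all_modules, the patch table, and the stand-in classes below are all hypothetical, not part of Liger-Kernel): walk the module tree and swap any submodule whose class appears in a patch table.

```python
class RMSNorm:                  # stand-in for e.g. modeling_llama.LlamaRMSNorm
    def __init__(self, eps=1e-6):
        self.eps = eps

class LigerRMSNorm(RMSNorm):    # stand-in for the Triton-backed replacement
    pass

class Block:
    def __init__(self):
        self.norm = RMSNorm()

class Model:
    def __init__(self):
        self.blocks = [Block() for _ in range(2)]

# class -> replacement; a real table would map HF module classes to Liger kernels
PATCH_TABLE = {RMSNorm: LigerRMSNorm}

def apply_liger_kernel_to_all_modules(obj):
    """Recursively replace attributes whose exact class is in PATCH_TABLE."""
    for name, child in list(vars(obj).items()):
        if isinstance(child, list):
            for item in child:
                apply_liger_kernel_to_all_modules(item)
        elif type(child) in PATCH_TABLE:
            setattr(obj, name, PATCH_TABLE[type(child)](eps=child.eps))
        elif hasattr(child, "__dict__"):
            apply_liger_kernel_to_all_modules(child)
    return obj

model = apply_liger_kernel_to_all_modules(Model())
print(all(type(b.norm) is LigerRMSNorm for b in model.blocks))  # True
```

A real version would walk `model.named_modules()` from torch instead of `vars()`, but the dispatch-on-class idea is the same.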

Alternatives

No response

Additional context

The transformers library isn't very good at sharing code between identical implementations. For example, LlamaRMSNorm is exactly the same as MistralRMSNorm.

Given the lack of code sharing in transformers, perhaps a hack like this would work for checking whether a patch applies to a module:

>>> from transformers.models.llama import modeling_llama
>>> from transformers.models.mistral import modeling_mistral
>>> from transformers.models.phi3 import modeling_phi3
>>> 
>>> def is_equivalent_module(module0, module1):
...     return (
...         module0.__init__.__code__.co_code == module1.__init__.__code__.co_code
...         and module0.forward.__code__.co_code == module1.forward.__code__.co_code
...     )

>>> is_equivalent_module(modeling_llama.LlamaRMSNorm, modeling_mistral.MistralRMSNorm)
True
>>> is_equivalent_module(modeling_llama.LlamaRMSNorm, modeling_phi3.Phi3RMSNorm)
True
ByronHsu commented 2 months ago

@robertgshaw2-neuralmagic has proposed a cool idea. We can have an AutoLigerModelForCausalLM class which auto-detects the model type and applies the kernels, like in AutoAWQ:

model = transformers.AutoLigerModelForCausalLM.from_pretrained("<some llama model>")

cc @yundai424 @qingquansong @robertgshaw2-neuralmagic thoughts?
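A minimal sketch of how such a class could work, following the AutoAWQ pattern: resolve the model type first, monkey-patch the matching modeling module, then delegate loading. Everything below (the patch functions, FakeAutoModel, the model_type lookup) is a stand-in for illustration, not the real Liger-Kernel or transformers API.

```python
PATCHED = []  # records which model families were patched (for illustration)

# stand-ins for per-family Liger patch functions
def apply_liger_kernel_to_llama():
    PATCHED.append("llama")

def apply_liger_kernel_to_mistral():
    PATCHED.append("mistral")

MODEL_TYPE_TO_PATCH = {
    "llama": apply_liger_kernel_to_llama,
    "mistral": apply_liger_kernel_to_mistral,
}

class FakeAutoModel:
    """Stand-in for transformers.AutoModelForCausalLM."""
    @classmethod
    def from_pretrained(cls, name, **kwargs):
        return f"model:{name}"

class AutoLigerModelForCausalLM:
    @classmethod
    def from_pretrained(cls, name, **kwargs):
        # a real implementation would read AutoConfig.from_pretrained(name).model_type
        model_type = "llama" if "llama" in name else "mistral"
        patch = MODEL_TYPE_TO_PATCH.get(model_type)
        if patch is None:
            raise ValueError(f"no Liger kernels for model_type={model_type!r}")
        patch()  # monkey-patch the modeling module before the weights load
        return FakeAutoModel.from_pretrained(name, **kwargs)

model = AutoLigerModelForCausalLM.from_pretrained("some-llama-model")
print(PATCHED)  # ['llama']
```

The key design point is ordering: the patch has to run before model instantiation so the patched classes are the ones constructed.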

ByronHsu commented 2 months ago

https://github.com/casper-hansen/AutoAWQ/blob/6f14fc7436d9a3fb5fc69299e4eb37db4ee9c891/awq/models/auto.py#L50

yundai424 commented 2 months ago

Sounds super good to me! ~~We can just load and parse the AutoConfig to tell what the model family is, without importing the modeling_xxx module, and apply the patch accordingly~~ exactly how AutoAWQ is doing it 😄

qingquansong commented 2 months ago

Yeah, that'd be awesome! This might be even better than using a uniform apply_liger_kernel function to wrap the model, which takes an extra line or two of effort :)


lancerts commented 2 months ago

This idea is super cool!

dakinggg commented 2 months ago

@ByronHsu You can certainly go the automodel route, but it might also be nice to simply expose an apply_to_all_supported_models (or whatever you want to call it). The reason being, people may have complex code on top of HF (maybe using their own auto class, or instantiating classes directly), and it would be nice to have a generic patch route where you only have to call one function to "turn on" Liger.

It would also be nice to include some indication of which models have been equivalence tested (e.g. I see a comment in the code that gemma has not yet been fully tested). So it could be like apply_to_all_supported_models(include_experimental: bool = False) or something
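A sketch of the gate dakinggg describes (apply_to_all_supported_models and the patch tables below are illustrative names, not the shipped Liger-Kernel API): keep experimental model families in a separate table that is only merged in when the caller opts in.

```python
applied = []  # records which model families were patched (for illustration)

# model_type -> patch function; experimental entries are opt-in only
STABLE_PATCHES = {"llama": lambda: applied.append("llama")}
EXPERIMENTAL_PATCHES = {"gemma": lambda: applied.append("gemma")}  # not fully equivalence-tested

def apply_to_all_supported_models(include_experimental: bool = False):
    patches = dict(STABLE_PATCHES)
    if include_experimental:
        patches.update(EXPERIMENTAL_PATCHES)
    for patch in patches.values():
        patch()  # monkey-patch that model family's modules

apply_to_all_supported_models()
print(applied)  # ['llama'] -- gemma is skipped by default

apply_to_all_supported_models(include_experimental=True)
print("gemma" in applied)  # True
```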

ByronHsu commented 2 months ago

@dakinggg nice to see you here. this is a superb idea!

ArthurZucker commented 2 months ago

Hey all! 👋🏻 Feedback heard from the transformers side; here is some input:

The transformers library isn't very good at sharing code between identical implementations. For example LlamaRMSNorm is exactly the same as MistralRMSNorm

Two things:

  • In such cases we use Copied from, but that would require parsing the files, which is indeed not convenient. To that end we are introducing inheritance, little by little, which should help (https://github.com/huggingface/transformers/pull/30868 for an example). With this new format, finding dependencies between models should be simpler.

Otherwise the autoAWQ example is good! The solution proposed by @dakinggg as well.

Though we are going to make an effort at making your life easier, using the diff + inheritance should help you determine code sharing and automatically supported models a bit more easily, no?

shimizust commented 2 months ago

Thanks all for the ideas. In the HF Trainer class we added a private API similar to apply_to_all_supported_models:

liger_kernel.transformers.trainer_integration._apply_liger_kernel(model_type=model.config.model_type)

where _apply_liger_kernel handles the logic for which layers to patch. This still requires maintenance on the Liger-Kernel side, but it allows folks to just upgrade liger-kernel to get the latest mappings applied. If we introduced an AutoLiger* class, it would essentially be calling this _apply_liger_kernel() method.

One solution to make it easier to adopt is to make this generic API public, @ByronHsu. I would prefer providing a simple function to call vs. managing potentially several AutoModel* classes that might conflict with how users are instantiating their models, as @dakinggg was mentioning.

I think the main issue then is whether we can dynamically add support for newer models as they come out, since oftentimes they directly copy code from a few base models. @ArthurZucker, would the diff_converter code you shared be able to be used directly? It seems it goes from a diff --> single model file. Would you still have to parse the modeling file to see if it was generated from another model?

shimizust commented 2 months ago

@lapp0's suggestion to compare the bytecode of different modules could work, but it requires passing or importing the current model's modeling file. And since the relevant module names of the new model would be unknown, I think we would have to iterate through all N modules of the current model and compare each against all M patchable modules, for M*N comparisons. In addition, we would still need some model-specific logic (e.g. we can't apply the crossentropyloss kernel and the fusedlinearcrossentropyloss kernel at the same time).
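The M*N scan described above can be sketched as follows, reusing lapp0's co_code check. The classes here are stand-ins (so the snippet runs without transformers): one "new model" class is textually identical to LlamaRMSNorm, the other is not.

```python
class LlamaRMSNorm:                  # stand-in reference class Liger can patch
    def __init__(self, eps=1e-6):
        self.eps = eps
    def forward(self, x):
        return x * self.eps

class NewModelRMSNorm:               # identical source -> identical bytecode
    def __init__(self, eps=1e-6):
        self.eps = eps
    def forward(self, x):
        return x * self.eps

class NewModelMLP:                   # different forward -> different bytecode
    def __init__(self, eps=1e-6):
        self.eps = eps
    def forward(self, x):
        return x + self.eps

def is_equivalent_module(m0, m1):
    return (
        m0.__init__.__code__.co_code == m1.__init__.__code__.co_code
        and m0.forward.__code__.co_code == m1.forward.__code__.co_code
    )

PATCHABLE = [LlamaRMSNorm]  # the M reference classes Liger knows how to patch

def find_patchable(model_classes):
    """Compare N model classes against M patchable classes (M*N checks)."""
    return [
        (cls, ref)
        for cls in model_classes          # N
        for ref in PATCHABLE              # M
        if is_equivalent_module(cls, ref)
    ]

matches = find_patchable([NewModelRMSNorm, NewModelMLP])
print([c.__name__ for c, _ in matches])  # ['NewModelRMSNorm']
```

As noted, even with the matches in hand, per-model logic (like choosing between the plain and fused cross-entropy kernels) would still be needed on top.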

ByronHsu commented 2 months ago

Personally I like the auto model class more. We should make it the first-class path; apply_to_all_supported_models can be an alternative for advanced users.

It would also be nice to include some indication of which models have been equivalence tested (e.g. I see a comment in the code that gemma has not yet been fully tested). So it could be like apply_to_all_supported_models(include_experimental: bool = False) or something

We should not patch experimental models we are still working on. Only exact matches should be patched. That was a mistake, sorry.

dwyatte commented 2 months ago

Is there advice for adding Liger support to a model at the moment? Should users follow the llama/mistral/qwen examples by setting up monkey patching and cross-entropy patching?

ArthurZucker commented 2 months ago

would the diff_converter code you shared be able to used directly? It seems it is going from a diff --> single model file. Would you still have to parse the modeling file to see if it was generated from another model?

Yep, the goal is for the code in diff_xxx to also be usable as a replacement for the code in modeling_xxx. TL;DR: it should give you the dependencies if you import, for example, GemmaForCausalLM from the diff file instead of the modeling file!

shimizust commented 2 months ago

@ArthurZucker Thanks, I see how it works now. If I'm understanding correctly, it seems the flow would be something like:

For now though, we've gone with the AutoLigerKernelForCausalLM approach to wrap models and maintain fully supported/tested models within liger-kernel library. We may revisit in the future though.