linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training
https://arxiv.org/pdf/2410.10989
BSD 2-Clause "Simplified" License

Support Yi-Coder #208

Closed ryankert01 closed 1 month ago

ryankert01 commented 2 months ago

🚀 The feature, motivation and pitch

To-dos:

  1. Implement an API for Yi-Coder
  2. Test Yi-Coder with the Llama lce_forward

Alternatives

No response

Additional context

from discord discussion

ryankert01 commented 2 months ago

take @ByronHsu

ByronHsu commented 2 months ago

Any progress?

ryankert01 commented 2 months ago

I'll open a PR by the weekend.

ryankert01 commented 1 month ago

Hi @ByronHsu, I just noticed that Hugging Face maps the llama model type to its base model class, and Yi-Coder has its base model set in its config. So I think we may not need a code change. I'll test whether it works shortly. (I may be wrong.)

ref:

  1. https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/modeling_auto.py#L34
  2. https://huggingface.co/01-ai/Yi-Coder-9B-Chat/blob/main/config.json#L3
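
The idea in the comment above can be sketched as follows: if kernel patches are looked up by the `model_type` field of a checkpoint's config, any checkpoint that declares an already-supported type is covered without new code. This is a simplified illustration, not the transformers or Liger-Kernel implementation. `PATCH_FNS`, `apply_llama_patch`, and `resolve_patch` are hypothetical names, and the abbreviated config assumes the Yi-Coder checkpoint declares the `llama` model type.

```python
import json


def apply_llama_patch() -> str:
    """Hypothetical stand-in for a function that patches Llama modules."""
    return "llama kernels applied"


# Hypothetical registry: one entry per supported base architecture.
PATCH_FNS = {"llama": apply_llama_patch}


def resolve_patch(config_json: str):
    """Return the patch function for the config's model_type, or None."""
    model_type = json.loads(config_json).get("model_type")
    return PATCH_FNS.get(model_type)


# Assumed, abbreviated config for a Yi-Coder checkpoint (illustrative only).
yi_coder_config = '{"architectures": ["LlamaForCausalLM"], "model_type": "llama"}'

patch = resolve_patch(yi_coder_config)
result = patch() if patch is not None else "no patch registered"
```

With a lookup of this kind, supporting a new checkpoint only requires a code change when its `model_type` is missing from the registry.
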
ryankert01 commented 1 month ago

UPDATE: Got it. It looks like this will soon be solved by https://github.com/linkedin/Liger-Kernel/pull/199

Hi @ByronHsu, I did some testing and found something odd: when I only set use_liger=True in SFTConfig, the GPU usage is the same as without Liger, but if I use

from liger_kernel.transformers import AutoLigerKernelForCausalLM
model = AutoLigerKernelForCausalLM.from_pretrained(model_name)

it's significantly better. This doesn't align with our SFTTrainer docs on Hugging Face.

Could you help me look into it? (research notebook)

shimizust commented 1 month ago

@ryankert01 Thanks for the comment. #199 is ready and should get incorporated soon. Right now, SFTConfig doesn't actually do anything with the use_liger flag unless you pass in a model path (in which case it loads the model using AutoLigerKernelForCausalLM) rather than an already instantiated model. After this change, SFTTrainer will need to be updated to call the new API.
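
The behavior described above can be modeled with a small sketch. It is a simplified illustration under assumptions, not TRL's actual code: `resolve_model`, `load_plain`, and `load_liger` are hypothetical names, with the two loaders standing in for a plain loader and `AutoLigerKernelForCausalLM.from_pretrained` respectively.

```python
from typing import Callable, Union


def resolve_model(
    model: Union[str, object],
    use_liger: bool,
    load_plain: Callable[[str], object],
    load_liger: Callable[[str], object],
) -> object:
    """Load from a path (honoring use_liger) or return the instance as-is."""
    if isinstance(model, str):
        # A path was given, so the trainer owns loading and can pick the loader.
        loader = load_liger if use_liger else load_plain
        return loader(model)
    # Already instantiated: nothing is reloaded, so the flag has no effect.
    return model


# Hypothetical loaders that only record which path was taken.
load_plain = lambda path: ("plain", path)
load_liger = lambda path: ("liger", path)

from_path = resolve_model("some/model-path", True, load_plain, load_liger)
prebuilt = ("plain", "already-loaded")
from_instance = resolve_model(prebuilt, True, load_plain, load_liger)
```

In this model, the flag only matters when a path is passed. An already instantiated model comes back unchanged, which would match the observation that GPU usage did not change in that case.
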

ryankert01 commented 1 month ago

Closed by https://github.com/huggingface/transformers/pull/33502