pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

SmolLM2, Request for consideration #2060

Open insop opened 1 week ago

insop commented 1 week ago

The Hugging Face team recently released SmolLM2, and it looks very promising.

Since the model is new, it makes sense to review it before bringing it into core as in #2058, but I wanted to create an issue here and ask whether there are any quick steps for those who want to try the model with torchtune.

Thank you.

CC: @ebsmothers

ebsmothers commented 1 week ago

Hi @insop, thanks for creating this issue. Similar to the case of #2058, I think the model itself is quite easy to support as a Llama-style arch. E.g. for the 1.7B model:

from torchtune.models.llama3._component_builders import llama3

# SmolLM2-1.7B uses a Llama-style architecture, so the llama3 component
# builder can construct it directly from the model's config values.
smollm2_1_7b = llama3(
    vocab_size=49152,       # SmolLM2 tokenizer vocabulary size
    num_layers=24,
    num_heads=32,
    num_kv_heads=32,        # num_kv_heads == num_heads: full MHA, no GQA
    embed_dim=2048,
    max_seq_len=8192,
    rope_base=130000.0,     # rope_theta from the SmolLM2-1.7B config
    intermediate_dim=8192,
)

In this case the tokenizer is similar to GPT-2's. I believe it should be possible to implement a version of this by closely following our Qwen2Tokenizer, which uses the same underlying BPE algorithm; you will just need to modify the special tokens.
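As an illustrative sketch only (not from the thread): the special-token names and ids below are assumptions to be verified against the HuggingFaceTB/SmolLM2 tokenizer files, and the constructor arguments assume the Qwen2Tokenizer interface; check torchtune/models/qwen2/_tokenizer.py for the exact signature.

from torchtune.models.qwen2._tokenizer import Qwen2Tokenizer

# Hypothetical special-token map for SmolLM2; verify names and ids
# against the released tokenizer config before relying on them.
SMOLLM2_SPECIAL_TOKENS = {
    "<|endoftext|>": 0,
    "<|im_start|>": 1,
    "<|im_end|>": 2,
}

# Sketch: reuse the Qwen2 BPE tokenizer with SmolLM2's vocab/merges files.
# Paths are placeholders; argument names assume the Qwen2Tokenizer
# constructor and should be confirmed against the actual signature.
smollm2_tokenizer = Qwen2Tokenizer(
    path="/path/to/SmolLM2/vocab.json",
    merges_file="/path/to/SmolLM2/merges.txt",
    special_tokens=SMOLLM2_SPECIAL_TOKENS,
)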

insop commented 1 week ago

Thank you, @ebsmothers!