pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

SmolLM2, Request for consideration #2060

Open insop opened 1 week ago

insop commented 1 week ago

The Hugging Face team recently released SmolLM2, and it looks very promising.

Since the model is new, it makes sense to review it before bringing it into core as in #2058, but I wanted to create an issue here and ask whether there are any quick steps for those who want to try the model with torchtune.

Thank you.

CC: @ebsmothers

ebsmothers commented 1 week ago

Hi @insop, thanks for creating this issue. Similar to the case of #2058, I think the model itself is quite easy to support as a Llama-style arch. E.g. for the 1.7B model:

from torchtune.models.llama3._component_builders import llama3

# SmolLM2-1.7B uses a Llama-style architecture, so the llama3 component
# builder can construct it directly from the model's config values.
smollm2_1_7b = llama3(
    vocab_size=49152,       # SmolLM2 tokenizer vocabulary size
    num_layers=24,
    num_heads=32,
    num_kv_heads=32,        # num_kv_heads == num_heads: full MHA, no GQA
    embed_dim=2048,
    max_seq_len=8192,
    rope_base=130000.0,     # rope_theta from the SmolLM2-1.7B config
    intermediate_dim=8192,
)

In this case the tokenizer is similar to GPT-2's. I believe it should be possible to implement a version of this by closely following our Qwen2Tokenizer, which uses the same underlying BPE algorithm; you will just need to modify the special tokens.
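As an illustrative sketch only (not from the thread): the special-token names and ids below are assumptions to be verified against the HuggingFaceTB/SmolLM2 tokenizer files, and the constructor arguments assume the Qwen2Tokenizer interface; check torchtune/models/qwen2/_tokenizer.py for the exact signature.

from torchtune.models.qwen2._tokenizer import Qwen2Tokenizer

# Hypothetical special-token map for SmolLM2; verify names and ids
# against the released tokenizer config before relying on them.
SMOLLM2_SPECIAL_TOKENS = {
    "<|endoftext|>": 0,
    "<|im_start|>": 1,
    "<|im_end|>": 2,
}

# Sketch: reuse the Qwen2 BPE tokenizer with SmolLM2's vocab/merges files.
# Paths are placeholders; argument names assume the Qwen2Tokenizer
# constructor and should be confirmed against the actual signature.
smollm2_tokenizer = Qwen2Tokenizer(
    path="/path/to/SmolLM2/vocab.json",
    merges_file="/path/to/SmolLM2/merges.txt",
    special_tokens=SMOLLM2_SPECIAL_TOKENS,
)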

insop commented 1 week ago

Thank you, @ebsmothers!