Open insop opened 1 week ago
Hi @insop, thanks for creating this issue. Similar to the case of #2058, I think the model itself is quite easy to support as a Llama-style arch. E.g. for the 1.7B model:
```python
from torchtune.models.llama3._component_builders import llama3

smollm2_1_7b = llama3(
    vocab_size=49152,
    num_layers=24,
    num_heads=32,
    num_kv_heads=32,
    embed_dim=2048,
    max_seq_len=8192,
    rope_base=130000.0,
    intermediate_dim=8192,
)
```
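As a quick sanity check that these hyperparameters really correspond to a ~1.7B-parameter model, here is a rough count. This is my own back-of-the-envelope estimate (assuming a SwiGLU MLP with three projections, tied input/output embeddings, and ignoring norm parameters), not something taken from the torchtune source:

```python
# Rough parameter count for the config above (sanity check only;
# assumes SwiGLU MLP, tied embeddings, norms ignored).
vocab_size, num_layers, embed_dim, intermediate_dim = 49152, 24, 2048, 8192

attn = 4 * embed_dim * embed_dim        # q, k, v, o projections (no GQA: num_kv_heads == num_heads)
mlp = 3 * embed_dim * intermediate_dim  # gate, up, and down projections
embeddings = vocab_size * embed_dim     # counted once if input/output embeddings are tied

total = num_layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~1.71B, consistent with the "1.7B" name
```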
In this case the tokenizer is similar to GPT-2's. I believe it should be possible to implement a version of it by closely following our Qwen2Tokenizer, which uses the same underlying BPE algorithm; you will just need to modify the special tokens.
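For context on why Qwen2Tokenizer is a reasonable starting point: GPT-2-style tokenizers all share the same greedy BPE merge loop and differ mainly in their vocabulary, merge table, and special tokens. A toy, self-contained sketch of that loop (illustrative only; the real tokenizers operate on byte-level units and load their merge ranks from a `merges.txt` file):

```python
# Minimal sketch of the greedy BPE merge loop shared by GPT-2-style
# tokenizers. Repeatedly merges the adjacent pair with the lowest
# (i.e. earliest-learned) merge rank until no known pair remains.
def bpe(word, merge_ranks):
    tokens = list(word)
    while len(tokens) > 1:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        # Pick the adjacent pair with the best (lowest) merge rank.
        best = min(pairs, key=lambda p: merge_ranks.get(p, float("inf")))
        if best not in merge_ranks:
            break  # no more applicable merges
        merged, i = [], 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

ranks = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
print(bpe("lower", ranks))  # ['low', 'er']
```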
Thank you @ebsmothers !
The Hugging Face team recently released SmolLM2, and it seems very promising.
Since it is new, it makes sense to review it before bringing the model into core as in #2058, but I wanted to open an issue here and ask whether there are any quick steps for those wanting to try the model with torchtune.
Thank you.
CC: @ebsmothers