unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Support for models trained with OLMo? #774

Open · CloudyDory opened this issue 1 month ago

CloudyDory commented 1 month ago

Hi, it seems that unsloth currently does not support loading base models trained with OLMo. Is it possible to write a custom script to load the model into unsloth? The model architecture is shown below; it also uses the "pre-layernorm" transformer architecture.

```json
{
  "architectures": [
    "OlmoForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "clip_qkv": null,
  "eos_token_id": 0,
  "hidden_act": "silu",
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 8192,
  "max_position_embeddings": 2048,
  "model_type": "olmo",
  "num_attention_heads": 16,
  "num_hidden_layers": 16,
  "num_key_value_heads": 16,
  "pad_token_id": 1,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": true,
  "torch_dtype": "float32",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 51200
}
```
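
For reference, the checkpoint itself appears loadable with plain transformers (`OlmoForCausalLM` shipped upstream before the 4.42.3 version pinned in the config). A minimal sketch, with `"path/to/olmo-checkpoint"` as a placeholder for the directory holding the config above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "path/to/olmo-checkpoint" is a placeholder for the model directory.
model = AutoModelForCausalLM.from_pretrained(
    "path/to/olmo-checkpoint",
    torch_dtype=torch.bfloat16,  # the config ships float32; downcast to save memory
)
tokenizer = AutoTokenizer.from_pretrained("path/to/olmo-checkpoint")
```
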
danielhanchen commented 1 month ago

I'm unsure what OLMo's architecture is - in theory Unsloth can work with it, but it's best to wait for full model support in Unsloth.
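
Until then, one possible interim workaround (an assumption, not something Unsloth provides for OLMo today) is to finetune with plain Hugging Face PEFT/LoRA. The target module names below follow the Llama-style projection naming used by the transformers `OlmoForCausalLM` implementation; the checkpoint path is again a placeholder:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# "path/to/olmo-checkpoint" is a placeholder for the model directory.
model = AutoModelForCausalLM.from_pretrained("path/to/olmo-checkpoint")

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity check: only LoRA params are trainable
```

This skips Unsloth's fused kernels, so it will be slower and use more memory, but it keeps the training loop compatible with standard transformers tooling until OLMo support lands.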