unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0
15.6k stars 1.05k forks

Requesting support for IBM's OpenSource Granite models #441

Open q5sys opened 4 months ago

q5sys commented 4 months ago

These open-source models were just released yesterday at Red Hat Summit: https://huggingface.co/ibm-granite and https://arxiv.org/abs/2405.04324

If this ends up being a bigger ask than I think it is and there's something I can do to help make it happen, let me know.

danielhanchen commented 4 months ago

Oh interesting!

junzzhu commented 3 months ago

Fine-tuning works for both ibm-granite/granite-3b-code-instruct and ibm-granite/granite-8b-code-base, as far as I checked with the Llama 3 Colab notebook, with training loss decreasing as expected. However, the inference outputs from both are still useless. For example:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the Fibonacci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
1#<fim_prefix>A
# str
 growth
 for
 for
 for
 for

  `

  `
 `

 ` ` ` ` ` ` ` ` ` `                                                           9\ `<fim_prefix><fim_prefix><fim_prefix><fim_prefix>
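For reference, the setup above follows the standard Llama 3 Colab notebook flow. A minimal sketch of the loading step, assuming the usual Unsloth API (hyperparameters are the notebook defaults, not values tuned for Granite):

```python
from unsloth import FastLanguageModel

# Load a Granite checkpoint through the Llama code path (notebook defaults;
# nothing here is tuned for Granite specifically).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "ibm-granite/granite-3b-code-instruct",
    max_seq_length = 2048,
    dtype          = None,   # auto-detect: float16 or bfloat16
    load_in_4bit   = True,
)

# Attach LoRA adapters for fine-tuning, as in the notebook.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
)
```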
q5sys commented 3 months ago

I noticed the other day, when I was attempting to quantize the larger 34B models, that the Granite models come in two different architectures. The 3B, 7B, and 8B models are llama-type, while the 20B and 34B are gpt-bigcode models. I'm not sure how that would or wouldn't affect fine-tuning since I haven't looked into it yet, but I figured it was worth mentioning.
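(A quick way to check which architecture a given checkpoint uses is to read its Hugging Face config; a small sketch, with example model names from this thread:)

```python
from transformers import AutoConfig

# Print the architecture family for a couple of Granite checkpoints.
for name in ("ibm-granite/granite-3b-code-instruct",
             "ibm-granite/granite-20b-code-base"):
    config = AutoConfig.from_pretrained(name)
    print(name, "->", config.model_type)   # e.g. "llama" vs "gpt_bigcode"
```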

danielhanchen commented 3 months ago

@q5sys So if it's another model type, we'll error out for now.

@junzzhu Oh wait, it's a code model, so fine-tuning on text might not work as expected. Hence the weird output.

junzzhu commented 3 months ago

> Oh wait, it's a code model, so fine-tuning on text might not work as expected. Hence the weird output.

Cool, that helps! With the 7b-base model, the output is meaningful now. Thanks @danielhanchen

danielhanchen commented 3 months ago

Great it worked!!

junzzhu commented 3 weeks ago

@danielhanchen I am now seeing the error below, which was fine about two weeks ago. It also happens with the ibm-granite/granite-7b-instruct model.

File /opt/conda/lib/python3.10/site-packages/unsloth/models/loader.py:301, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, *args, **kwargs)
    298     tokenizer_name = None
    299 pass
--> 301 model, tokenizer = dispatch_model.from_pretrained(
    302     model_name        = model_name,
    303     max_seq_length    = max_seq_length,
    304     dtype             = dtype,
    305     load_in_4bit      = load_in_4bit,
    306     token             = token,
    307     device_map        = device_map,
    308     rope_scaling      = rope_scaling,
    309     fix_tokenizer     = fix_tokenizer,
    310     model_patcher     = dispatch_model,
    311     tokenizer_name    = tokenizer_name,
    312     trust_remote_code = trust_remote_code,
    313     revision          = revision if not is_peft else None,
    314     *args, **kwargs,
    315 )
    317 if resize_model_vocab is not None:
    318     model.resize_token_embeddings(resize_model_vocab)

File /opt/conda/lib/python3.10/site-packages/unsloth/models/llama.py:1412, in FastLlamaModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, model_patcher, tokenizer_name, trust_remote_code, **kwargs)
   1410 # Counteract saved tokenizers
   1411 tokenizer_name = model_name if tokenizer_name is None else tokenizer_name
-> 1412 tokenizer = load_correct_tokenizer(
   1413     tokenizer_name    = tokenizer_name,
   1414     model_max_length  = max_position_embeddings,
   1415     padding_side      = "right",
   1416     token             = token,
   1417     trust_remote_code = trust_remote_code,
   1418     fix_tokenizer     = fix_tokenizer,
   1419 )
   1421 model, tokenizer = patch_tokenizer(model, tokenizer)
   1422 model = model_patcher.post_patch(model)

File /opt/conda/lib/python3.10/site-packages/unsloth/tokenizer_utils.py:563, in load_correct_tokenizer(tokenizer_name, model_max_length, padding_side, token, trust_remote_code, cache_dir, fix_tokenizer)
    560     chat_template = old_chat_template
    562 else:
--> 563     chat_template = fix_chat_template(tokenizer)
    564     if old_chat_template is not None and chat_template is None:
    565         raise RuntimeError(
    566             "Unsloth: Fixing chat template failed - please file a report immediately!"
    567         )

File /opt/conda/lib/python3.10/site-packages/unsloth/tokenizer_utils.py:638, in fix_chat_template(tokenizer)
    636 new_chat_template = _fix_chat_template(chat_template)
    637 if "{% if add_generation_prompt %}" not in new_chat_template:
--> 638     raise RuntimeError(
    639         f"Unsloth: The tokenizer `{tokenizer.name_or_path}`\n"\
    640         "does not have a {% if add_generation_prompt %} for generation purposes.\n"\
    641         "Please file a bug report immediately - thanks!"
    642     )
    643 else:
    644     logger.warning_once(
    645         "Unsloth: We successfully patched the tokenizer to add a {% if add_generation_prompt %} to the chat_template.\n"\
    646         "This is not a bug, but please notify the Unsloth maintainers - thanks!"
    647     )

RuntimeError: Unsloth: The tokenizer `instructlab/granite-7b-lab`
does not have a {% if add_generation_prompt %} for generation purposes.
Please file a bug report immediately - thanks!
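For reference, the failure is inside `FastLanguageModel.from_pretrained` itself (the tokenizer fix-up step), so loading alone should reproduce it; a minimal sketch:

```python
from unsloth import FastLanguageModel

# Minimal reproduction sketch: the RuntimeError above is raised while the
# tokenizer's chat template is being checked, before any training happens.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name     = "instructlab/granite-7b-lab",  # per the traceback above
    max_seq_length = 2048,
    load_in_4bit   = True,
)
```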

danielhanchen commented 3 weeks ago

Oh my, that means their chat template is wrong.
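A possible stopgap, assuming the template only lacks the generation branch, is to patch and save the tokenizer before loading. A hedged sketch; the `<|assistant|>` role marker is an assumption about this model's chat format, not something verified against its tokenizer_config.json:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("instructlab/granite-7b-lab")

# Append the missing {% if add_generation_prompt %} branch. The "<|assistant|>"
# marker is an ASSUMPTION about this model's role tokens -- check the model's
# tokenizer_config.json before relying on it.
if tokenizer.chat_template and "{% if add_generation_prompt %}" not in tokenizer.chat_template:
    tokenizer.chat_template += "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
    tokenizer.save_pretrained("granite-7b-lab-patched")  # then load from this local path
```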