q5sys opened this issue 4 months ago
Oh interesting!
Fine-tuning both ibm-granite/granite-3b-code-instruct and ibm-granite/granite-8b-code-base is working now, as far as I checked with the Llama3 Colab notebook, with training loss decreasing as expected. However, the inference outputs are still useless for both:
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
Continue the fibonnaci sequence.
### Input:
1, 1, 2, 3, 5, 8
### Response:
1#<fim_prefix>A
# str
growth
for
for
for
for
`
`
`
` ` ` ` ` ` ` ` ` ` 9\ `<fim_prefix><fim_prefix><fim_prefix><fim_prefix>
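For context, the generation step above follows the Llama3 notebook's inference cell; roughly this (a sketch based on that notebook — the model name and generation settings here are assumptions, not an exact copy of my cell):

```python
from unsloth import FastLanguageModel

# Load the fine-tuned Granite checkpoint (4-bit, as in the notebook).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "ibm-granite/granite-8b-code-base",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # switch to the faster inference path

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

inputs = tokenizer(
    [alpaca_prompt.format("Continue the fibonnaci sequence.", "1, 1, 2, 3, 5, 8", "")],
    return_tensors = "pt",
).to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 64)
print(tokenizer.batch_decode(outputs)[0])
```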
I noticed the other day, when I was attempting to quantize the larger 34B models, that the Granite models come in two different architectures: the 3B, 7B, and 8B models are llama, while the 20B and 34B are gpt-bigcode models. I'm not sure how that would or wouldn't affect fine-tuning since I haven't looked into it yet, but I figured it was worth mentioning.
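If it helps, a quick way to confirm which architecture a given checkpoint uses is to read `model_type` from its config (a small sketch; the two names below are just examples from the ibm-granite org):

```python
from transformers import AutoConfig

# Print the underlying architecture type for two Granite checkpoints.
for name in ("ibm-granite/granite-8b-code-base",
             "ibm-granite/granite-34b-code-base"):
    config = AutoConfig.from_pretrained(name)
    print(f"{name}: {config.model_type}")  # "llama" for the 8B, "gpt_bigcode" for the 34B
```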
@q5sys So if it's other model types, then we'll error out for now.
@junzzhu Oh wait, it's a Code model, so fine-tuning on text might not work as expected. Hence the weird output.
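One thing worth ruling out is the EOS token: if the training examples never end with it, a base code model will just keep generating (which would explain the runaway `<fim_prefix>` tokens above). The Llama3 notebook appends it in its formatting function; a sketch of that step, assuming the notebook's Alpaca setup and the `alpaca_prompt` template from the sketch above:

```python
from datasets import load_dataset

# Assumes `tokenizer` from FastLanguageModel.from_pretrained and the
# `alpaca_prompt` template from the Llama3 notebook (see the sketch above).
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
EOS_TOKEN = tokenizer.eos_token  # without this the model never learns to stop

def formatting_prompts_func(examples):
    texts = [
        alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        for instruction, input, output in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched = True)
```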
Cool! That helps. With the 7b-base model, the output is meaningful now. Thanks @danielhanchen
Great it worked!!
@danielhanchen I am now seeing the error below, though everything was fine about two weeks ago. The same thing happens with the ibm-granite/granite-7b-instruct model.
File /opt/conda/lib/python3.10/site-packages/unsloth/models/loader.py:301, in FastLanguageModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, *args, **kwargs)
    298     tokenizer_name = None
    299     pass
--> 301 model, tokenizer = dispatch_model.from_pretrained(
    302     model_name = model_name,
    303     max_seq_length = max_seq_length,
    304     dtype = dtype,
    305     load_in_4bit = load_in_4bit,
    306     token = token,
    307     device_map = device_map,
    308     rope_scaling = rope_scaling,
    309     fix_tokenizer = fix_tokenizer,
    310     model_patcher = dispatch_model,
    311     tokenizer_name = tokenizer_name,
    312     trust_remote_code = trust_remote_code,
    313     revision = revision if not is_peft else None,
    314     *args, **kwargs,
    315 )
    317 if resize_model_vocab is not None:
    318     model.resize_token_embeddings(resize_model_vocab)

File /opt/conda/lib/python3.10/site-packages/unsloth/models/llama.py:1412, in FastLlamaModel.from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, token, device_map, rope_scaling, fix_tokenizer, model_patcher, tokenizer_name, trust_remote_code, **kwargs)
   1410 # Counteract saved tokenizers
   1411 tokenizer_name = model_name if tokenizer_name is None else tokenizer_name
-> 1412 tokenizer = load_correct_tokenizer(
   1413     tokenizer_name = tokenizer_name,
   1414     model_max_length = max_position_embeddings,
   1415     padding_side = "right",
   1416     token = token,
   1417     trust_remote_code = trust_remote_code,
   1418     fix_tokenizer = fix_tokenizer,
   1419 )
   1421 model, tokenizer = patch_tokenizer(model, tokenizer)
   1422 model = model_patcher.post_patch(model)

File /opt/conda/lib/python3.10/site-packages/unsloth/tokenizer_utils.py:563, in load_correct_tokenizer(tokenizer_name, model_max_length, padding_side, token, trust_remote_code, cache_dir, fix_tokenizer)
    560     chat_template = old_chat_template
    562 else:
--> 563     chat_template = fix_chat_template(tokenizer)
    564     if old_chat_template is not None and chat_template is None:
    565         raise RuntimeError(
    566             "Unsloth: Fixing chat template failed - please file a report immediately!"
    567         )

File /opt/conda/lib/python3.10/site-packages/unsloth/tokenizer_utils.py:638, in fix_chat_template(tokenizer)
    636 new_chat_template = _fix_chat_template(chat_template)
    637 if "{% if add_generation_prompt %}" not in new_chat_template:
--> 638     raise RuntimeError(
    639         f"Unsloth: The tokenizer `{tokenizer.name_or_path}`\n"\
    640         "does not have a {% if add_generation_prompt %} for generation purposes.\n"\
    641         "Please file a bug report immediately - thanks!"
    642     )
    643 else:
    644     logger.warning_once(
    645         "Unsloth: We successfully patched the tokenizer to add a {% if add_generation_prompt %} to the chat_template.\n"\
    646         "This is not a bug, but please notify the Unsloth maintainers - thanks!"
    647     )

RuntimeError: Unsloth: The tokenizer `instructlab/granite-7b-lab`
does not have a {% if add_generation_prompt %} for generation purposes.
Please file a bug report immediately - thanks!
Oh my, it means their chat template is wrong.
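You can see what Unsloth is objecting to by dumping the template directly; a minimal sketch (the `<|assistant|>` suffix below is an assumption about the instructlab format, not a verified fix — check it against the printed template):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("instructlab/granite-7b-lab")
print(tok.chat_template)  # inspect the template that ships with the model

# Unsloth's fix_chat_template requires an {% if add_generation_prompt %}
# branch so it knows what to append at generation time. A minimal patch
# could add one (assumption: '<|assistant|>' opens the assistant turn here):
marker = "{% if add_generation_prompt %}"
if tok.chat_template and marker not in tok.chat_template:
    tok.chat_template += marker + "{{ '<|assistant|>\\n' }}{% endif %}"
```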
These open-source models were just released yesterday at Red Hat Summit. https://huggingface.co/ibm-granite https://arxiv.org/abs/2405.04324
If this ends up being a bigger ask than I think it is, and there's something I can do to help in making this happen, let me know.