unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

SFT tuning with Unsloth ignoring input tokens #150

Open gotzmann opened 7 months ago

gotzmann commented 7 months ago

Hey, I'm using Unsloth with 48GB cards, where it is able to pre-train models up to 70B with a 4K context.

Is it possible to use Unsloth to do SFT with instructions on which tokens should be ignored / masked, and with an attention mask for properly packing samples?

Please help with some examples if possible. I have to use Axolotl for SFT tasks for now.

danielhanchen commented 7 months ago

@gotzmann I think you're referring to 2 things:

  1. Training only on completions, i.e. masking out the inputs so the loss is computed only on the responses. We use HF's TRL directly, so the data collator has to be edited - https://huggingface.co/docs/trl/sft_trainer#train-on-completions-only has more info, and there's a sketch right after this list.
  2. Packing, then masking out completions, if that's what you're asking - see the packing discussions #109 and #126
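
A minimal sketch of that first approach, assuming an already-loaded model, tokenizer, and dataset, and assuming each "text" row marks the assistant answer with a "### Response:" marker (the marker string is an assumption about your prompt template, not part of Unsloth):

    from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

    # Assumption: every formatted "text" row contains this marker right before
    # the assistant answer; everything before it is labelled -100 and takes no loss.
    response_template = "### Response:"
    collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer = tokenizer)

    trainer = SFTTrainer(
        model = model,
        tokenizer = tokenizer,
        train_dataset = dataset,
        dataset_text_field = "text",
        data_collator = collator,
        packing = False,  # the completion-only collator does not support packing
    )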
gotzmann commented 7 months ago

I'm using Unsloth without explicitly initializing any collators in my code. So basically the trainer just processes raw texts without any prompts or masks:

trainer = SFTTrainer(
    ...
    dataset_text_field = "text",
    packing = False,
    ...
)

What I'd like to understand:

1) How to use Unsloth with system prompts, adding whatever extra code is needed to implement a train_on_inputs = False feature

2) How to pack the aforementioned examples together with their prompts, to maximize performance

I'll try to grok through the mentioned links, but those look too complicated for me :)

danielhanchen commented 7 months ago

@gotzmann Would it be possible for you to show a few approximate rows of your dataset, and what the required output would be?

gotzmann commented 7 months ago

I'm evaluating different open datasets, converting all of them to the ShareGPT format for ease of use.

Like this one, where the conversations column contains the system / human / gpt parts of one input sample:

https://huggingface.co/datasets/teknium/OpenHermes-2.5
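
For reference, a single record in that format looks roughly like this (the values are invented, only the structure matters):

    # One ShareGPT-style record: a "conversations" list of role-tagged turns
    example = {
        "conversations": [
            {"from": "system", "value": "You are a helpful assistant."},
            {"from": "human",  "value": "What is the capital of France?"},
            {"from": "gpt",    "value": "The capital of France is Paris."},
        ]
    }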

Basically, what I'd want to implement directly with Unsloth and HF Transformers should be similar to this black-magic code from Axolotl:

https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/prompt_tokenizers.py

        # Tokenize the prompt (user side) without EOS, so the response can be appended
        tokenized_prompt = self._tokenize(user_prompt, add_eos_token=False)
        if not self.train_on_inputs:
            # Mask every prompt token so the loss is computed on the response only
            user_prompt_len = len(tokenized_prompt["input_ids"])
            # TODO this could be sped up using numpy array slicing
            tokenized_prompt["labels"] = [IGNORE_INDEX] * user_prompt_len
        # Tokenize the response without a leading BOS but with a trailing EOS
        tokenized_res_prompt = self._tokenize(
            response, strip_bos_token=True, add_eos_token=True
        )
        # Concatenate prompt + response; only response tokens carry real labels
        tokenized_prompt["input_ids"] += tokenized_res_prompt["input_ids"]
        tokenized_prompt["attention_mask"] += tokenized_res_prompt["attention_mask"]
        tokenized_prompt["labels"] += tokenized_res_prompt["input_ids"]
gotzmann commented 7 months ago

There's a special label value for masking input tokens:

IGNORE_TOKEN_ID = -100
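
That value isn't Axolotl-specific: -100 is the default ignore_index of PyTorch's cross-entropy loss, so any position labelled -100 simply contributes nothing to the loss. A tiny self-contained demonstration (the vocab size is arbitrary):

    import torch

    loss_fn = torch.nn.CrossEntropyLoss()        # ignore_index = -100 by default
    logits = torch.randn(4, 32000)               # 4 token positions, 32000-word vocab
    labels = torch.tensor([-100, -100, 17, 42])  # first two positions are masked
    print(loss_fn(logits, labels))               # averaged over the last 2 tokens only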

danielhanchen commented 7 months ago

@gotzmann Cool, thanks for the info! Redditors also asked about an example for ShareGPT-style conversations - I'll see what I can do to make a Colab notebook :)

danielhanchen commented 7 months ago

@gotzmann https://www.reddit.com/r/LocalLLaMA/comments/1ail8jr/qlora_with_sharegpt_and_chatml_template_ready_to/ :) Looks like someone made an Unsloth example for ShareGPT-style datasets just today!!

gotzmann commented 7 months ago

@danielhanchen sorry, nothing particularly interesting there.

Packing = False, and there's no juggling with attention masks / input matrices

Been there, done that :) The output will be a trashy model

danielhanchen commented 7 months ago

@gotzmann Yes, but it partially solves your first issue of using a ShareGPT-style format - I think the train_on_inputs = False masking isn't done in that notebook. It will require some more custom code paths, roughly like the sketch below.
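
A sketch only, assuming the ShareGPT turns have already been rendered into plain prompt / response strings - the helper name and fields are illustrative, not Unsloth API:

    IGNORE_INDEX = -100  # PyTorch's default ignore_index

    def tokenize_completion_only(example, tokenizer, max_len = 4096):
        # Assumption: "prompt" holds the rendered system + user turns and
        # "response" holds the assistant turn, both as plain strings.
        prompt_ids = tokenizer(example["prompt"], add_special_tokens = True)["input_ids"]
        response_ids = tokenizer(example["response"] + tokenizer.eos_token,
                                 add_special_tokens = False)["input_ids"]
        input_ids = (prompt_ids + response_ids)[:max_len]
        # Prompt positions get -100 so only the response contributes to the loss
        labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
        return {
            "input_ids": input_ids,
            "attention_mask": [1] * len(input_ids),
            "labels": labels,
        }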

For packing - again, this is a feature request, so depending on how much bandwidth we have, I'll see what we can do.