DESCRIPTION:
Here we provide a simple script for supervised finetuning (SFT), adapted from the training script in FastChat. The script finetunes Qwen with the Hugging Face Trainer. You can check the script here. It has the following features:
Support single-GPU and multi-GPU training;
Support full-parameter tuning, LoRA, and Q-LoRA.
In the following, we introduce more details about the usage of the script.
Installation
Before you start, make sure you have installed the following packages:
pip install peft deepspeed optimum accelerate
Data Preparation
For data preparation, we advise you to organize the data in a jsonl file, where each line is a dictionary as demonstrated below:
{
"type": "chatml",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Tell me something about large language models."
},
{
"role": "assistant",
"content": "Large language models are a type of language model that is trained on a large corpus of text data. They are capable of generating human-like text and are used in a variety of natural language processing tasks..."
}
],
"source": "unknown"
}
{
"type": "chatml",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "What is your name?"
},
{
"role": "assistant",
"content": "My name is Qwen."
}
],
"source": "self-made"
}
Above are two example samples from the dataset. Each sample is a JSON object with the following fields: type, messages, and source. messages is required, while the others are optional and let you label your data format and data source. The messages field is a list of JSON objects, each of which has two fields: role and content. role can be system, user, or assistant. content is the text of the message. source is the source of the data, which can be self-made, alpaca, open-hermes, or any other string.
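The field rules described above can be checked before training. The following is a minimal sketch, not part of the Qwen script: the function name validate_sample and the requirement that messages be non-empty are assumptions made for illustration.

```python
# Roles named in the description above.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_sample(sample):
    """Return a list of problems found in one sample; an empty list means it looks valid."""
    messages = sample.get("messages")
    # "messages" is the only required field; "type" and "source" are optional labels.
    # Treating an empty list as invalid is an assumption of this sketch.
    if not isinstance(messages, list) or not messages:
        return ["'messages' is required and must be a non-empty list"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not a JSON object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has an unexpected role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            problems.append(f"message {i} has no text content")
    return problems


sample = {
    "type": "chatml",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is your name?"},
        {"role": "assistant", "content": "My name is Qwen."},
    ],
    "source": "self-made",
}
print(validate_sample(sample))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a large dataset in a single pass.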
To make the jsonl file, you can use json to save a list of dictionaries to the jsonl file:
import json
# samples is your list of dictionaries in the format shown above
with open('data.jsonl', 'w') as f:
    for sample in samples:
        f.write(json.dumps(sample) + '\n')
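To confirm that a file written this way really holds one JSON object per line, it can be read back and compared with the input. This is a small sketch under stated assumptions: the helper names write_jsonl and read_jsonl and the temporary file location are illustrative and do not come from the Qwen script.

```python
import json
import os
import tempfile


def write_jsonl(path, samples):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for sample in samples:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the file back into a list of dictionaries, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


samples = [
    {
        "type": "chatml",
        "messages": [
            {"role": "user", "content": "What is your name?"},
            {"role": "assistant", "content": "My name is Qwen."},
        ],
        "source": "self-made",
    }
]

path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
write_jsonl(path, samples)
loaded = read_jsonl(path)
print(loaded == samples)  # prints True
```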
Quickstart
To help you start finetuning quickly, we provide a shell script that you can run without worrying about the details. You need different hyperparameters for different types of training, e.g., single-GPU / multi-GPU training, full-parameter tuning, LoRA, or Q-LoRA.
Specify the <model_path> for your model, <data_path> for your data, and <config_path> for your deepspeed configuration. If you use LoRA or Q-LoRA, just add --use_lora True or --q_lora True based on your requirements. This is the simplest way to start finetuning. If you want to change more hyperparameters, you can dive into the script and modify those parameters.
Advanced Usages
In this section, we introduce the details of the scripts, including the core python script as well as the corresponding shell script.
Shell Script
Before we introduce the python code, we provide a brief introduction to the shell script with commands. We provide some guidance inside the shell script and here we take finetune.sh as an example.
To set up the environment variables for distributed training (or single-GPU training), specify the following variables: GPUS_PER_NODE, NNODES, NODE_RANK, MASTER_ADDR, and MASTER_PORT. No need to worry too much about them as we provide the default settings for you. In the command, you can pass in the argument -m and -d to specify the model path and data path, respectively. You can also pass in the argument --deepspeed to specify the deepspeed configuration file. We provide two configuration files for ZeRO2 and ZeRO3, and you can choose one based on your requirements. In most cases, we recommend using ZeRO3 for multi-GPU training except for Q-LoRA, where we recommend using ZeRO2.
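The arguments named above can be assembled programmatically, for example when launching several runs. The sketch below only builds the argument list; it assumes the script is invoked as bash finetune.sh and that Q-LoRA passes --q_lora True in addition to --use_lora True. Both are assumptions to check against the script itself, and the paths are placeholders.

```python
import shlex


def build_finetune_command(model_path, data_path, config_path, use_lora=False, q_lora=False):
    """Assemble the argument list for finetune.sh from the flags described in the text."""
    args = [
        "bash", "finetune.sh",  # assumed invocation
        "-m", model_path,
        "-d", data_path,
        "--deepspeed", config_path,
    ]
    if use_lora or q_lora:
        args += ["--use_lora", "True"]
    if q_lora:
        # Assumption: Q-LoRA is enabled on top of LoRA.
        args += ["--q_lora", "True"]
    return args


command = build_finetune_command("<model_path>", "<data_path>", "<config_path>", use_lora=True)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run; following the recommendation above, the config path would point to the ZeRO2 file for Q-LoRA and to the ZeRO3 file for most other multi-GPU runs.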
There is a series of hyperparameters to tune. Pass in --bf16 or --fp16 to specify the precision for mixed-precision training. The other significant hyperparameters include:
--output_dir: the path of your output models or adapters.
--num_train_epochs: the number of training epochs.
--gradient_accumulation_steps: the number of gradient accumulation steps.
--per_device_train_batch_size: the batch size per GPU for training, and the total batch size is equal to per_device_train_batch_size * number_of_gpus * gradient_accumulation_steps.
--learning_rate: the learning rate.
--warmup_steps: the number of warmup steps.
--lr_scheduler_type: the type of learning rate scheduler.
--weight_decay: the value of weight decay.
--adam_beta2: the value of β2 in Adam.
--model_max_length: the maximum sequence length.
--use_lora: whether to use LoRA. Adding --q_lora can enable Q-LoRA.
--gradient_checkpointing: whether to use gradient checkpointing.
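The total batch size formula given for --per_device_train_batch_size can be worked through in a few lines. This is an illustrative sketch; the function names and the example numbers are assumptions, not values recommended by the Qwen documentation.

```python
def total_batch_size(per_device_train_batch_size, number_of_gpus, gradient_accumulation_steps):
    """Total batch size as defined in the hyperparameter list above."""
    return per_device_train_batch_size * number_of_gpus * gradient_accumulation_steps


def accumulation_steps_for(target_total, per_device_train_batch_size, number_of_gpus):
    """Solve the same formula for gradient_accumulation_steps."""
    per_step = per_device_train_batch_size * number_of_gpus
    if target_total % per_step != 0:
        raise ValueError("target total batch size is not a multiple of batch size * GPUs")
    return target_total // per_step


print(total_batch_size(4, 8, 4))          # prints 128
print(accumulation_steps_for(128, 4, 8))  # prints 4
```

Solving for the accumulation steps is useful when the number of GPUs changes and the total batch size should stay the same.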
### Details
Similarity score: 0.9
> **Note:**
>
> [bigcode/tiny_starcoder_py · Hugging Face](https://huggingface.co/bigcode/tiny_starcoder_py)
>
> TinyStarCoderPy
>
> This is a 164M parameters model with the same architecture as StarCoder (8k context length, MQA & FIM). It was trained on the Python data from StarCoderData for ~6 epochs which amounts to 100B tokens.
>
> Use
>
> Intended use
>
> The model was trained on GitHub code, to assist with some tasks like Assisted Generation. For pure code completion, we advise using our 15B models StarCoder or StarCoderBase.
>
> Generation
>
> ```python
> # pip install -q transformers
> from transformers import AutoModelForCausalLM, AutoTokenizer
>
> checkpoint = "bigcode/tiny_starcoder_py"
> device = "cuda" # for GPU usage or "cpu" for CPU usage
>
> tokenizer = AutoTokenizer.from_pretrained(checkpoint)
> model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
>
> inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
> outputs = model.generate(inputs)
> print(tokenizer.decode(outputs[0]))
> ```
>
> Fill-in-the-middle
>
> Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
>
> ```python
> input_text = "<fim_prefix>def print_one_two_three():\n    print('one')\n    <fim_suffix>\n    print('three')<fim_middle>"
> inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
> outputs = model.generate(inputs)
> print(tokenizer.decode(outputs[0]))
> ```
>
> Training
>
> Model
>
> - Architecture: GPT-2 model with multi-query attention and Fill-in-the-Middle objective
> - Pretraining steps: 50k
> - Pretraining tokens: 100 billion
> - Precision: bfloat16
>
> Hardware
>
> - GPUs: 32 Tesla A100
> - Training time: 18 hours
>
> Software
>
> - Orchestration: Megatron-LM
> - Neural networks: PyTorch
> - BF16 if applicable: apex
>
> License
>
> The model is licensed under the BigCode OpenRAIL-M v1 license agreement. You can find the full agreement [here](https://huggingface.co/bigcode/tiny_starcoder_py/blob/main/LICENSE).
>
> #### Suggested labels
>
> - { "key": "llm-pretraining", "value": "Information related to the pretraining process of Large Language Models" }
167: Model training code from NousResearch/StripedHyenaTrainer
### Details
Similarity score: 0.89
- [ ] [NousResearch/StripedHyenaTrainer](https://github.com/NousResearch/StripedHyenaTrainer)
This is the training code used to train StripedHyena-Nous-7B.
First, tokenize your data
python tokenization.py \
--dataset your-super-cool-sharegpt-format-dataset \
--tokenizer togethercomputer/StripedHyena-Hessian-7B \
--output tokenized \
--num-proc 32 \
--pad-to-length 4096 \
--truncate
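The --pad-to-length and --truncate options suggest that every tokenized example ends up with the same length. The sketch below shows one plausible reading of that behavior; it is not the code in tokenization.py, and the function name, the pad_id value, and the choice to leave long sequences untouched when truncation is off are all assumptions.

```python
def pad_or_truncate(token_ids, length, pad_id, truncate=True):
    """Bring a token id sequence to a fixed length (hypothetical helper)."""
    if len(token_ids) > length:
        # Assumption: without truncation, longer sequences are kept as they are.
        return list(token_ids[:length]) if truncate else list(token_ids)
    # Pad on the right with pad_id up to the target length.
    return list(token_ids) + [pad_id] * (length - len(token_ids))


print(pad_or_truncate([5, 6, 7], 5, pad_id=0))           # prints [5, 6, 7, 0, 0]
print(pad_or_truncate([5, 6, 7, 8, 9, 10], 4, pad_id=0))  # prints [5, 6, 7, 8]
```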
Make sure you have run accelerate config; we used the provided DeepSpeed configuration. Then, train!
accelerate launch finetune.py \
--model togethercomputer/StripedHyena-Hessian-7B \
--dataset tokenized \
--output-dir output \
--epochs 4 \
--batch-size 12 \
--gradient-accumulate-every 12 \
--warmup-steps 350 \
--learning-rate 0.000004 \
--lr-schedule linear \
--weight-decay 0.1 \
--checkpointing-steps 1000 \
--no-decay poles residues
The --no-decay option disables weight decay on only the specified parameters. For StripedHyena, we've found that disabling weight decay on the Hyena operator's poles and residues parameters improves performance. There is also an option --frozen that can completely freeze select parameter groups.
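Disabling weight decay for selected parameters is commonly done by splitting the parameters into two optimizer groups. The sketch below illustrates that general technique and is not the StripedHyenaTrainer implementation; the function name, the substring matching rule, and the parameter names in the example are assumptions.

```python
def split_decay_groups(named_params, no_decay_keys, weight_decay):
    """Split (name, parameter) pairs into a decayed group and a non-decayed group."""
    decay, no_decay = [], []
    for name, param in named_params:
        # Assumption: a parameter is excluded when any key occurs in its name.
        if any(key in name for key in no_decay_keys):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Stand-in (name, parameter) pairs; with PyTorch these would come from
# model.named_parameters(). The names are hypothetical.
named_params = [
    ("blocks.0.filter.poles", "p0"),
    ("blocks.0.filter.residues", "p1"),
    ("blocks.0.mlp.weight", "p2"),
]
groups = split_decay_groups(named_params, ["poles", "residues"], weight_decay=0.1)
print(groups[0])  # prints {'params': ['p2'], 'weight_decay': 0.1}
print(groups[1])  # prints {'params': ['p0', 'p1'], 'weight_decay': 0.0}
```

With PyTorch, a list of groups in this shape can be passed directly as the first argument of an optimizer such as torch.optim.AdamW, which applies each group's own weight_decay.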
389: AWQ Quantization support - New generic converter for all HF llama-like models - Tutorials - OpenNMT
### Details
Similarity score: 0.89
- [ ] [AWQ Quantization support - New generic converter for all HF llama-like models - Tutorials - OpenNMT](https://forum.opennmt.net/t/awq-quantization-support-new-generic-converter-for-all-hf-llama-like-models/5569)
**Quantization and Acceleration**
We have added support for already quantized models, and revamped the converter for all llama-like models, whether they are quantized or not. Here's an example of the syntax:
```bash
python tools/convert_HF_llamalike.py --model_dir "TheBloke/Nous-Hermes-Llama2-AWQ" --output "/dataAI/llama2-7B/Hermes/Nous-Hermes-onmt.pt" --format safetensors
```
* `TheBloke/Nous-Hermes-Llama2-AWQ`: The name of the repository/model on the Hugging Face Hub.
* `output`: Specifies the target directory and model name you want to save.
* `format`: Optionally, you can save as safetensors.
For llama-like models, we download the `tokenizer.model` and generate a vocab file during the process. If the model is an AWQ quantized model, we will convert it to an OpenNMT-py AWQ quantized model.
After converting, you will need a config file to run `translate.py` or `run_mmlu_opennmt.py`. Here's an example of the config:
```yaml
transforms: [sentencepiece]
# Subword
src_subword_model: "/dataAI/llama2-7B/Hermes/tokenizer.model"
tgt_subword_model: "/dataAI/llama2-7B/Hermes/tokenizer.model"
# Model info
model: "/dataAI/llama2-7B/Hermes/Nous-Hermes-onmt.pt"
# Inference
# ...
```
When considering your priority:
- For small model files to fit VRAM of your GPU, try AWQ, but it will be slow for large batch sizes.
- AWQ models are faster than FP16 for batch size 1.
Please read more here: [GitHub - casper-hansen/AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
**Important Note:**
- There are two AWQ toolkits (llm-awq and AutoAWQ) and AutoAWQ supports two flavors: GEMM / GEMV.
- The original llm-awq from MIT is not maintained periodically, so we default to AutoAWQ.
- If a model is tagged llm-awq on the HF hub, we use AutoAWQ/GEMV, which is compatible.
**Offline Quantizer Script:**
- We will provide an offline quantizer script for OpenNMT-py generic models. However, for small NMT models, AWQ may make things slower, so it might not be relevant for NMT.
Enjoy!
---
**VS**: Fast Inference with vLLM
Recently, Mistral reported 100 tokens/second for Mistral-7B at batch size 1 and 1250 tokens/sec for a batch of 60 prompts using vLLM. When using Mistral-instruct-v0.2-onmt-awq, the performance was as follows:
- Batch size 1: 80.5 tokens/second
- Batch size 60: 98 tokens/second, with GEMV being 20-25% faster.
This was with a GEMM model. To make a fair comparison, adjust the throughput for the step0 (prompt prefill) time.
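One way to read the adjustment mentioned above is to subtract the prompt prefill time from the total time before dividing. The sketch below assumes that interpretation; the function name and the numbers are illustrative and are not measurements from the post.

```python
def decode_throughput(generated_tokens, total_seconds, prefill_seconds):
    """Tokens per second for the decoding phase only, excluding step0 (prompt prefill)."""
    decode_seconds = total_seconds - prefill_seconds
    if decode_seconds <= 0:
        raise ValueError("prefill time must be smaller than total time")
    return generated_tokens / decode_seconds


# Illustrative numbers: 256 generated tokens in 5 s, of which 1 s was prefill.
print(256 / 5.0)                         # prints 51.2 (unadjusted)
print(decode_throughput(256, 5.0, 1.0))  # prints 64.0 (adjusted)
```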
#### Suggested labels
#### { "key": "llm-quantization", "value": "Discussions and tools for handling quantized large language models" }
431: awq llama quantization
### Details
Similarity score: 0.89
- [ ] [awq llama quantization](huggingface.co)
Quantization and Acceleration
----------------------------
We have added support for already quantized models, and revamped the converter for all llama-like models, whether they are quantized or not.
### Model Conversion
Here's an example of the syntax for converting a model:
```bash
python tools/convert_HF_llamalike.py --model_dir "TheBloke/Nous-Hermes-Llama2-AWQ" --output "/dataAI/llama2-7B/Hermes/Nous-Hermes-onmt.pt" --format safetensors
```
- `TheBloke/Nous-Hermes-Llama2-AWQ`: The name of the repository/model on the Hugging Face Hub.
- `output`: Specifies the target directory and model name you want to save.
- `format`: Optionally, you can save as safetensors.
For llama-like models, we download the `tokenizer.model` and generate a vocab file during the process. If the model is an AWQ quantized model, we will convert it to an OpenNMT-py AWQ quantized model.
### Config File
After converting, you will need a config file to run `translate.py` or `run_mmlu_opennmt.py`. Here's an example of the config:
```yaml
transforms: [sentencepiece]
# Subword
src_subword_model: "/dataAI/llama2-7B/Hermes/tokenizer.model"
tgt_subword_model: "/dataAI/llama2-7B/Hermes/tokenizer.model"
# Model info
model: "/dataAI/llama2-7B/Hermes/Nous-Hermes-onmt.pt"
# Inference
# ...
```
### Priority
When considering your priority:
- For small model files to fit VRAM of your GPU, try AWQ, but it will be slow for large batch sizes.
- AWQ models are faster than FP16 for batch size 1.
- Read more: [GitHub - casper-hansen/AutoAWQ](https://github.com/casper-hansen/AutoAWQ)
### Important Note
- There are two AWQ toolkits (llm-awq and AutoAWQ) and AutoAWQ supports two flavors: GEMM / GEMV.
- The original llm-awq from MIT is not maintained periodically, so we default to AutoAWQ.
- If a model is tagged llm-awq on the HF hub, we use AutoAWQ/GEMV, which is compatible.
### Offline Quantizer Script
We will provide an offline quantizer script for OpenNMT-py generic models. However, for small NMT models, AWQ may make things slower, so it might not be relevant for NMT.
### vLLM Performance
Recently, Mistral reported 100 tokens/second for Mistral-7B at batch size 1 and 1250 tokens/sec for a batch of 60 prompts using vLLM. When using Mistral-instruct-v0.2-onmt-awq, the performance was as follows:
- Batch size 1: 80.5 tokens/second
- Batch size 60: 98 tokens/second, with GEMV being 20-25% faster.
- This was with a GEMM model. To make a fair comparison, adjust the throughput for the step0 (prompt prefill) time.
#### Suggested labels
#### null
383: deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face
### Details
Similarity score: 0.89
- [ ] [deepseek-ai/deepseek-coder-5.7bmqa-base · Hugging Face](https://huggingface.co/deepseek-ai/deepseek-coder-5.7bmqa-base)
Deepseek Coder Introduction
----------------------------
Deepseek Coder is a series of code language models, each trained from scratch on 2T tokens with a composition of 87% code and 13% natural language in both English and Chinese. We provide various sizes of the code model, ranging from 1B to 33B versions. Each model is pre-trained on a project-level code corpus with a window size of 16K and an extra fill-in-the-blank task, supporting project-level code completion and infilling. Deepseek Coder achieves state-of-the-art performance among open-source code models on multiple programming languages and various benchmarks.
### Key Features
- **Massive Training Data:** Trained from scratch on 2T tokens, including 87% code and 13% linguistic data in both English and Chinese languages.
- **Highly Flexible & Scalable:** Offered in model sizes of 1.3B, 5.7B, 6.7B, and 33B, enabling users to choose the setup most suitable for their requirements.
- **Superior Model Performance:** State-of-the-art performance among publicly available code models on HumanEval, MultiPL-E, MBPP, DS-1000, and APPS benchmarks.
- **Advanced Code Completion Capabilities:** A window size of 16K and a fill-in-the-blank task, supporting project-level code completion and infilling tasks.
### Model Summary
- **deepseek-coder-5.7bmqa-base:** A 5.7B parameter model with Multi Query Attention, trained on 2 trillion tokens.
- **Home Page:** [DeepSeek](http://deepseek.com)
- **Repository:** [deepseek-ai/deepseek-coder](https://github.com/deepseek-ai/deepseek-coder)
- **Chat With DeepSeek Coder:** [DeepSeek-Coder](https://github.com/deepseek-ai/deepseek-coder/discussions)
### How to Use
This section provides examples of how to use the Deepseek Coder model for code completion, code insertion, and repository-level code completion tasks.
#### Code Completion
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
#### Code Insertion
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
```
#### Repository Level Code Completion
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-5.7bmqa-base", trust_remote_code=True).cuda()
input_text = """#utils.py
import torch
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
def load_data():
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    # Standardize the data
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    # Convert numpy data to PyTorch tensors
    X_train = torch.tensor(X_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    y_train = torch.tensor(y_train, dtype=torch.int64)
    y_test = torch.tensor(y_test, dtype=torch.int64)
    return X_train, X_test, y_train, y_test
def evaluate_predictions(y_test, y_pred):
    return accuracy_score(y_test, y_pred)
#model.py
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
class IrisClassifier(nn.Module):
    def __init__(self):
        super(IrisClassifier, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(4, 16),
            nn.ReLU(),
            nn.Linear(16, 3)
        )
    def forward(self, x):
        return self.fc(x)
    def train_model(self, X_train, y_train, epochs, lr, batch_size):
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(self.parameters(), lr=lr)
        # Create DataLoader for batches
        dataset = TensorDataset(X_train, y_train)
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        for epoch in range(epochs):
            for batch_X, batch_y in dataloader:
                optimizer.zero_grad()
                outputs = self(batch_X)
                loss = criterion(outputs, batch_y)
                loss.backward()
                optimizer.step()
    def predict(self, X_test):
        with torch.no_grad():
            outputs = self(X_test)
            _, predicted = outputs.max(1)
        return predicted.numpy()
#main.py
from utils import load_data, evaluate_predictions
from model import IrisClassifier as Classifier
def main():
    # Model training and evaluation
"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=140)
print(tokenizer.decode(outputs[0]))
```
License
-------
This code repository is licensed under the MIT License. The use of Deepseek Coder models is subject to the Model License. DeepSeek Coder supports commercial use.
See the [LICENSE-MODEL](https://github.com/deepseek-ai/deepseek-coder/blob/main/LICENSE-MODEL) for more details.
Contact
-------
If you have any questions, please raise an issue or contact us at [agi\_code@deepseek.com](mailto:agi_code@deepseek.com).
#### Suggested labels
#### { "key": "llm-experiments", "value": "Experiments and results related to Large Language Models" } { "key": "AI-Chatbots", "value": "Topics related to advanced chatbot platforms integrating multiple AI models" }
309: openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"
### Details
Similarity score: 0.88
- [ ] [openai/human-eval: Code for the paper "Evaluating Large Language Models Trained on Code"](https://github.com/openai/human-eval)
HumanEval: Hand-Written Evaluation Set
This is an evaluation harness for the HumanEval problem solving dataset described in the paper "Evaluating Large Language Models Trained on Code".
Installation
Make sure to use python 3.7 or later:
$ conda create -n codex python=3.7
$ conda activate codex
Check out and install this repository:
$ git clone https://github.com/openai/human-eval
$ pip install -e human-eval
Usage
This program exists to run untrusted model-generated code. Users are strongly encouraged not to do so outside of a robust security sandbox. The execution call in execution.py is deliberately commented out to ensure users read this disclaimer before running code in a potentially unsafe manner. See the comment in execution.py for more information and instructions.
After following the above instructions to enable execution, generate samples and save them in the following JSON Lines (jsonl) format, where each sample is formatted into a single line like so:
{"task_id": "Corresponding HumanEval task ID", "completion": "Completion only without the prompt"}
We provide example_problem.jsonl and example_solutions.jsonl under data to illustrate the format and help with debugging.
Here is nearly functional example code (you just have to provide generate_one_completion to make it work) that saves generated completions to samples.jsonl.
from human_eval.data import write_jsonl, read_problems
problems = read_problems()
num_samples_per_task = 200
samples = [
    dict(task_id=task_id, completion=generate_one_completion(problems[task_id]["prompt"]))
    for task_id in problems
    for _ in range(num_samples_per_task)
]
write_jsonl("samples.jsonl", samples)
To evaluate the samples, run
$ evaluate_functional_correctness samples.jsonl
Reading samples...
32800it [00:01, 23787.50it/s]
Running test suites...
100%|...| 32800/32800 [16:11<00:00, 33.76it/s]
Writing results to samples.jsonl_results.jsonl...
100%|...| 32800/32800 [00:00<00:00, 42876.84it/s]
{'pass@1': ..., 'pass@10': ..., 'pass@100': ...}
This script provides more fine-grained information in a new file ending in _results.jsonl. Each row now contains whether the completion passed along with the execution result which is one of "passed", "timed out", or "failed".
As a quick sanity-check, the example samples should yield 0.5 pass@1.
$ evaluate_functional_correctness data/example_samples.jsonl --problem_file=data/example_problem.jsonl
Reading samples...
6it [00:00, 3397.11it/s]
Running example suites...
100%|...| 6/6 [00:03<00:00, 1.96it/s]
Writing results to data/example_samples.jsonl_results.jsonl...
100%|...| 6/6 [00:00<00:00, 6148.50it/s]
{'pass@1': 0.4999999999999999}
Because there is no unbiased way of estimating pass@k when there are fewer samples than k, the script does not evaluate pass@k for these cases. To evaluate with other k values, pass --k=<comma-separated-values-here>. For other options, see
$ evaluate_functional_correctness --help
However, we recommend that you use the default values for the rest.
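The restriction on k follows from the form of the unbiased estimator described in the paper: for a problem with n samples, of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), which is only defined for k <= n. The sketch below evaluates that formula for a single problem with exact integer binomials; it is not the harness's own implementation, and the function name is an assumption.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if k > n:
        raise ValueError("pass@k is not estimated when there are fewer samples than k")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


print(pass_at_k(6, 3, 1))     # prints 0.5
print(pass_at_k(200, 0, 10))  # prints 0.0
print(pass_at_k(2, 1, 2))     # prints 1.0
```

The benchmark score is the average of this per-problem estimate over all problems.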
Known Issues
While evaluation uses very little memory, you might see the following error message when the system is running out of RAM. Since this may cause some correct programs to fail, we recommend that you free some memory and try again.
malloc: can't allocate region
Citation
Please cite using the following bibtex entry:
@article{chen2021codex,
title={Evaluating Large Language Models Trained on Code},
author={Mark Chen and Jerry Tworek and Heewoo Jun and Qiming Yuan and Henrique Ponde de Oliveira Pinto and Jared Kaplan and Harri Edwards and Yuri Burda and Nicholas Joseph and Greg Brockman and Alex Ray and Raul Puri and Gretchen Krueger and Michael Petrov and Heidy Khlaaf and Girish Sastry and Pamela Mishkin and Brooke Chan and Scott Gray and Nick Ryder and Mikhail Pavlov and Alethea Power and Lukasz Kaiser and Mohammad Bavarian and Clemens Winter and Philippe Tillet and Felipe Petroski Such and Dave Cummings and Matthias Plappert and Fotios Chantzis and Elizabeth Barnes and Ariel Herbert-Voss and William Hebgen Guss and Alex Nichol and Alex Paino and Nikolas Tezak and Jie Tang and Igor Babuschkin and Suchir Balaji and Shantanu Jain and William Saunders and Christopher Hesse and Andrew N. Carr and Jan Leike and Josh Achiam and Vedant Misra and Evan Morikawa and Alec Radford and Matthew Knight and Miles Brundage and Mira Murati and Katie Mayer and Peter Welinder and Bob McGrew and Dario Amodei and Sam McCandlish and Ilya Sutskever and Wojciech Zaremba},
year={2021},
eprint={2107.03374},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
#### Suggested labels
#### { "key": "llm-evaluation", "value": "Evaluating Large Language Models performance and behavior through human-written evaluation sets" }
Example - Qwen
URL: https://qwen.readthedocs.io/en/latest/training/SFT/example.html
Suggested labels