╰─ openai tools fine_tunes.prepare_data -f FAQ.jsonl ─╯
Analyzing...
- Your file contains 138 prompt-completion pairs
- More than a third of your `prompt` column/key is uppercase. Uppercase prompts tends to perform worse than a mixture of case encountered in normal language. We recommend to lower case the data if that makes sense in your domain. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details
- More than a third of your `completion` column/key is uppercase. Uppercase completions tends to perform worse than a mixture of case encountered in normal language. We recommend to lower case the data if that makes sense in your domain. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details
- Your data does not contain a common separator at the end of your prompts. Having a separator string appended to the end of the prompt makes it clearer to the fine-tuned model where the completion should begin. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples. If you intend to do open-ended generation, then you should leave the prompts empty
- Your data does not contain a common ending at the end of your completions. Having a common ending string appended to the end of the completion makes it clearer to the fine-tuned model where the completion should end. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more detail and examples.
- The completion should start with a whitespace character (` `). This tends to produce better results due to the tokenization we use. See https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset for more details
Based on the analysis we will perform the following actions:
- [Recommended] Lowercase all your data in column/key `prompt` [Y/n]: y
- [Recommended] Lowercase all your data in column/key `completion` [Y/n]: y
- [Recommended] Add a suffix separator ` ->` to all prompts [Y/n]: y
- [Recommended] Add a suffix ending `\n` to all completions [Y/n]: y
- [Recommended] Add a whitespace character to the beginning of the completion [Y/n]: y
Your data will be written to a new JSONL file. Proceed [Y/n]: y
Wrote modified file to `FAQ_prepared.jsonl`
Feel free to take a look!
Now use that file when fine-tuning:
> openai api fine_tunes.create -t "FAQ_prepared.jsonl"
After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string ` ->` for the model to start generating completions, rather than continuing with the prompt. Make sure to include `stop=["\n"]` so that the generated texts ends at the expected place.
Once your model starts training, it'll approximately take 4.34 minutes to train a `curie` model, and less for `ada` and `babbage`. Queue will approximately take half an hour per job ahead of you.
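For reference, after these transformations each line of `FAQ_prepared.jsonl` has the shape below. The question/answer pair is made up for illustration (the real FAQ content isn't shown here); what matters is the lowercased text, the ` ->` suffix on the prompt, and the leading space plus trailing `\n` on the completion:

```json
{"prompt": "what payment methods do you accept? ->", "completion": " we accept credit cards and paypal.\n"}
```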
╰─ openai api fine_tunes.create -t FAQ_prepared.jsonl -m davinci --suffix "faq" ─╯
Found potentially duplicated files with name 'FAQ_prepared.jsonl', purpose 'fine-tune' and size 18176 bytes
file-CZwRutKW7BAX1uqn2sWfUU0K
file-FtggkvHGwhMw5sTFzbsHqgne
file-frKnGrFLhW300pIk6G9jTwiY
Enter file ID to reuse an already uploaded file, or an empty string to upload this file anyway:
Upload progress: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 18.2k/18.2k [00:00<00:00, 6.84Mit/s]
Uploaded file from FAQ_prepared.jsonl: file-l9PPcrSY7Q497AYLr86vunOU
Created fine-tune: ft-EMGUVyjdwCMcUV1JNyreGW1j
Streaming events until fine-tuning is complete...
(Ctrl-C will interrupt the stream, but not cancel the fine-tune)
[2023-03-16 17:06:12] Created fine-tune: ft-EMGUVyjdwCMcUV1JNyreGW1j
Stream interrupted (client disconnected).
To resume the stream, run:
openai api fine_tunes.follow -i ft-EMGUVyjdwCMcUV1JNyreGW1j
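If you prefer to drive this from Python instead of the CLI, the legacy 0.x `openai` SDK (the same package that ships this CLI) exposes equivalent calls. A minimal sketch, assuming the prepared file from above; the variable names are illustrative:

```python
import openai

# Upload the prepared training data; purpose must be "fine-tune".
training_file = openai.File.create(
    file=open("FAQ_prepared.jsonl", "rb"),
    purpose="fine-tune",
)

# Create the fine-tune job on the davinci base model with a custom suffix.
job = openai.FineTune.create(
    training_file=training_file.id,
    model="davinci",
    suffix="faq",
)
print(job.id)  # something like ft-...
```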
╰─ openai api fine_tunes.follow -i ft-EMGUVyjdwCMcUV1JNyreGW1j ─╯
[2023-03-16 17:06:12] Created fine-tune: ft-EMGUVyjdwCMcUV1JNyreGW1j
[2023-03-16 17:10:47] Fine-tune costs $1.23
[2023-03-16 17:10:48] Fine-tune enqueued. Queue number: 25
[2023-03-16 17:11:55] Fine-tune is in the queue. Queue number: 24
[2023-03-16 17:11:59] Fine-tune is in the queue. Queue number: 23
[2023-03-16 17:13:13] Fine-tune is in the queue. Queue number: 22
Stream interrupted (client disconnected).
To resume the stream, run:
openai api fine_tunes.follow -i ft-EMGUVyjdwCMcUV1JNyreGW1j
Training completed
╰─ openai api fine_tunes.follow -i ft-EMGUVyjdwCMcUV1JNyreGW1j ─╯
[2023-03-16 17:06:12] Created fine-tune: ft-EMGUVyjdwCMcUV1JNyreGW1j
[2023-03-16 17:10:47] Fine-tune costs $1.23
[2023-03-16 17:10:48] Fine-tune enqueued. Queue number: 25
[2023-03-16 17:11:55] Fine-tune is in the queue. Queue number: 24
[2023-03-16 17:11:59] Fine-tune is in the queue. Queue number: 23
[2023-03-16 17:13:13] Fine-tune is in the queue. Queue number: 22
[2023-03-16 17:14:50] Fine-tune is in the queue. Queue number: 21
[2023-03-16 17:15:20] Fine-tune is in the queue. Queue number: 20
[2023-03-16 17:16:02] Fine-tune is in the queue. Queue number: 19
[2023-03-16 17:17:43] Fine-tune is in the queue. Queue number: 18
[2023-03-16 17:18:09] Fine-tune is in the queue. Queue number: 17
[2023-03-16 17:18:59] Fine-tune is in the queue. Queue number: 16
[2023-03-16 17:19:41] Fine-tune is in the queue. Queue number: 15
[2023-03-16 17:20:41] Fine-tune is in the queue. Queue number: 14
[2023-03-16 17:21:07] Fine-tune is in the queue. Queue number: 13
[2023-03-16 17:22:59] Fine-tune is in the queue. Queue number: 12
[2023-03-16 17:24:22] Fine-tune is in the queue. Queue number: 11
[2023-03-16 17:24:27] Fine-tune is in the queue. Queue number: 10
[2023-03-16 17:24:30] Fine-tune is in the queue. Queue number: 9
[2023-03-16 17:24:42] Fine-tune is in the queue. Queue number: 8
[2023-03-16 17:25:16] Fine-tune is in the queue. Queue number: 7
[2023-03-16 17:26:18] Fine-tune is in the queue. Queue number: 6
[2023-03-16 17:28:01] Fine-tune is in the queue. Queue number: 5
[2023-03-16 17:28:14] Fine-tune is in the queue. Queue number: 4
[2023-03-16 17:29:53] Fine-tune is in the queue. Queue number: 3
[2023-03-16 17:31:18] Fine-tune is in the queue. Queue number: 2
[2023-03-16 17:33:41] Fine-tune is in the queue. Queue number: 1
[2023-03-16 17:36:28] Fine-tune is in the queue. Queue number: 0
[2023-03-16 17:36:32] Fine-tune started
[2023-03-16 17:39:16] Completed epoch 1/4
[2023-03-16 17:39:59] Completed epoch 2/4
[2023-03-16 17:40:43] Completed epoch 3/4
[2023-03-16 17:41:26] Completed epoch 4/4
[2023-03-16 17:41:59] Uploaded model: davinci:ft-personal:faq-2023-03-16-09-41-59
[2023-03-16 17:42:00] Uploaded result file: file-TjB8Xgm0mj67bqNZkcNIWR9b
[2023-03-16 17:42:00] Fine-tune succeeded
Job complete! Status: succeeded 🎉
Try out your fine-tuned model:
openai api completions.create -m davinci:ft-personal:faq-2023-03-16-09-41-59 -p <YOUR_PROMPT>
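The same call from Python (legacy 0.x SDK), keeping in mind the ` ->` prompt suffix and the `\n` stop sequence set up during data preparation; the question below is only an example:

```python
import openai

resp = openai.Completion.create(
    model="davinci:ft-personal:faq-2023-03-16-09-41-59",
    prompt="what payment methods do you accept? ->",  # prompt must end with " ->"
    max_tokens=100,
    temperature=0,
    stop=["\n"],  # completions were trained to end with "\n"
)
print(resp["choices"][0]["text"].strip())
```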
Other fine-tune job operations
# Follow the status of a fine-tune job
openai api fine_tunes.follow -i <YOUR_FINE_TUNE_JOB_ID>
# List all fine-tune jobs
openai api fine_tunes.list
# Get info about a fine-tune job
openai api fine_tunes.get -i <YOUR_FINE_TUNE_JOB_ID>
# Cancel a fine-tune job
openai api fine_tunes.cancel -i <YOUR_FINE_TUNE_JOB_ID>
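The same operations are available from Python via the legacy 0.x SDK; a sketch, assuming the job ID from the run above:

```python
import openai

# List all fine-tune jobs on the account.
print(openai.FineTune.list())

# Retrieve status, hyperparameters, and result files for one job.
print(openai.FineTune.retrieve(id="ft-EMGUVyjdwCMcUV1JNyreGW1j"))

# Cancel a job that is still queued or running.
# openai.FineTune.cancel(id="<YOUR_FINE_TUNE_JOB_ID>")
```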