sft do_predict: the label field in the generated JSON file is always empty

Reminder

[X] I have read the README and searched the existing issues.

System Info

- `llamafactory` version: 0.9.1.dev0
- Platform: Linux-5.19.0-0_fbk12_zion_11583_g0bef9520ca2b-x86_64-with-glibc2.34
- Python version: 3.12.5
- PyTorch version: 2.4.1+cu121 (GPU)
- Transformers version: 4.44.2
- Datasets version: 2.21.0
- Accelerate version: 0.34.2
- PEFT version: 0.12.0
- TRL version: 0.9.6
- GPU type: NVIDIA H100

Reproduction

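I have been looking at this for a long time and cannot figure it out. Any help would be appreciated, thanks!

I am running inference on a custom dataset.

It is defined in dataset_info.json as follows: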

"my_dataset":{
  "file_name": "/home/my_dataset.json",
  "columns": {
    "prompt": "input",
    "response": "output",
    "system": "instruction"
  }
}

My data is a single .json file in exactly the same format as https://github.com/hiyouga/LLaMA-Factory/blob/1a3e6545b2e1d2dab01d2a257130a47da62e747a/data/alpaca_en_demo.json: every JSON object has the three fields instruction, input, and output.
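
To rule out a problem in the data file itself, a quick check like the one below can confirm that every record has the three fields and a non-empty output. This is only a sketch: `find_bad_records` and `check_file` are hypothetical helper names, not part of LLaMA-Factory, and the sample records are made up for illustration.

```python
import json


def find_bad_records(records, required=("instruction", "input", "output")):
    """Return (index, problem) pairs for records that lack a required key
    or whose output is empty. Hypothetical helper, not a LLaMA-Factory API."""
    problems = []
    for i, rec in enumerate(records):
        missing = [key for key in required if key not in rec]
        if missing:
            problems.append((i, "missing keys: " + ", ".join(missing)))
        elif not str(rec["output"]).strip():
            problems.append((i, "empty output"))
    return problems


def check_file(path):
    """Load a JSON file containing a list of records and check each one."""
    with open(path, encoding="utf-8") as f:
        return find_bad_records(json.load(f))


if __name__ == "__main__":
    # Made-up records; replace with check_file("/home/my_dataset.json").
    sample = [
        {"instruction": "Be concise.", "input": "2+2?", "output": "4"},
        {"instruction": "Be concise.", "input": "3+3?", "output": ""},
        {"instruction": "Be concise.", "input": "4+4?"},
    ]
    for index, problem in find_bad_records(sample):
        print(index, problem)
```

If this reports nothing for the real file, the empty labels are more likely caused by the column mapping or the preprocessing step than by the data.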

The my_data_inference.yaml file is defined as:

stage: sft
do_predict: true
finetuning_type: lora

### dataset
eval_dataset: my_dataset
template: llama3
cutoff_len: 4096
overwrite_cache: true
preprocessing_num_workers: 16
max_samples: 10

### output
output_dir: inference_outputs/mydata/
overwrite_output_dir: true

### generating
temperature: 0.1
max_new_tokens: 10

### eval
per_device_eval_batch_size: 6
predict_with_generate: true

I then run it with:

llamafactory-cli train examples/train_lora/my_data_inference.yaml

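This produces a generated_predictions.jsonl file under inference_outputs/mydata/. Each JSON object in the file has three fields: prompt, label, and predict. prompt and predict are both correct, but label is always empty. I want label to be the output field from the original data in /home/my_dataset.json. How can I do that?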
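
For reference, this is roughly how the empty labels can be counted, plus a possible stopgap that copies output from the source file into label by position. It is a sketch, not a fix for the underlying behavior: the helper names are hypothetical, and it assumes (unverified) that the predictions are written in the same order as the source records and that max_samples takes the first N records.

```python
import json


def load_jsonl(path):
    """Read a .jsonl file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def count_empty_labels(rows):
    """Count prediction rows whose label is missing or whitespace only."""
    return sum(1 for row in rows if not str(row.get("label", "")).strip())


def fill_labels(rows, source_records):
    """Return copies of rows with label taken from the source output.

    ASSUMPTION: rows[i] corresponds to source_records[i]. This is not
    confirmed by the LLaMA-Factory docs; verify against the prompts first.
    """
    if len(rows) > len(source_records):
        raise ValueError("more prediction rows than source records")
    return [dict(row, label=src["output"]) for row, src in zip(rows, source_records)]


if __name__ == "__main__":
    # Made-up rows; in practice use
    # load_jsonl("inference_outputs/mydata/generated_predictions.jsonl")
    # and json.load(open("/home/my_dataset.json", encoding="utf-8")).
    rows = [
        {"prompt": "2+2?", "label": "", "predict": "4"},
        {"prompt": "3+3?", "label": "", "predict": "6"},
    ]
    source = [
        {"instruction": "Be concise.", "input": "2+2?", "output": "4"},
        {"instruction": "Be concise.", "input": "3+3?", "output": "6"},
    ]
    print("empty labels before:", count_empty_labels(rows))
    print("empty labels after:", count_empty_labels(fill_labels(rows, source)))
```

Matching by position is fragile, so it is worth comparing a few prompt values against the source input fields before trusting the filled labels.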

Expected behavior

label should be the output field from the original data in /home/my_dataset.json.

Others

No response

hiyouga / LLaMA-Factory

sft do_predict: the label field in the generated JSON file is always empty #5465
