modelscope / ms-swift

Use PEFT or full-parameter training to fine-tune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Error when fine-tuning MiniCPM-Llama3-V-2_5 with custom data #1030

Closed · zhudongwork closed this issue 3 months ago

zhudongwork commented 3 months ago

Describe the bug
What the bug is, and how to reproduce, better with screenshots.

Train: 0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/utils/run_utils.py", line 27, in x_main
    result = llm_x(args, **kwargs)
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/sft.py", line 298, in llm_sft
    trainer.train(training_args.resume_from_checkpoint)
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/trainers/trainers.py", line 50, in train
    res = super().train(args, **kwargs)
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/trainer.py", line 2165, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/accelerate/data_loader.py", line 464, in __iter__
    next_batch = next(dataloader_iter)
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/utils/template.py", line 408, in data_collator
    labels = [torch.tensor(b['labels']) for b in batch]
  File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/utils/template.py", line 408, in <listcomp>
    labels = [torch.tensor(b['labels']) for b in batch]
RuntimeError: Could not infer dtype of NoneType
Train: 0%| | 0/6 [00:01<?, ?it/s]
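For context, the RuntimeError comes straight from PyTorch: the collator calls torch.tensor() on a labels field that is None. A minimal, standalone reproduction (not project code) is:

```python
import torch

# A batch item whose 'labels' never got filled in (e.g. the sample produced
# no target tokens) ends up as None, and torch.tensor(None) cannot infer a dtype.
sample = {"input_ids": [1, 2, 3], "labels": None}

try:
    torch.tensor(sample["labels"])
except RuntimeError as e:
    print(e)  # Could not infer dtype of NoneType
```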

[screenshots attached]

Your hardware and system info (CUDA version, system, GPU model, torch version): not provided.

Additional context: none provided.

Jintao-Huang commented 3 months ago

Does your fine-tuning dataset not contain a response field?
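For reference, each row of a custom dataset needs to carry the target answer the model is trained on. The field names below follow the custom-dataset docs, but check them against your swift version; this is an illustrative sketch, not the canonical schema:

```python
# Illustrative sketch of one row of a custom multimodal fine-tuning dataset.
# "query", "response", "images" are assumed field names -- verify against the docs.
row = {
    "query": "What text appears in the image?",
    "response": "Invoice No. 12345",   # the supervision target; if missing, labels end up as None
    "images": ["ocr/0001.jpg"],
}
```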

zhudongwork commented 3 months ago

> Does your fine-tuning dataset not contain a response field?

Yes, it does. This is what the data looks like: [screenshot]

zhudongwork commented 3 months ago

I hit the same problem when fine-tuning CogVLM2. Is the format of my dataset wrong?

This is the command I ran:

CUDA_VISIBLE_DEVICES=7 swift sft --model_id_or_path /node6/models/ZhipuAI/cogvlm2-llama3-chinese-chat-19B --model_type cogvlm2-19b-chat --dataset ../ocr_100.json --batch_size 4 --val_dataset_sample 10

Fine-tuning data: ocr_100.json
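One quick way to rule the data in or out is to scan the file for rows whose response is missing or empty before launching training. A rough pre-flight sketch, assuming the file is a JSON list and the target key is named "response" (adjust to your actual layout):

```python
import json

# Pre-flight check for the custom dataset before running swift sft.
# The path and the "response" key are assumptions about the file's layout.
with open("ocr_100.json", encoding="utf-8") as f:
    rows = json.load(f)

bad = [i for i, row in enumerate(rows) if not row.get("response")]
print(f"{len(rows)} rows total, {len(bad)} with a missing/empty response: {bad[:10]}")
```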

Jintao-Huang commented 3 months ago
[screenshot attached: 2024-05-31 19:12:23]

kratorado commented 3 months ago

Among the datasets that swift ships support for, medical-zh also has this problem. Latest main branch.

Original Traceback (most recent call last):
  File "/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/work/swift-unsloth/swift/swift/llm/utils/template.py", line 453, in data_collator
    labels = [torch.tensor(b['labels']) for b in batch]
  File "/work/swift-unsloth/swift/swift/llm/utils/template.py", line 453, in <listcomp>
    labels = [torch.tensor(b['labels']) for b in batch]
RuntimeError: Could not infer dtype of NoneType
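A defensive guard around the failing line (a sketch, not the project's actual code) would at least report which samples lost their labels instead of crashing inside torch.tensor:

```python
import torch

def collate_labels(batch):
    # Sketch of a guard around the failing line in template.py: report which
    # samples have labels == None (typically rows with no response text) instead
    # of letting torch.tensor(None) raise "Could not infer dtype of NoneType".
    missing = [i for i, b in enumerate(batch) if b.get("labels") is None]
    if missing:
        raise ValueError(
            f"samples {missing} have no labels; check that every row has a non-empty response"
        )
    return [torch.tensor(b["labels"]) for b in batch]
```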