Closed zhudongwork closed 3 months ago
Does the fine-tuning dataset not contain a response field?
It does. This is what the data looks like:
I hit the same problem when fine-tuning CogVLM2. Is my dataset format wrong?
This is the command I ran:
CUDA_VISIBLE_DEVICES=7 swift sft --model_id_or_path /node6/models/ZhipuAI/cogvlm2-llama3-chinese-chat-19B --model_type cogvlm2-19b-chat --dataset ../ocr_100.json --batch_size 4 --val_dataset_sample 10
Fine-tuning data: ocr_100.json
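Since the question in this thread is whether every record has a response field, a quick standalone check on a file such as ocr_100.json might help narrow things down. This is only a sketch: it assumes the file is either a JSON array or JSON Lines, and that the field is literally named `response`. The helper names `load_records` and `rows_missing_response` are made up for illustration and are not part of swift.

```python
import json
from pathlib import Path


def load_records(path):
    """Load a JSON array file or a JSON Lines file into a list of dicts.

    Assumption: the dataset is one of these two layouts.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def rows_missing_response(records, field="response"):
    """Return indices of records whose `field` is absent, None, or blank.

    Assumption: the answer text lives under the key named by `field`.
    """
    bad = []
    for i, rec in enumerate(records):
        value = rec.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            bad.append(i)
    return bad


# Example usage (path taken from the command above):
# print(rows_missing_response(load_records("../ocr_100.json")))
```

An empty list means every record carries a non-blank response, in which case the dataset file itself is probably not the cause.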
Among the datasets that swift supports, medical-zh has this problem too. This is on the latest main branch.
Original Traceback (most recent call last):
File "/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/work/swift-unsloth/swift/swift/llm/utils/template.py", line 453, in data_collator
labels = [torch.tensor(b['labels']) for b in batch]
File "/work/swift-unsloth/swift/swift/llm/utils/template.py", line 453, in <listcomp>
labels = [torch.tensor(b['labels']) for b in batch]
RuntimeError: Could not infer dtype of NoneType
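The failing line is `torch.tensor(b['labels'])`, so at least one item in the batch reaches the collator with `labels` set to `None`. A guard like the one below, run on the batch before it is converted to tensors, would show which items are affected. It is a plain-Python sketch: `check_labels` is a hypothetical helper, not a swift function, and it assumes each batch item is a dict with a `labels` key, as the traceback suggests.

```python
def check_labels(batch):
    """Raise a descriptive error if any encoded example has labels=None.

    Assumption: `batch` is a list of dicts shaped like the items that
    data_collator receives in the traceback (each has a 'labels' entry).
    """
    missing = [i for i, ex in enumerate(batch) if ex.get("labels") is None]
    if missing:
        raise ValueError(
            f"examples at batch positions {missing} have labels=None"
        )
    return batch
```

Knowing which positions are affected makes it easier to trace back to the source records and see whether they differ from the ones that encode correctly.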
Describe the bug
Train: 0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/node6/docker-envs/zhudong/vlm_work/swift/swift/cli/sft.py", line 5, in <module>
sft_main()
File "/node6/docker-envs/zhudong/vlm_work/swift/swift/utils/run_utils.py", line 27, in x_main
result = llm_x(args, **kwargs)
File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/sft.py", line 298, in llm_sft
trainer.train(training_args.resume_from_checkpoint)
File "/node6/docker-envs/zhudong/vlm_work/swift/swift/trainers/trainers.py", line 50, in train
res = super().train(*args, **kwargs)
File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/transformers/trainer.py", line 2165, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/accelerate/data_loader.py", line 464, in __iter__
next_batch = next(dataloader_iter)
File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/mnt/anaconda3/envs/modelscope/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
return self.collate_fn(data)
File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/utils/template.py", line 408, in data_collator
labels = [torch.tensor(b['labels']) for b in batch]
File "/node6/docker-envs/zhudong/vlm_work/swift/swift/llm/utils/template.py", line 408, in <listcomp>
labels = [torch.tensor(b['labels']) for b in batch]
RuntimeError: Could not infer dtype of NoneType
Train: 0%| | 0/6 [00:01<?, ?it/s]