modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0
4.23k stars · 373 forks

Error when installing swift from source #845

Closed zhangfan-algo closed 6 months ago

zhangfan-algo commented 6 months ago

(screenshot attached) Hardware setup: 4 machines × 8 A800 GPUs each.

zhangfan-algo commented 6 months ago

I suspect this is caused by a multi-threading conflict.

Jintao-Huang commented 6 months ago

Uninstall it completely and reinstall.

zhangfan-algo commented 6 months ago

For now I'm test-running on a single machine and hit this error:

```text
Traceback (most recent call last):
  File "/mnt/pfs/zhangfan/homework_correction/swift_0429/examples/pytorch/llm/llm_sft.py", line 7, in <module>
    output = sft_main()
  File "/mnt/pfs/zhangfan/homework_correction/swift_0429/swift/utils/run_utils.py", line 31, in x_main
    result = llm_x(args, *kwargs)
  File "/mnt/pfs/zhangfan/homework_correction/swift_0429/swift/llm/sft.py", line 265, in llm_sft
    trainer.train(training_args.resume_from_checkpoint)
  File "/mnt/pfs/zhangfan/homework_correction/swift_0429/swift/trainers/trainers.py", line 54, in train
    res = super().train(args, **kwargs)
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 1928, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/accelerate/data_loader.py", line 452, in __iter__
    current_batch = next(dataloader_iter)
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 631, in __next__
    data = self._next_data()
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 675, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/apps1/zhangfan/anaconda3/envs/swift/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/mnt/pfs/zhangfan/homework_correction/swift_0429/swift/llm/utils/template.py", line 1050, in data_collator
    res = super().data_collator(batch, padding_to)
  File "/mnt/pfs/zhangfan/homework_correction/swift_0429/swift/llm/utils/template.py", line 436, in data_collator
    input_ids = [torch.tensor(b['input_ids']) for b in batch]
  File "/mnt/pfs/zhangfan/homework_correction/swift_0429/swift/llm/utils/template.py", line 436, in <listcomp>
    input_ids = [torch.tensor(b['input_ids']) for b in batch]
KeyError: 'input_ids'
```
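The failing line in `template.py` simply indexes every sample dict by `'input_ids'`; if preprocessing silently dropped a sample's encoding, the key is missing and the collator raises. A minimal sketch of the failure mode (plain lists instead of tensors, function name hypothetical):

```python
def collate_input_ids(batch):
    # Mirrors the failing line in swift's template.py: every sample dict
    # is assumed to carry an 'input_ids' key produced during preprocessing.
    return [b["input_ids"] for b in batch]

ok = collate_input_ids([{"input_ids": [1, 2]}, {"input_ids": [3]}])
print(ok)  # [[1, 2], [3]]

try:
    # A sample whose encoding step failed has no 'input_ids' at all.
    collate_input_ids([{"query": "...", "images": ["a.png"]}])
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'input_ids'
```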

zhangfan-algo commented 6 months ago

Dataset format (one JSON object per line; note the `response` is built by string concatenation in Python, so this is pseudo-JSON):

```text
{"query":"这是学生书写的数字和数学公式相关内容。请你准确说出图片中手写体内容是什么.数学公式用latex表达。你的输出格式必须是:图片中手写体内容是:XXX.let us think step by step","response":"图片中手写体内容是:(数学公式用latex公式表达)\n\n"+str(label),"images":[file_path]}
```
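For reference, a small sketch (function name, label, and path are hypothetical placeholders) that serializes one sample in this `query`/`response`/`images` layout as a JSONL line:

```python
import json

def make_sample(query: str, label: str, image_path: str) -> str:
    """Serialize one training sample in the query/response/images layout."""
    record = {
        "query": query,
        "response": "图片中手写体内容是:" + label,  # prefix matches the format above
        "images": [image_path],
    }
    # ensure_ascii=False keeps the Chinese prompt readable in the JSONL file.
    return json.dumps(record, ensure_ascii=False)

line = make_sample("请识别图片中的手写内容", r"\frac{1}{2}", "data/img_0001.png")
print(line)
```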

Jintao-Huang commented 6 months ago

Which model are you using?

zhangfan-algo commented 6 months ago

internvl-chat-v1_5

hjh0119 commented 6 months ago

Could you share the sft command and a data sample?

hjh0119 commented 6 months ago

`--max_length 1024` is too small; the ViT image embeddings alone usually exceed 1024 tokens.

hjh0119 commented 6 months ago

I recommend setting it to at least 2048.
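As a rough sanity check (the numbers below are illustrative ViT arithmetic, not necessarily InternVL-Chat-V1.5's exact configuration): a single 448×448 image split into 14-pixel patches already yields 1024 patch embeddings, so with `--max_length 1024` there is no budget left for any text tokens:

```python
def vit_patch_tokens(image_size: int, patch_size: int) -> int:
    # A square ViT input of side `image_size` with `patch_size`-pixel
    # patches produces (image_size // patch_size) ** 2 patch embeddings.
    return (image_size // patch_size) ** 2

# Illustrative numbers: one 448x448 tile with 14px patches.
print(vit_patch_tokens(448, 14))  # 1024 -- the whole budget, before any text
```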

zhangfan-algo commented 6 months ago

OK, I'll give it a try.

zhangfan-algo commented 6 months ago

It works now, thanks. One more question: do we currently support saving checkpoints per epoch?

hjh0119 commented 6 months ago

> It works now, thanks. One more question: do we currently support saving checkpoints per epoch?

@Jintao-Huang

tastelikefeet commented 6 months ago

Not currently supported; only step-based saving is available. As a workaround, you can set `--save_steps` to match the number of steps in one epoch.
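A sketch of the suggested workaround (parameter names here are generic, not swift flags): compute how many optimizer steps make up one epoch and pass that value as `--save_steps`:

```python
import math

def steps_per_epoch(num_samples: int, per_device_batch: int,
                    grad_accum: int, world_size: int) -> int:
    # One optimizer step consumes an effective global batch of
    # per_device_batch * grad_accum * world_size samples.
    effective_batch = per_device_batch * grad_accum * world_size
    return math.ceil(num_samples / effective_batch)

# e.g. 100k samples, batch 2 per GPU, grad-accum 16, 8 GPUs -> 391 steps
print(steps_per_epoch(100_000, 2, 16, 8))
```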

zhangfan-algo commented 6 months ago

> Uninstall it completely and reinstall.

This virtual environment never had swift installed before. I re-ran it and still hit the error.