Model training execution problem

shuxueslpi / chatGLM-6B-QLoRA

Uses the peft library to implement efficient 4-bit QLoRA fine-tuning for chatGLM-6B/chatGLM2-6B, and to merge the LoRA model with the base model and quantize the result to 4 bits.

Model training execution problem #9

Closed steamfeifei closed 1 year ago

steamfeifei commented 1 year ago

What is this, and which option should I choose?

steamfeifei commented 1 year ago

Solved: set this environment variable at the top of the training file: os.environ["WANDB_DISABLED"] = "true"
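A minimal sketch of that fix, assuming the training script uses the Hugging Face Trainer (whose wandb integration reads the WANDB_DISABLED variable) and that the line is placed before the trainer is created. Where exactly it goes in the file is an assumption; the thread only says "at the top".

```python
import os

# Disable the Weights & Biases integration so the run does not stop to ask
# which wandb option to use. Set this before the trainer is constructed;
# putting it at the very top of the script is the simplest way to ensure that.
os.environ["WANDB_DISABLED"] = "true"

# ... the rest of the training script (model loading, Trainer setup) follows.
```

If the script builds a transformers TrainingArguments object, passing report_to="none" there should have a similar effect without relying on the environment variable. Whether that fits this repository's script is not confirmed in the thread.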

steamfeifei commented 1 year ago

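Fix for "RuntimeError: Expected all tensors to be on the same device, but found at least two devices". This happens mainly because multiple GPUs are configured and the tensors end up split across devices (partly on the CPU, partly on a GPU); everything needs to run on the same device. The fix is to use only one GPU:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"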

refer @Aicharm
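A minimal sketch of the single-GPU workaround above. The variable only takes effect if it is set before CUDA is initialized, so it should come before any code that touches the GPU; placing it ahead of the torch import is the safe choice. The helper visible_gpu_ids is hypothetical and only there to show what the process will see.

```python
import os

# Expose only the first physical GPU to this process.
# Set this before importing torch or running any CUDA code.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def visible_gpu_ids():
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES (hypothetical helper)."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


print(visible_gpu_ids())
```

With this in place, torch.cuda.device_count() would be expected to report a single device on a multi-GPU machine. This trades multi-GPU throughput for a simple fix; the thread does not cover how to make the script work across several GPUs.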

Mou-Mou-L commented 1 year ago

Are you also running this in a Docker environment? I don't have Docker; is there another way to run it?

steamfeifei commented 1 year ago

I'm not using Docker either; I just ran it directly in the background on Linux.

Mou-Mou-L commented 1 year ago

I'm on Windows. When I fine-tuned with P-Tuning before, I ran the script from Git Bash and it worked, but now it doesn't.

steamfeifei commented 1 year ago

What error did it report?

steamfeifei commented 1 year ago

It has been running for a day now; let's see how the results turn out.

shuxueslpi commented 1 year ago

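@Mou-Mou-L On Windows you can install the Windows Subsystem for Linux (WSL) and then Docker Desktop, after which Docker works smoothly. The Docker Desktop installation documentation covers the steps.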

Mou-Mou-L commented 1 year ago

As for which error I got, it is this one: https://github.com/shuxueslpi/chatGLM-6B-QLoRA/issues/12#issuecomment-1614418173

Mou-Mou-L commented 1 year ago

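As for WSL and Docker Desktop: I can't install them on my company computer. I tried before, and the system crashed as soon as I turned on the virtual machine feature, haha.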

2512309z commented 1 year ago

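Regarding the fix of setting os.environ["WANDB_DISABLED"] = "true" at the top of the training file: which file do you mean by "the training file"?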

steamfeifei commented 1 year ago

It's train_qlora.py.

2512309z commented 1 year ago

Thank you very much, that did it! I also saw your other answer about the multi-GPU error, and that one is fixed now too!